drawspans optimizations in c?

Postby mankrip » Wed Nov 10, 2010 5:52 am

I'm going to implement both of these optimizations in Makaqu, thanks qbism and mh.

Hmm, the 16-pixel version is actually the default one, according to the default value of the d_subdiv16 cvar, so qbism's optimization will also make the renderer output more uniform across different ports.
qbism wrote: 17fps vs 12.8fps: That's a lot of FPS still on the table. I wonder if any typical compiler settings could be screwing up the unroll benefit? Possible that unoptimized could be faster in this loop. Or maybe related to 486 instruction set?

Dunno, but this may shed a bit of light:
- The original Q2 has optimized x86 assembly rasterizers. These were among the fastest of their time, and they used cunning tricks such as explicitly parallelizing x86 and x87 instructions to achieve maximum speed (for example, the division for perspective correction for the next 8-pixel span was performed in parallel with the actual rendering of the current 8-pixel span, i.e. the perspective correction was almost 'free'). The C rasterizers that this version uses don't have this property (and today's compilers still don't figure it out).
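
To make the quoted point concrete, here is a rough C sketch of subdivided perspective correction: one true divide per segment of pixels, with plain linear interpolation in between. It is not the actual Quake or Q2 code; span_grad_t, span_texcoords and SUBDIV are names made up for illustration, and it uses floats where the real rasterizers use fixed point. In C the divide for the next segment simply sits in front of the pixel loop; the asm version is what overlaps it with the drawing.

```c
#include <assert.h>
#include <stddef.h>

#define SUBDIV 16 /* pixels per segment; assumed value, cf. the d_subdiv16 cvar */

/* Hypothetical span setup: s/z, t/z and 1/z at the first pixel,
   plus their per-pixel steps along the scanline. */
typedef struct {
    float sdivz, tdivz, zi;
    float sdivz_step, tdivz_step, zi_step;
    int count; /* number of pixels in the span */
} span_grad_t;

/* Fills out_s/out_t with one texture coordinate pair per pixel.
   Coordinates are exact at segment boundaries and linearly
   interpolated (affine) inside each segment. */
static void span_texcoords(const span_grad_t *g, float *out_s, float *out_t)
{
    float sdivz = g->sdivz, tdivz = g->tdivz, zi = g->zi;
    float z, s, t;
    int done = 0;

    assert(g != NULL && out_s != NULL && out_t != NULL);

    z = 1.0f / zi; /* perspective divide for the span start */
    s = sdivz * z;
    t = tdivz * z;

    while (done < g->count) {
        int i, n = g->count - done;
        float s_next, t_next, ds, dt;

        if (n > SUBDIV)
            n = SUBDIV;

        /* Step the gradients to the end of this segment. */
        sdivz += g->sdivz_step * (float)n;
        tdivz += g->tdivz_step * (float)n;
        zi += g->zi_step * (float)n;

        /* The only perspective divide for this segment. This is the
           work the asm hides behind the drawing of the previous one. */
        z = 1.0f / zi;
        s_next = sdivz * z;
        t_next = tdivz * z;

        /* Per-pixel deltas; with fixed point and n == SUBDIV this
           could be a shift instead of a divide. */
        ds = (s_next - s) / (float)n;
        dt = (t_next - t) / (float)n;

        for (i = 0; i < n; i++) {
            out_s[done + i] = s + ds * (float)i;
            out_t[done + i] = t + dt * (float)i;
        }

        s = s_next;
        t = t_next;
        done += n;
    }
}
```

A real span drawer would fetch a texel and write a pixel in the inner loop instead of storing coordinates; the arrays here only make the result easy to inspect.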

Postby leileilol » Wed Nov 10, 2010 6:04 am

I dunno man I did the comparison using the same compiler id used for Quake - djgpp lol

No one has dared to merge the Q2 asm into Quake yet with any non-feature-creep result, have they!?

Postby qbism » Wed Nov 10, 2010 6:09 pm

MK - glad to have something that contributes back to your wonderful engine.

leileilol - Seems id may have had the opportunity to apply this to Q1; I wonder if they did. The last Q1 update was '99 (?), after Q2 was released.

Could the C version be parallelized across multiple cores if available?

Postby mh » Wed Nov 10, 2010 6:30 pm

qbism wrote: Could the C version be parallelized across multiple cores if available?

I was thinking the same myself earlier on, and it may be the most efficient way of doing it, but you'd need to avoid thread-synchronisation overhead by e.g. partitioning the spans list between threads. Software Quake would be a little scary to do as it makes very heavy use of globals, but so long as they're just read from and not written to, it should be relatively safe.
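
A minimal sketch of the partitioning idea, assuming a linked list of spans loosely modelled on the engine's span list. span_t and partition_spans are hypothetical names, and no threads are actually created here. Each span is handed to whichever worker currently has the fewest pixels, so every worker ends up with its own private list and nothing needs locking while drawing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical span node: start column u, row v, pixel count. */
typedef struct span_s {
    int u, v, count;
    struct span_s *next;
} span_t;

/* Splits the list at `head` into `nworkers` independent lists.
   out[i] receives the list for worker i and pixels[i] its total
   pixel count. The original list is consumed (nodes are relinked). */
static void partition_spans(span_t *head, int nworkers,
                            span_t *out[], int pixels[])
{
    int i;

    assert(nworkers >= 1);

    for (i = 0; i < nworkers; i++) {
        out[i] = NULL;
        pixels[i] = 0;
    }

    while (head != NULL) {
        span_t *next = head->next;
        int best = 0;

        /* Pick the least-loaded worker; ties go to the lowest index. */
        for (i = 1; i < nworkers; i++) {
            if (pixels[i] < pixels[best])
                best = i;
        }

        /* Push the span onto the front of that worker's list. */
        head->next = out[best];
        out[best] = head;
        pixels[best] += head->count;

        head = next;
    }
}
```

Each out[i] could then be given to its own thread running the span drawer. That only stays safe under the condition above: the globals the drawer touches are read, not written, while the workers run.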

I did port one asm function from Q2 to Q1, but that was just BoxOnPlaneSide; it worked perfectly though, leading me to suspect that (at least some of) the Q2 asm is little more than converted Q1 asm with some minor modifications.

Postby qbism » Sat Nov 13, 2010 12:50 am

Perhaps the entire function could be sent to another core. It's using roughly 50% of CPU time in singleplayer, depending on resolution.

Moot point in Flash, or on a 486 or a single-core Atom netbook. Although it would be great if a browser could run each Flash (or Silverlight or HTML5) instance in a separate thread.

I intend to re-create the previous OQ+ botmatch flash demo to see how the new span function runs by comparison!
