drawspans optimizations in c?
Moderator: InsideQC Admins
20 posts
• Page 2 of 2 • 1, 2
I'm going to implement both of these optimizations in Makaqu, thanks qbism and mh.
Hmm, the 16-pixel version is actually the default one, according to the default value of the d_subdiv16 cvar, so qbism's optimization will also make the renderer output more uniform across different ports.
qbism wrote: 17fps vs 12.8fps: That's a lot of FPS still on the table. I wonder if any typical compiler settings could be screwing up the unroll benefit? Possible that unoptimized could be faster in this loop. Or maybe related to 486 instruction set?
Dunno, but this may shed a bit of light:
- The original Q2 has optimized x86 assembly rasterizers. These were among the fastest of their time, and they used cunning tricks such as explicitly parallelizing x86 and x87 instructions to achieve maximum speed (for example, the division for perspective correction of the next 8-pixel span was performed in parallel with the actual rendering of the current 8-pixel span, i.e. the perspective correction was almost 'free'). The C rasterizers that this version uses don't have this property (and today's compilers still don't figure it out).
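For anyone following along, here is a rough C sketch of the subdivision idea described above: one perspective divide per run of pixels, with plain linear stepping inside the run. It is not the engine's code. The struct, the function names and the float math are assumptions made for illustration (the real rasterizers use fixed point and a different span layout), and plain C like this cannot express the asm trick of overlapping the divide with the pixel writes.

```c
#include <assert.h>
#include <stdint.h>

#define SUBDIV 16  /* pixels per affine run (assumed; the asm described above used 8) */

/* Hypothetical, simplified gradients for one span. */
typedef struct {
    float sdivz, tdivz, zi;                 /* s/z, t/z and 1/z at the span start */
    float sdivz_step, tdivz_step, zi_step;  /* per-pixel steps along x */
} span_grad_t;

int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Writes `count` texels to dest. The reciprocal of 1/z is computed once per
   run of up to SUBDIV pixels; s and t are interpolated linearly in between. */
void draw_span_subdiv(uint8_t *dest, int count, span_grad_t g,
                      const uint8_t *tex, int tex_w, int tex_h)
{
    assert(count > 0 && g.zi > 0.0f);

    float z = 1.0f / g.zi;
    float s = g.sdivz * z;
    float t = g.tdivz * z;

    while (count > 0) {
        int run = count < SUBDIV ? count : SUBDIV;

        /* Step the gradients to the end of this run. */
        g.sdivz += g.sdivz_step * (float)run;
        g.tdivz += g.tdivz_step * (float)run;
        g.zi    += g.zi_step    * (float)run;

        /* The one perspective divide for this run. */
        z = 1.0f / g.zi;
        float s_end = g.sdivz * z;
        float t_end = g.tdivz * z;

        /* Per-pixel deltas; a fixed-point version would use a shift here
           for full-length runs instead of a division. */
        float ds = (s_end - s) / (float)run;
        float dt = (t_end - t) / (float)run;

        for (int i = 0; i < run; i++) {
            int si = clamp_int((int)s, 0, tex_w - 1);
            int ti = clamp_int((int)t, 0, tex_h - 1);
            *dest++ = tex[ti * tex_w + si];
            s += ds;
            t += dt;
        }

        /* Snap to the exact end values so rounding error doesn't accumulate. */
        s = s_end;
        t = t_end;
        count -= run;
    }
}
```

In the asm version the divide for the next run is issued on the FPU before the inner loop starts, so it finishes while the integer unit is writing pixels. In the C above the divide simply sits in front of the inner loop, and it is up to the compiler and the CPU whether any of it overlaps.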
-

mankrip - Posts: 915
- Joined: Fri Jul 04, 2008 3:02 am
MK - glad to have something that contributes back to your wonderful engine.
leileilol- Seems ID may have had the opportunity to apply this to Q1, wonder if they did. The last Q1 update was '99 (?) after Q2 was released.
Could the C version be paralleled across multiple cores if available?
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
qbism wrote: Could the C version be paralleled across multiple cores if available?
I was thinking the same myself earlier on, and it may be the most efficient way of doing it, but you'd need to avoid thread-synchronisation overhead by e.g. partitioning the spans list between threads. Software Quake would be a little scary to do as it makes very heavy use of globals, but so long as they're just read from and not written to, it should be relatively safe.
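To make the partitioning idea a bit more concrete, here is a minimal C sketch of just the splitting step. Everything in it is hypothetical: `span_t` is a stand-in for the engine's real span record, and `chunk_pixels` only sums pixel counts where a real worker would draw. The point is that each worker gets its own contiguous slice of the list and never touches another worker's spans, so no locking is needed while drawing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal span record; the engine's real struct differs. */
typedef struct {
    int u, v;   /* start pixel of the span */
    int count;  /* number of pixels in the span */
} span_t;

/* A contiguous slice of the span list owned by exactly one worker. */
typedef struct {
    const span_t *first;
    size_t n;
} span_chunk_t;

/* Splits `total` spans into `workers` contiguous chunks whose lengths
   differ by at most one. `out` must have room for `workers` entries. */
void partition_spans(const span_t *spans, size_t total,
                     size_t workers, span_chunk_t *out)
{
    assert(workers > 0);

    size_t base  = total / workers;
    size_t extra = total % workers;  /* the first `extra` chunks get one more */
    size_t pos   = 0;

    for (size_t w = 0; w < workers; w++) {
        size_t n = base + (w < extra ? 1 : 0);
        out[w].first = spans + pos;
        out[w].n = n;
        pos += n;
    }
}

/* Stand-in for the per-thread work: reads only its own chunk.
   A real worker would call the span drawer for each entry instead. */
size_t chunk_pixels(const span_chunk_t *chunk)
{
    size_t total = 0;
    for (size_t i = 0; i < chunk->n; i++)
        total += (size_t)chunk->first[i].count;
    return total;
}
```

Each `span_chunk_t` would then be handed to one thread (for example via `pthread_create`), with a single join at the end of the frame as the only synchronisation point. Splitting by span count is the simplest choice; balancing by total pixel count would probably spread the load more evenly, but that is untested here.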
I did port one asm function from Q2 to Q1, but that was just BoxOnPlaneSide; it worked perfectly though, leading me to suspect that (at least some of) the Q2 asm is little more than converted Q1 asm with some minor modifications.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
-

mh - Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Perhaps the entire function could be sent to another core. It's using roughly 50% of CPU time in singleplayer, depending on resolution.
Moot point in Flash or 486 or a single-core Atom book. Although it would be great if a browser could run each Flash (or Silverlight or html5) instance in a separate thread.
I intend to re-create the previous OQ+ botmatch flash demo to see how the new span function runs by comparison!
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am