Subdiv16 for sprites?
Re: Subdiv16 for sprites?
Indeed, and it helped me trim 62,344 bytes from r_surf.c
now I feel like making entire functions out of macros!
i should not be here
Re: Subdiv16 for sprites?
qbism, grab my D_SpriteDrawSpans16_ColorKeyed function, paste the code of the if (psprite->type == SPR_VP_PARALLEL) path in one file, the code of the else path in another file, and run a diff tool to compare both files, so you can see exactly what was changed.
Your IZI macro is also pointless. It's only useful in my code because I redefine it before calling the macros used for drawing the pixels. I did it that way so the same pixel drawing macros could be used for both SPR_VP_PARALLEL and else paths. But you are using different pixel drawing macros for each path.
qbism wrote:
In a certain stress case, this:
Code: Select all
#define PARALLELCHECK(i) { btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255 && (pz[i] <= IZI)) { pz[i] = IZI; pdest[i] = btemp;} s+=sstep; t+=tstep;}
#define ORIENTEDCHECK(i) { btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255 && pz[i] <= (izi >> 16)){ pz[i] = izi >> 16; pdest[i] = btemp;} s+=sstep; t+=tstep; izi+=izistep;}
turns out to be faster than this:
Code: Select all
#define PARALLELCHECK(i) {if (pz[i] <= IZI){ btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255) { pz[i] = IZI; pdest[i] = btemp;}} s+=sstep; t+=tstep;}
#define ORIENTEDCHECK(i) {if (pz[i] <= (izi >> 16)){ btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255){ pz[i] = izi >> 16; pdest[i] = btemp;}} s+=sstep; t+=tstep; izi+=izistep;}
Besides pulling out static variables it's the only change that made a dent in the fps counter.
This will be faster when the texture has more transparent pixels than opaque pixels, and when there isn't anything in front of it.
For grenade explosions it would probably be slower, but for bubbles it could be faster. But making a specific case for it would require a lot more code.
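To make the trade-off between the two macro variants easier to see, here is a minimal sketch, not code from either engine. It assumes the texels of one span have already been sampled into an array, and the `span_stats` counters, `COLORKEY` name and both function names are hypothetical. It only counts how many texel fetches and z-buffer reads each ordering performs.

```c
#include <stdint.h>

#define COLORKEY 255 /* transparent palette index, as in the macros above */

/* Hypothetical counters, used only to make the trade-off visible. */
typedef struct {
    int texel_reads;
    int z_reads;
    int writes;
} span_stats;

/* Texel first: a transparent texel skips the z-buffer read entirely. */
static span_stats span_texel_first(const uint8_t *texels, short *pz,
                                   uint8_t *pdest, int count, short izi)
{
    span_stats st = {0, 0, 0};
    int i;

    for (i = 0; i < count; i++) {
        uint8_t btemp = texels[i];
        st.texel_reads++;
        if (btemp == COLORKEY)
            continue;
        st.z_reads++;
        if (pz[i] <= izi) {
            pz[i] = izi;
            pdest[i] = btemp;
            st.writes++;
        }
    }
    return st;
}

/* Depth first: an occluded pixel skips the texel fetch entirely. */
static span_stats span_depth_first(const uint8_t *texels, short *pz,
                                   uint8_t *pdest, int count, short izi)
{
    span_stats st = {0, 0, 0};
    int i;

    for (i = 0; i < count; i++) {
        uint8_t btemp;
        st.z_reads++;
        if (pz[i] > izi)
            continue;
        btemp = texels[i];
        st.texel_reads++;
        if (btemp != COLORKEY) {
            pz[i] = izi;
            pdest[i] = btemp;
            st.writes++;
        }
    }
    return st;
}
```

Both orderings write the same pixels. On a mostly transparent, unobstructed span the texel-first version saves z-buffer reads, while on a heavily occluded span the depth-first version saves texel fetches, which matches the bubbles versus grenade explosions guess above.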
Re: Subdiv16 for sprites?
OK, I see it now. Pulling the z, zi, izi recomputation out of the loop is a big improvement, 5%+ in the test case.
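A rough sketch of why that hoisting is safe for parallel sprites, under the assumption that the 1/z gradient has zero steps in both directions. The `zi_gradient` struct is a hypothetical stand-in for `d_ziorigin`, `d_zistepu` and `d_zistepv`, and the function names are made up for illustration.

```c
/* Hypothetical stand-in for d_ziorigin, d_zistepu and d_zistepv. */
typedef struct {
    float origin;
    float stepu;
    float stepv;
} zi_gradient;

/* Quake-style conversion of 1/z to the 16-bit value kept in the z-buffer. */
static int zi_to_izi(float zi)
{
    return (int) (zi * 0x8000 * 0x10000) >> 16;
}

/* General case: 1/z varies across the screen, so it is evaluated per span. */
static void izi_per_span(const zi_gradient *g, const int *u, const int *v,
                         int nspans, int *out)
{
    int i;

    for (i = 0; i < nspans; i++) {
        float zi = g->origin + (float) v[i] * g->stepv + (float) u[i] * g->stepu;
        out[i] = zi_to_izi(zi);
    }
}

/* Parallel sprite: both steps are zero, so one conversion serves every span. */
static void izi_hoisted(const zi_gradient *g, int nspans, int *out)
{
    int izi = zi_to_izi(g->origin);
    int i;

    for (i = 0; i < nspans; i++)
        out[i] = izi;
}
```

With zero steps the two functions fill `out` with identical values, so the per-span float math and float-to-int conversion can be dropped from the loop.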
Re: Subdiv16 for sprites?
No problem.
Here's another idea I just had: killing the perspective correction, since parallel surfaces don't need it.
You may need to pad the texture though. I've already done that to eliminate clamping on kernel-filtered dithering, and it's necessary to prevent under/over stepping.
For model-based particles, your idea of checking for transparency before checking the depth should also give more speed, since most of the particles should be rendered in mid-air, with no obstructions.
Code: Select all
void D_SpriteDrawSpans_Dithered_Blend (void)
{
    // mankrip - begin
    if (psprite->type == SPR_VP_PARALLEL)
    {
        // for square surfaces on parallel perspective, all spans have the same size
        int countstart = pspan->count >> 4;
        count = countstart;
        spancount = pspan->count % 16;
        spancountminus1 = (float) (spancount - 1);
        // izistep, d_zistepu and d_zistepv are zero, so we can skip zi
        z = (float)0x10000 / d_ziorigin; // prescale to 16.16 fixed-point
        // we count on FP exceptions being turned off to avoid range problems
        izi = (int) (d_ziorigin * 0x8000 * 0x10000) >> 16;
        #undef IZI
        #define IZI izi
        // calculate the initial s/z and t/z
        sdivz = d_sdivzorigin + (float) (pspan->u) * d_sdivzstepu;
        tdivz = d_tdivzorigin + (float) (pspan->v) * d_tdivzstepv;
        // calculate s/z, t/z, zi->fixed s and t at far end of span,
        // calculate s and t steps across span by shifting
        sstep = ( ( (int) ( (sdivz + sdivzstepu) * z) + sadjust) - ( (int) (sdivz * z) + sadjust)) >> 4;
        tstep = ( ( (int) ( (tdivz + tdivzstepu) * z) + tadjust) - ( (int) (tdivz * z) + tadjust)) >> 4;
        do
        {
            u = pspan->u;
            v = pspan->v;
            // calculate the initial s/z, t/z, 1/z, s, and t and clamp
            sdivz = d_sdivzorigin + (float)u * d_sdivzstepu;
            tdivz = d_tdivzorigin + (float)v * d_tdivzstepv;
            // mankrip - end
            s = (int) (sdivz * z) + sadjust;
            if (s > bbextents)
                s = bbextents;
            else if (s < 0)
                s = 0;
            t = (int) (tdivz * z) + tadjust;
            if (t > bbextentt)
                t = bbextentt;
            else if (t < 0)
                t = 0;
            // mankrip - begin
            pdest = (byte *)d_viewbuffer + (screenwidth * v) + u;
            pz = d_pzbuffer + (d_zwidth * v) + u;
            Y = v & 1;
            count = countstart;
            if (count)
            {
                // prepare dither values
                X = ! ( (v + u) & 1);
                XY0a = dither_kernel[X][Y][0];
                XY1a = dither_kernel[X][Y][1];
                XY0b = dither_kernel[!X][Y][0];
                XY1b = dither_kernel[!X][Y][1];
                while (count--)
                {
                    pdest += 16;
                    pz += 16;
                    DITHERED_BLEND_A(-16); s += sstep; t += tstep;
                    DITHERED_BLEND_B(-15); s += sstep; t += tstep;
                    DITHERED_BLEND_A(-14); s += sstep; t += tstep;
                    DITHERED_BLEND_B(-13); s += sstep; t += tstep;
                    DITHERED_BLEND_A(-12); s += sstep; t += tstep;
                    DITHERED_BLEND_B(-11); s += sstep; t += tstep;
                    DITHERED_BLEND_A(-10); s += sstep; t += tstep;
                    DITHERED_BLEND_B( -9); s += sstep; t += tstep;
                    DITHERED_BLEND_A( -8); s += sstep; t += tstep;
                    DITHERED_BLEND_B( -7); s += sstep; t += tstep;
                    DITHERED_BLEND_A( -6); s += sstep; t += tstep;
                    DITHERED_BLEND_B( -5); s += sstep; t += tstep;
                    DITHERED_BLEND_A( -4); s += sstep; t += tstep;
                    DITHERED_BLEND_B( -3); s += sstep; t += tstep;
                    DITHERED_BLEND_A( -2); s += sstep; t += tstep;
                    DITHERED_BLEND_B( -1); s += sstep; t += tstep;
                }
            }
            if (spancount)
            {
                // prepare dither values
                X = (v + u) & 1;
                XY0a = dither_kernel[X][Y][0];
                XY1a = dither_kernel[X][Y][1];
                XY0b = dither_kernel[!X][Y][0];
                XY1b = dither_kernel[!X][Y][1];
                pdest += spancount;
                pz += spancount;
                switch (spancount)
                {
                    case 16: DITHERED_BLEND_A(-16); s += sstep; t += tstep;
                    case 15: DITHERED_BLEND_B(-15); s += sstep; t += tstep;
                    case 14: DITHERED_BLEND_A(-14); s += sstep; t += tstep;
                    case 13: DITHERED_BLEND_B(-13); s += sstep; t += tstep;
                    case 12: DITHERED_BLEND_A(-12); s += sstep; t += tstep;
                    case 11: DITHERED_BLEND_B(-11); s += sstep; t += tstep;
                    case 10: DITHERED_BLEND_A(-10); s += sstep; t += tstep;
                    case 9: DITHERED_BLEND_B( -9); s += sstep; t += tstep;
                    case 8: DITHERED_BLEND_A( -8); s += sstep; t += tstep;
                    case 7: DITHERED_BLEND_B( -7); s += sstep; t += tstep;
                    case 6: DITHERED_BLEND_A( -6); s += sstep; t += tstep;
                    case 5: DITHERED_BLEND_B( -5); s += sstep; t += tstep;
                    case 4: DITHERED_BLEND_A( -4); s += sstep; t += tstep;
                    case 3: DITHERED_BLEND_B( -3); s += sstep; t += tstep;
                    case 2: DITHERED_BLEND_A( -2); s += sstep; t += tstep;
                    case 1: DITHERED_BLEND_B( -1);
                    break;
                }
            }
            // mankrip - end
            pspan++;
        } while (pspan->count != DS_SPAN_LIST_END);
    }
    else
    [...]
}
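On the texture padding mentioned in this post: the actual padding code isn't shown here, so the following is only a guess at one way to do it. It assumes a one-texel border filled with the transparent color key (255), and `pad_texture` and `COLORKEY` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define COLORKEY 255 /* assumed transparent palette index */

/* Copies a w * h texture into dst with a one-texel transparent border.
   dst must hold (w + 2) * (h + 2) bytes. */
static void pad_texture(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int pw = w + 2; /* padded row width */
    int y;

    memset(dst, COLORKEY, (size_t) pw * (size_t) (h + 2));
    for (y = 0; y < h; y++)
        memcpy(dst + (y + 1) * pw + 1, src + y * w, (size_t) w);
}
```

With `pbase` pointing at `dst + pw + 1` and `cachewidth` set to `pw`, a texel coordinate that lands one step outside the texture on either side still reads valid memory, and it reads a transparent texel, so a span that under- or over-steps by one does not need a per-pixel clamp.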
Re: Subdiv16 for sprites?
It would be interesting to see the best checking strategy for large transparent sprites scattered throughout a scene (smoke, rocket trails, etc.)
Re: Subdiv16 for sprites?
The fastest way would be to throw away all of the SPR rendering code, transform and project only 3 of the 4 vertexes (the top ones, plus the left-bottom one), and use those coordinates to render the texture as a 2D image (Makaqu's code for drawing the console background could be perfect for this) while checking & filling the Z buffer.
Combined with Makaqu's single-pixel Z buffer check for particles, it would be really fast. Not as fast as Makaqu's particles (which are faster than Abrash's x86 ASM particles), but good enough to draw many textured particles on the screen.
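A minimal sketch of the 2D approach described above, not Makaqu's actual code. It assumes the destination rectangle (`x0`, `y0`, `dw`, `dh`) has already been derived from the three projected vertexes and lies fully on screen, that the whole sprite shares one depth value `izi`, and that the color and Z buffers have the same width. `blit_sprite_2d` and `COLORKEY` are hypothetical names.

```c
#include <stdint.h>

#define COLORKEY 255 /* assumed transparent palette index */

/* Scales a tw * th texture into a dw * dh screen rectangle at (x0, y0),
   testing and filling the Z buffer per pixel. Returns the pixels written. */
static int blit_sprite_2d(const uint8_t *tex, int tw, int th,
                          uint8_t *dest, short *zbuf, int screenwidth,
                          int x0, int y0, int dw, int dh, short izi)
{
    int sstep = (tw << 16) / dw; /* 16.16 fixed-point texel step per pixel */
    int tstep = (th << 16) / dh;
    int t = 0;
    int written = 0;
    int x, y;

    for (y = 0; y < dh; y++) {
        const uint8_t *row = tex + (t >> 16) * tw;
        uint8_t *pdest = dest + (y0 + y) * screenwidth + x0;
        short *pz = zbuf + (y0 + y) * screenwidth + x0;
        int s = 0;

        for (x = 0; x < dw; x++) {
            uint8_t btemp = row[s >> 16];
            if (btemp != COLORKEY && pz[x] <= izi) {
                pz[x] = izi;
                pdest[x] = btemp;
                written++;
            }
            s += sstep;
        }
        t += tstep;
    }
    return written;
}
```

There is no per-span setup and no division inside the loops, only two fixed-point adds per pixel. A real version would still need clipping against the screen edges.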
Re: Subdiv16 for sprites?
I meant "throw away" for parallel SPR models only. Oriented models would still need to be rendered in the old way.
Re: Subdiv16 for sprites?
That screenshot looks awesome. That's how blood stains should look in any Quake engine. Perfect!
Improve Quaddicted, send me a pull request: https://github.com/SpiritQuaddicted/Quaddicted-reviews