Re: Subdiv16 for sprites?

Posted: Wed Sep 18, 2013 12:12 am
by qbism
Saves a lot of retyping and scrolling. Just good macromantics.

Re: Subdiv16 for sprites?

Posted: Wed Sep 18, 2013 3:42 pm
by leileilol
Indeed, and it helped me trim 62,344 bytes from r_surf.c :P

now I feel like making entire functions out of macros!

Re: Subdiv16 for sprites?

Posted: Wed Sep 18, 2013 10:40 pm
by mankrip
qbism, grab my D_SpriteDrawSpans16_ColorKeyed function, paste the code of the if (psprite->type == SPR_VP_PARALLEL) path in one file, the code of the else path in another file, and run a diff tool to compare both files, so you can see exactly what was changed.

Your IZI macro is also pointless. It's only useful in my code because I redefine it before calling the macros used for drawing the pixels. I did it that way so the same pixel drawing macros could be used for both SPR_VP_PARALLEL and else paths. But you are using different pixel drawing macros for each path.
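
The redefinition trick described above can be sketched roughly as follows. This is only an illustration, not the code from either engine: DRAW_PIXEL, draw_parallel and draw_oriented are hypothetical names, and the pixel macro is reduced to a colour-keyed depth test. The point is that IZI is resolved where the pixel macro is expanded, so one macro can serve both paths.

```c
#include <assert.h>

typedef unsigned char byte;

/* Shared pixel macro (hypothetical). IZI is looked up where the macro is
   expanded, so each caller can give it a different meaning. */
#define DRAW_PIXEL(i) \
	do { \
		if (src[i] != 255 && pz[i] <= (IZI)) { \
			pz[i] = (short)(IZI); \
			pdest[i] = src[i]; \
		} \
	} while (0)

/* Parallel path: depth is constant, so izi already holds the final value. */
#define IZI izi
static void draw_parallel(byte *pdest, short *pz, const byte *src, int count, int izi)
{
	int i;
	assert(count >= 0);
	for (i = 0; i < count; i++)
		DRAW_PIXEL(i);
}
#undef IZI

/* Oriented path: izi is 16.16 fixed point and is stepped per pixel. */
#define IZI (izi >> 16)
static void draw_oriented(byte *pdest, short *pz, const byte *src, int count, int izi, int izistep)
{
	int i;
	assert(count >= 0);
	for (i = 0; i < count; i++) {
		DRAW_PIXEL(i);
		izi += izistep;
	}
}
#undef IZI
```

If each path has its own pixel macro anyway, the indirection through IZI buys nothing, which is the objection raised above.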
qbism wrote: In a certain stress case, this:

Code:

#define PARALLELCHECK(i) { btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255 && (pz[i] <= IZI))  { pz[i] = IZI; pdest[i] = btemp;} s+=sstep; t+=tstep;}
#define ORIENTEDCHECK(i) { btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255 && pz[i] <= (izi >> 16)){ pz[i] = izi >> 16; pdest[i] = btemp;} s+=sstep; t+=tstep; izi+=izistep;}
turns out to be faster than this:

Code:

#define PARALLELCHECK(i) {if (pz[i] <= IZI){ btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255)  { pz[i] = IZI; pdest[i] = btemp;}} s+=sstep; t+=tstep;}
#define ORIENTEDCHECK(i) {if (pz[i] <= (izi >> 16)){ btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255){ pz[i] = izi >> 16; pdest[i] = btemp;}} s+=sstep; t+=tstep; izi+=izistep;}
Besides pulling out static variables, it's the only change that made a dent in the fps counter.

Checking the texel first will be faster when the texture has more transparent pixels than opaque pixels, and when there isn't anything in front of it.

For grenade explosions it would probably be slower, but for bubbles it could be faster. But making a specific case for it would require a lot more code.
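
The trade-off between the two orderings can be made visible by counting memory reads. The sketch below is an assumption-laden toy, not engine code: span_texel_first, span_depth_first and cost_t are hypothetical names, texture stepping is replaced by a plain array index, and read counts are only a rough proxy for speed (cache behaviour and branch prediction matter too). Both orderings write the same pixels; they differ only in which read can be skipped.

```c
#include <assert.h>

typedef unsigned char byte;

typedef struct { int texreads, zreads; } cost_t;

/* Texel first: the texture is always read, Z only for opaque texels. */
static cost_t span_texel_first(byte *pdest, short *pz, const byte *tex, int count, short izi)
{
	cost_t c = {0, 0};
	int i;
	assert(count >= 0);
	for (i = 0; i < count; i++) {
		byte b = tex[i];
		c.texreads++;
		if (b == 255)
			continue; /* transparent: skip the Z read entirely */
		c.zreads++;
		if (pz[i] <= izi) {
			pz[i] = izi;
			pdest[i] = b;
		}
	}
	return c;
}

/* Depth first: Z is always read, the texture only for visible pixels. */
static cost_t span_depth_first(byte *pdest, short *pz, const byte *tex, int count, short izi)
{
	cost_t c = {0, 0};
	int i;
	assert(count >= 0);
	for (i = 0; i < count; i++) {
		c.zreads++;
		if (pz[i] > izi)
			continue; /* occluded: skip the texture read entirely */
		c.texreads++;
		if (tex[i] != 255) {
			pz[i] = izi;
			pdest[i] = tex[i];
		}
	}
	return c;
}
```

On a mostly transparent, unoccluded span (the bubble case) the texel-first version skips almost every Z read; on an opaque, occluded span the depth-first version skips every texture read.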

Re: Subdiv16 for sprites?

Posted: Fri Sep 20, 2013 2:55 am
by qbism
OK, I see it now :oops: Pulling z, zi, izi recomputation out of the loop is a big improvement, 5%+ in the test case.
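
As a rough illustration of why the hoisting is safe for parallel sprites: when both 1/z gradients are zero, the per-span depth value is the same for every span, so it can be computed once. The sketch below uses hypothetical names (span_t, zi_at, izi_per_span, izi_hoisted) and a simplified span struct; only the fixed-point conversion is taken from the code in this thread.

```c
#include <assert.h>

typedef struct { int u, v, count; } span_t; /* hypothetical, simplified span */

/* 1/z at screen position (u, v), evaluated from a plane equation. */
static float zi_at(float origin, float stepu, float stepv, int u, int v)
{
	return origin + (float)v * stepv + (float)u * stepu;
}

/* Baseline: recompute the 16-bit depth value for every span. */
static void izi_per_span(const span_t *spans, int n, float origin,
                         float stepu, float stepv, int *out)
{
	int i;
	assert(n >= 0);
	for (i = 0; i < n; i++) {
		float zi = zi_at(origin, stepu, stepv, spans[i].u, spans[i].v);
		out[i] = (int)(zi * 0x8000 * 0x10000) >> 16;
	}
}

/* Hoisted: only valid when both gradients are zero (parallel sprite). */
static void izi_hoisted(const span_t *spans, int n, float origin, int *out)
{
	int i;
	int izi = (int)(origin * 0x8000 * 0x10000) >> 16; /* computed once */
	(void)spans; /* span positions no longer affect the depth */
	assert(n >= 0);
	for (i = 0; i < n; i++)
		out[i] = izi;
}
```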

Re: Subdiv16 for sprites?

Posted: Fri Sep 20, 2013 4:09 am
by mankrip
:) No problem.

Here's another idea I just had: killing the perspective correction, since parallel surfaces don't need it.

Code:

void D_SpriteDrawSpans_Dithered_Blend (void)
{
	// mankrip - begin
	if (psprite->type == SPR_VP_PARALLEL)
	{
		// for square surfaces on parallel perspective, all spans have the same size
		int
			countstart = pspan->count >> 4
			;
		count		= countstart;
		spancount	= pspan->count % 16;
		spancountminus1 = (float) (spancount - 1);

		//	izistep, d_zistepu and d_zistepv are zero, so we can skip zi
		z = (float)0x10000 / d_ziorigin; // prescale to 16.16 fixed-point
		// we count on FP exceptions being turned off to avoid range problems
		izi = (int) (d_ziorigin * 0x8000 * 0x10000) >> 16;
		#undef IZI
		#define IZI izi

		// calculate the initial s/z and t/z
		sdivz = d_sdivzorigin + (float) (pspan->u) * d_sdivzstepu;
		tdivz = d_tdivzorigin + (float) (pspan->v) * d_tdivzstepv;

		// calculate s/z, t/z, zi->fixed s and t at far end of span,
		// calculate s and t steps across span by shifting
		sstep = ( ( (int) ( (sdivz + sdivzstepu) * z) + sadjust) - ( (int) (sdivz * z) + sadjust)) >> 4;
		tstep = ( ( (int) ( (tdivz + tdivzstepu) * z) + tadjust) - ( (int) (tdivz * z) + tadjust)) >> 4;

		do
		{
			u = pspan->u;
			v = pspan->v;

			// calculate the initial s/z, t/z, 1/z, s, and t and clamp
			sdivz = d_sdivzorigin + (float)u * d_sdivzstepu;
			tdivz = d_tdivzorigin + (float)v * d_tdivzstepv;
	// mankrip - end

			s = (int) (sdivz * z) + sadjust;
			if (s > bbextents)
				s = bbextents;
			else if (s < 0)
				s = 0;

			t = (int) (tdivz * z) + tadjust;
			if (t > bbextentt)
				t = bbextentt;
			else if (t < 0)
				t = 0;

			// mankrip - begin
			pdest = (byte *)d_viewbuffer + (screenwidth * v) + u;
			pz = d_pzbuffer + (d_zwidth * v) + u;

			Y = v & 1;
			count = countstart;
			if (count)
			{
				// prepare dither values
				X = ! ( (v + u) & 1);
				XY0a = dither_kernel[X][Y][0];
				XY1a = dither_kernel[X][Y][1];
				XY0b = dither_kernel[!X][Y][0];
				XY1b = dither_kernel[!X][Y][1];

				while (count--)
				{
					pdest += 16;
					pz += 16;
					DITHERED_BLEND_A(-16); s += sstep; t += tstep;
					DITHERED_BLEND_B(-15); s += sstep; t += tstep;
					DITHERED_BLEND_A(-14); s += sstep; t += tstep;
					DITHERED_BLEND_B(-13); s += sstep; t += tstep;
					DITHERED_BLEND_A(-12); s += sstep; t += tstep;
					DITHERED_BLEND_B(-11); s += sstep; t += tstep;
					DITHERED_BLEND_A(-10); s += sstep; t += tstep;
					DITHERED_BLEND_B( -9); s += sstep; t += tstep;
					DITHERED_BLEND_A( -8); s += sstep; t += tstep;
					DITHERED_BLEND_B( -7); s += sstep; t += tstep;
					DITHERED_BLEND_A( -6); s += sstep; t += tstep;
					DITHERED_BLEND_B( -5); s += sstep; t += tstep;
					DITHERED_BLEND_A( -4); s += sstep; t += tstep;
					DITHERED_BLEND_B( -3); s += sstep; t += tstep;
					DITHERED_BLEND_A( -2); s += sstep; t += tstep;
					DITHERED_BLEND_B( -1); s += sstep; t += tstep;
				}
			}
			if (spancount)
			{
				// prepare dither values
				X = (v + u) & 1;
				XY0a = dither_kernel[X][Y][0];
				XY1a = dither_kernel[X][Y][1];
				XY0b = dither_kernel[!X][Y][0];
				XY1b = dither_kernel[!X][Y][1];

				pdest += spancount;
				pz += spancount;
				switch (spancount)
				{
					case 16: DITHERED_BLEND_A(-16); s += sstep; t += tstep;
					case 15: DITHERED_BLEND_B(-15); s += sstep; t += tstep;
					case 14: DITHERED_BLEND_A(-14); s += sstep; t += tstep;
					case 13: DITHERED_BLEND_B(-13); s += sstep; t += tstep;
					case 12: DITHERED_BLEND_A(-12); s += sstep; t += tstep;
					case 11: DITHERED_BLEND_B(-11); s += sstep; t += tstep;
					case 10: DITHERED_BLEND_A(-10); s += sstep; t += tstep;
					case  9: DITHERED_BLEND_B( -9); s += sstep; t += tstep;
					case  8: DITHERED_BLEND_A( -8); s += sstep; t += tstep;
					case  7: DITHERED_BLEND_B( -7); s += sstep; t += tstep;
					case  6: DITHERED_BLEND_A( -6); s += sstep; t += tstep;
					case  5: DITHERED_BLEND_B( -5); s += sstep; t += tstep;
					case  4: DITHERED_BLEND_A( -4); s += sstep; t += tstep;
					case  3: DITHERED_BLEND_B( -3); s += sstep; t += tstep;
					case  2: DITHERED_BLEND_A( -2); s += sstep; t += tstep;
					case  1: DITHERED_BLEND_B( -1);
					break;
				}
			}
			// mankrip - end
			pspan++;
		} while (pspan->count != DS_SPAN_LIST_END);
	}
	else
[...]
}
You may need to pad the texture though. I've already done that to eliminate clamping on kernel-filtered dithering, and it's necessary to prevent under/over stepping.
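
To see how over-stepping can happen once the texture coordinates advance by a constant 16.16 step, consider the toy below. It is a sketch under assumptions, not the engine's code: affine_row and affine_max_index are hypothetical names, and the start offset and step are picked by hand to show one way the last sample can land one texel past the edge. A padded row (here with a transparent texel, 255) makes that read harmless without a per-pixel clamp.

```c
#include <assert.h>

typedef unsigned char byte;

/* Sample `count` texels along one row with a constant 16.16 step.
   There is no per-pixel divide and no clamp. */
static void affine_row(byte *out, const byte *row, int count, int s, int sstep)
{
	int i;
	for (i = 0; i < count; i++) {
		out[i] = row[s >> 16];
		s += sstep;
	}
}

/* Largest texel index the loop above will read. */
static int affine_max_index(int count, int s, int sstep)
{
	assert(count > 0 && s >= 0 && sstep >= 0);
	return (s + (count - 1) * sstep) >> 16;
}
```

For example, with a 4-texel row, a start of 0.5 texels (0x8000) and a step of about 1.2 texels (78643), the fourth sample reads index 4, so the row needs a fifth, padding texel.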

For model-based particles, your idea of checking for transparency before checking the depth should also give more speed, since most of the particles should be rendered in mid-air, with no obstructions.

Re: Subdiv16 for sprites?

Posted: Fri Sep 20, 2013 1:14 pm
by qbism
It would be interesting to see the best checking strategy for large transparent sprites scattered throughout a scene (smoke, rocket trails, etc.)

Re: Subdiv16 for sprites?

Posted: Sat Sep 21, 2013 1:12 am
by mankrip
The fastest way would be to throw away all of the SPR rendering code, transform and project only 3 of the 4 vertices (the top ones, plus the bottom-left one), and use those coordinates to render the texture as a 2D image (Makaqu's code for drawing the console background could be perfect for this) while checking & filling the Z buffer.

Combined with Makaqu's single-pixel Z buffer check for particles, it would be really fast. Not as fast as Makaqu's particles (which are faster than Abrash's x86 ASM particles), but good enough to draw many textured particles on the screen.
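
A minimal sketch of that idea, assuming the three projected vertices have already been reduced to an axis-aligned screen rectangle (top-left gives x0/y0, top-right gives x1, bottom-left gives y1). This is not Makaqu's console code; screen_t and blit_sprite_2d are hypothetical names, clipping is done per pixel for brevity, and the depth buffer follows the "larger is nearer" convention of the macros earlier in the thread.

```c
#include <assert.h>

typedef unsigned char byte;

typedef struct {
	byte  *color;  /* 8-bit framebuffer */
	short *depth;  /* depth buffer, larger value = nearer */
	int    width, height;
} screen_t;

/* Stretch tex (tw x th) over [x0,x1) x [y0,y1) at one constant depth,
   skipping colour 255 and testing/filling the depth buffer.
   Returns the number of pixels written. */
static int blit_sprite_2d(screen_t *scr, const byte *tex, int tw, int th,
                          int x0, int y0, int x1, int y1, short izi)
{
	int x, y, sstep, tstep, drawn = 0;

	assert(x1 > x0 && y1 > y0);
	sstep = (tw << 16) / (x1 - x0); /* 16.16 texels per screen pixel */
	tstep = (th << 16) / (y1 - y0);

	for (y = y0; y < y1; y++) {
		const byte *row;
		byte *pdest;
		short *pz;

		if (y < 0 || y >= scr->height)
			continue;
		row = tex + (((y - y0) * tstep) >> 16) * tw;
		pdest = scr->color + y * scr->width;
		pz = scr->depth + y * scr->width;

		for (x = x0; x < x1; x++) {
			byte b;
			if (x < 0 || x >= scr->width)
				continue;
			b = row[((x - x0) * sstep) >> 16];
			if (b != 255 && pz[x] <= izi) {
				pz[x] = izi;
				pdest[x] = b;
				drawn++;
			}
		}
	}
	return drawn;
}
```

Because the step is rounded down, the last sample in each direction stays inside the texture, so this variant needs no padding; a real implementation would clip the rectangle once up front instead of testing every pixel.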

Re: Subdiv16 for sprites?

Posted: Sat Sep 21, 2013 12:11 pm
by leileilol
at least half of my sprites are oriented though.

Image

Re: Subdiv16 for sprites?

Posted: Sat Sep 21, 2013 7:17 pm
by mankrip
I meant "throw away" the code for parallel SPR models only. Oriented models would still need to be rendered the old way.

Re: Subdiv16 for sprites?

Posted: Mon Sep 23, 2013 9:53 am
by Spirit
That screenshot looks awesome. That's how blood stains should look in any Quake engine. Perfect!

Re: Subdiv16 for sprites?

Posted: Fri Sep 27, 2013 10:12 pm
by mankrip
Agreed. They would be much faster to render if rendered on the surface cache, though.