Subdiv16 for sprites?

qbism
Posts: 1236
Joined: Thu Nov 04, 2004 5:51 am

Re: Subdiv16 for sprites?

Post by qbism »

Saves a lot of retyping and scrolling. Just good macromantics.
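For anyone following along, here is a minimal sketch of the kind of macro use being talked about. A per-pixel macro is expanded 16 times for whole blocks, and a fall-through switch handles the leftover pixels. This is the same shape as the span drawers later in this thread. `DRAW_PIXEL`, `draw_span` and the flat texel copy inside the macro are made up for illustration. The real drawers also do color keying, Z testing and dithering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel macro: copy one texel and advance the 16.16
   texture coordinate. The names are made up for this sketch. */
#define DRAW_PIXEL(i) { dest[i] = tex[s >> 16]; s += sstep; }

/* Draw `count` pixels: whole 16-pixel blocks through an unrolled body,
   then the leftover pixels through a fall-through switch. */
void draw_span(uint8_t *dest, const uint8_t *tex, int32_t s, int32_t sstep, int count)
{
    assert(count >= 0 && s >= 0 && sstep >= 0);
    int blocks = count >> 4;
    int rest = count & 15;

    while (blocks--)
    {
        dest += 16; /* offsets below are negative, relative to the block end */
        DRAW_PIXEL(-16) DRAW_PIXEL(-15) DRAW_PIXEL(-14) DRAW_PIXEL(-13)
        DRAW_PIXEL(-12) DRAW_PIXEL(-11) DRAW_PIXEL(-10) DRAW_PIXEL( -9)
        DRAW_PIXEL( -8) DRAW_PIXEL( -7) DRAW_PIXEL( -6) DRAW_PIXEL( -5)
        DRAW_PIXEL( -4) DRAW_PIXEL( -3) DRAW_PIXEL( -2) DRAW_PIXEL( -1)
    }

    dest += rest;
    switch (rest) /* each case deliberately falls through to the next */
    {
        case 15: DRAW_PIXEL(-15)
        case 14: DRAW_PIXEL(-14)
        case 13: DRAW_PIXEL(-13)
        case 12: DRAW_PIXEL(-12)
        case 11: DRAW_PIXEL(-11)
        case 10: DRAW_PIXEL(-10)
        case  9: DRAW_PIXEL( -9)
        case  8: DRAW_PIXEL( -8)
        case  7: DRAW_PIXEL( -7)
        case  6: DRAW_PIXEL( -6)
        case  5: DRAW_PIXEL( -5)
        case  4: DRAW_PIXEL( -4)
        case  3: DRAW_PIXEL( -3)
        case  2: DRAW_PIXEL( -2)
        case  1: DRAW_PIXEL( -1)
        default: break;
    }
}
```

Writing the pixel body once keeps the unrolled block and the remainder switch in sync. That is where most of the saved retyping comes from.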
leileilol
Posts: 2783
Joined: Fri Oct 15, 2004 3:23 am

Re: Subdiv16 for sprites?

Post by leileilol »

Indeed, and it helped me trim 62,344 bytes from r_surf.c :P

Now I feel like making entire functions out of macros!
i should not be here
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Subdiv16 for sprites?

Post by mankrip »

qbism, grab my D_SpriteDrawSpans16_ColorKeyed function. Paste the code of the if (psprite->type == SPR_VP_PARALLEL) path into one file and the code of the else path into another, then run a diff tool on both files so you can see exactly what was changed.

Your IZI macro is also pointless. It's only useful in my code because I redefine it before calling the macros that draw the pixels. I did it that way so the same pixel drawing macros could be used for both the SPR_VP_PARALLEL path and the else path. But you are using different pixel drawing macros for each path.
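Here is a minimal sketch of the trick described above, using made-up names (`DRAW_IF_NEARER`, `span_parallel`, `span_oriented`). The shared macro refers only to `IZI`, and each path `#define`s `IZI` to its own depth source before expanding it. This works because the preprocessor expands `IZI` where `DRAW_IF_NEARER` is used, not where it is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Shared pixel macro: it only refers to IZI, the 16-bit depth of the
   current pixel, and leaves it to each caller to say what IZI means. */
#define DRAW_IF_NEARER(i) { if (zbuf[i] <= IZI) { zbuf[i] = (int16_t)(IZI); dest[i] = color; } }

/* Parallel path: depth is constant, so IZI is a plain precomputed value. */
void span_parallel(uint8_t *dest, int16_t *zbuf, uint8_t color, int izi16, int n)
{
    assert(n >= 0);
#define IZI izi16
    for (int i = 0; i < n; i++)
        DRAW_IF_NEARER(i)
#undef IZI
}

/* Oriented path: depth varies, so IZI takes the integer part of a
   16.16 accumulator that is stepped once per pixel. */
void span_oriented(uint8_t *dest, int16_t *zbuf, uint8_t color, int izi, int izistep, int n)
{
    assert(n >= 0);
#define IZI (izi >> 16)
    for (int i = 0; i < n; i++)
    {
        DRAW_IF_NEARER(i)
        izi += izistep;
    }
#undef IZI
}
```

If each path gets its own pixel macros anyway, as in the quoted PARALLELCHECK/ORIENTEDCHECK pair, the indirection buys nothing.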
qbism wrote: In a certain stress case, this:


#define PARALLELCHECK(i) { btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255 && (pz[i] <= IZI))  { pz[i] = IZI; pdest[i] = btemp;} s+=sstep; t+=tstep;}
#define ORIENTEDCHECK(i) { btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255 && pz[i] <= (izi >> 16)){ pz[i] = izi >> 16; pdest[i] = btemp;} s+=sstep; t+=tstep; izi+=izistep;}
turns out to be faster than this:


#define PARALLELCHECK(i) {if (pz[i] <= IZI){ btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255)  { pz[i] = IZI; pdest[i] = btemp;}} s+=sstep; t+=tstep;}
#define ORIENTEDCHECK(i) {if (pz[i] <= (izi >> 16)){ btemp = *(pbase + (s >> 16) + (t >> 16) * cachewidth); if (btemp != 255){ pz[i] = izi >> 16; pdest[i] = btemp;}} s+=sstep; t+=tstep; izi+=izistep;}
Besides pulling out static variables, it's the only change that made a dent in the FPS counter.
This will be faster when the texture has more transparent pixels than opaque pixels, and when there isn't anything in front of it.

For grenade explosions it would probably be slower, but for bubbles it could be faster. But making a specific case for it would require a lot more code.
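To make the trade-off concrete, here is a sketch that counts buffer accesses for both orderings on one span. The counters, the function names and the flat `tex[i]` lookup are hypothetical. Only the color key (255) and the `<=` depth test follow the quoted macros.

```c
#include <assert.h>
#include <stdint.h>

#define TRANSPARENT 255 /* Quake's sprite color key */

/* Hypothetical instrumentation: how often each buffer is touched. */
typedef struct { int texel_reads, z_reads, writes; } counters_t;

/* Depth first, like the second pair of quoted macros: every pixel reads
   the Z buffer, and texels are fetched only for pixels that pass it. */
void span_depth_first(uint8_t *dest, int16_t *zbuf, const uint8_t *tex,
                      int n, int16_t izi, counters_t *c)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        c->z_reads++;
        if (zbuf[i] <= izi)
        {
            uint8_t b = tex[i];
            c->texel_reads++;
            if (b != TRANSPARENT) { zbuf[i] = izi; dest[i] = b; c->writes++; }
        }
    }
}

/* Color key first, like the first pair of quoted macros: every pixel
   fetches a texel, and the Z buffer is read only for opaque texels. */
void span_key_first(uint8_t *dest, int16_t *zbuf, const uint8_t *tex,
                    int n, int16_t izi, counters_t *c)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        uint8_t b = tex[i];
        c->texel_reads++;
        if (b != TRANSPARENT)
        {
            c->z_reads++;
            if (zbuf[i] <= izi) { zbuf[i] = izi; dest[i] = b; c->writes++; }
        }
    }
}
```

Consider an unobstructed span where 4 of 16 texels are opaque. Both versions produce the same pixels, but the depth-first one reads the Z buffer 16 times and the key-first one only 4 times. If the span is fully hidden, the depth-first version never touches the texture at all. So mostly opaque, partly occluded sprites could tip the balance back.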
Ph'nglui mglw'nafh mankrip Hell's end wgah'nagl fhtagn.
==-=-=-=-=-=-=-=-=-=-==
Dev blog / Twitter / YouTube
qbism
Posts: 1236
Joined: Thu Nov 04, 2004 5:51 am

Re: Subdiv16 for sprites?

Post by qbism »

OK, I see it now :oops: Pulling the z, zi and izi recomputation out of the loop is a big improvement, 5%+ in the test case.
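Here is why the hoist is legal for parallel sprites. 1/z is a plane equation in screen space, and its u and v gradients (`d_zistepu`, `d_zistepv`) are zero for these sprites, so the value is the same at every pixel. The following is a small sketch under that assumption. `izi_at` and `izi_hoisted` are illustrative names, and the fixed-point conversion mirrors the `izi` line in the parallel-path code in this thread.

```c
#include <assert.h>

/* 1/z at pixel (u, v) as a screen-space plane equation, converted to the
   16-bit value stored in the Z buffer. */
int izi_at(float ziorigin, float zistepu, float zistepv, int u, int v)
{
    float zi = ziorigin + (float)u * zistepu + (float)v * zistepv;
    return (int)(zi * 0x8000 * 0x10000) >> 16;
}

/* Hoisted version: only valid when zistepu == zistepv == 0, as for
   parallel sprites, so it can be computed once per sprite. */
int izi_hoisted(float ziorigin)
{
    assert(ziorigin > 0.0f);
    return (int)(ziorigin * 0x8000 * 0x10000) >> 16;
}
```

With zero gradients, both give the same value for every (u, v). As soon as a gradient is non-zero (oriented sprites), they diverge, and the per-pixel form is needed again.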
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Subdiv16 for sprites?

Post by mankrip »

:) No problem.

Here's another idea I just had: killing the perspective correction, since parallel surfaces don't need it.


void D_SpriteDrawSpans_Dithered_Blend (void)
{
	// mankrip - begin
	if (psprite->type == SPR_VP_PARALLEL)
	{
		// for square surfaces on parallel perspective, all spans have the same size
		int
			countstart = pspan->count >> 4
			;
		count		= countstart;
		spancount	= pspan->count % 16;
		spancountminus1 = (float) (spancount - 1);

		// d_zistepu and d_zistepv (and so izistep) are zero for parallel sprites,
		// so 1/z is constant: compute z and izi once, outside the span loop
		z = (float)0x10000 / d_ziorigin; // prescale to 16.16 fixed-point
		// we count on FP exceptions being turned off to avoid range problems
		izi = (int) (d_ziorigin * 0x8000 * 0x10000) >> 16;
		#undef IZI
		#define IZI izi

		// calculate the initial s/z and t/z
		sdivz = d_sdivzorigin + (float) (pspan->u) * d_sdivzstepu;
		tdivz = d_tdivzorigin + (float) (pspan->v) * d_tdivzstepv;

		// compute fixed-point s and t 16 pixels apart, then shift the difference
		// right by 4 to get per-pixel steps; z is constant, so they hold for every span
		sstep = ( ( (int) ( (sdivz + sdivzstepu) * z) + sadjust) - ( (int) (sdivz * z) + sadjust)) >> 4;
		tstep = ( ( (int) ( (tdivz + tdivzstepu) * z) + tadjust) - ( (int) (tdivz * z) + tadjust)) >> 4;

		do
		{
			u = pspan->u;
			v = pspan->v;

			// calculate the initial s/z and t/z, then s and t, and clamp (1/z is constant here)
			sdivz = d_sdivzorigin + (float)u * d_sdivzstepu;
			tdivz = d_tdivzorigin + (float)v * d_tdivzstepv;
	// mankrip - end

			s = (int) (sdivz * z) + sadjust;
			if (s > bbextents)
				s = bbextents;
			else if (s < 0)
				s = 0;

			t = (int) (tdivz * z) + tadjust;
			if (t > bbextentt)
				t = bbextentt;
			else if (t < 0)
				t = 0;

			// mankrip - begin
			pdest = (byte *)d_viewbuffer + (screenwidth * v) + u;
			pz = d_pzbuffer + (d_zwidth * v) + u;

			Y = v & 1;
			count = countstart;
			if (count)
			{
				// prepare dither values
				X = ! ( (v + u) & 1);
				XY0a = dither_kernel[X][Y][0];
				XY1a = dither_kernel[X][Y][1];
				XY0b = dither_kernel[!X][Y][0];
				XY1b = dither_kernel[!X][Y][1];

				while (count--)
				{
					pdest += 16;
					pz += 16;
					DITHERED_BLEND_A(-16); s += sstep; t += tstep;
					DITHERED_BLEND_B(-15); s += sstep; t += tstep;
					DITHERED_BLEND_A(-14); s += sstep; t += tstep;
					DITHERED_BLEND_B(-13); s += sstep; t += tstep;
					DITHERED_BLEND_A(-12); s += sstep; t += tstep;
					DITHERED_BLEND_B(-11); s += sstep; t += tstep;
					DITHERED_BLEND_A(-10); s += sstep; t += tstep;
					DITHERED_BLEND_B( -9); s += sstep; t += tstep;
					DITHERED_BLEND_A( -8); s += sstep; t += tstep;
					DITHERED_BLEND_B( -7); s += sstep; t += tstep;
					DITHERED_BLEND_A( -6); s += sstep; t += tstep;
					DITHERED_BLEND_B( -5); s += sstep; t += tstep;
					DITHERED_BLEND_A( -4); s += sstep; t += tstep;
					DITHERED_BLEND_B( -3); s += sstep; t += tstep;
					DITHERED_BLEND_A( -2); s += sstep; t += tstep;
					DITHERED_BLEND_B( -1); s += sstep; t += tstep;
				}
			}
			if (spancount)
			{
				// prepare dither values
				X = (v + u) & 1;
				XY0a = dither_kernel[X][Y][0];
				XY1a = dither_kernel[X][Y][1];
				XY0b = dither_kernel[!X][Y][0];
				XY1b = dither_kernel[!X][Y][1];

				pdest += spancount;
				pz += spancount;
				switch (spancount)
				{
					case 16: DITHERED_BLEND_A(-16); s += sstep; t += tstep;
					case 15: DITHERED_BLEND_B(-15); s += sstep; t += tstep;
					case 14: DITHERED_BLEND_A(-14); s += sstep; t += tstep;
					case 13: DITHERED_BLEND_B(-13); s += sstep; t += tstep;
					case 12: DITHERED_BLEND_A(-12); s += sstep; t += tstep;
					case 11: DITHERED_BLEND_B(-11); s += sstep; t += tstep;
					case 10: DITHERED_BLEND_A(-10); s += sstep; t += tstep;
					case  9: DITHERED_BLEND_B( -9); s += sstep; t += tstep;
					case  8: DITHERED_BLEND_A( -8); s += sstep; t += tstep;
					case  7: DITHERED_BLEND_B( -7); s += sstep; t += tstep;
					case  6: DITHERED_BLEND_A( -6); s += sstep; t += tstep;
					case  5: DITHERED_BLEND_B( -5); s += sstep; t += tstep;
					case  4: DITHERED_BLEND_A( -4); s += sstep; t += tstep;
					case  3: DITHERED_BLEND_B( -3); s += sstep; t += tstep;
					case  2: DITHERED_BLEND_A( -2); s += sstep; t += tstep;
					case  1: DITHERED_BLEND_B( -1);
					break;
				}
			}
			// mankrip - end
			pspan++;
		} while (pspan->count != DS_SPAN_LIST_END);
	}
	else
[...]
}
You may need to pad the texture, though. I've already done that to eliminate clamping in kernel-filtered dithering, and it's necessary to prevent under/overstepping past the texture edges.
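Here is why stepping linearly is enough in the parallel path. s/z is linear in u, and with z fixed for the whole sprite, s itself is linear. A per-pixel step derived from two points 16 pixels apart therefore only loses fixed-point rounding. The sketch below measures that error. The helper names are hypothetical, and `sdivzstepu` is the per-pixel gradient here, not the 16-pixel one used in the code above. Also, s and t are clamped only at the start of each span, so even a tiny drift can carry the last pixels past the texture edge. That is presumably what the padding guards against.

```c
#include <assert.h>
#include <stdlib.h>

/* s for pixel u, evaluated directly: (s/z at u) * z, truncated to 16.16. */
int s_exact(double sdivzorigin, double sdivzstepu, double z, int u)
{
    return (int)((sdivzorigin + u * sdivzstepu) * z);
}

/* Worst difference, in 1/65536-texel units, between direct evaluation and
   stepping with sstep = (s at u0 + 16 - s at u0) >> 4, over 16 pixels. */
int max_step_error(double sdivzorigin, double sdivzstepu, double z, int u0)
{
    assert(z > 0.0);
    int s0 = s_exact(sdivzorigin, sdivzstepu, z, u0);
    int sstep = (s_exact(sdivzorigin, sdivzstepu, z, u0 + 16) - s0) >> 4;
    int worst = 0, s = s0;
    for (int i = 0; i < 16; i++)
    {
        int err = abs(s - s_exact(sdivzorigin, sdivzstepu, z, u0 + i));
        if (err > worst)
            worst = err;
        s += sstep;
    }
    return worst;
}
```

For positive coordinates, the truncations bound the error to at most 16 units, far below one texel. No per-pixel or per-16-pixel divide is needed.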

For model-based particles, your idea of checking for transparency before checking the depth should also be faster, since most particles should be in mid-air, with nothing in front of them.
qbism
Posts: 1236
Joined: Thu Nov 04, 2004 5:51 am

Re: Subdiv16 for sprites?

Post by qbism »

It would be interesting to see the best checking strategy for large transparent sprites scattered throughout a scene (smoke, rocket trails, etc.).
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Subdiv16 for sprites?

Post by mankrip »

The fastest way would be to throw away all of the SPR rendering code and transform and project only 3 of the 4 vertices (the two top ones plus the bottom-left one). Those coordinates can then be used to render the texture as a 2D image while checking and filling the Z buffer (Makaqu's code for drawing the console background could be perfect for this).

Combined with Makaqu's single-pixel Z buffer check for particles, it would be really fast. Not as fast as Makaqu's particles (which are faster than Abrash's x86 ASM particles), but good enough to draw many textured particles on the screen.
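Here is a rough sketch of the 2D approach, assuming the three corners are already transformed and projected to integer screen coordinates. It is not Makaqu's console-background code. `blit_parallel_sprite`, `point_t` and the nearest-neighbor 16.16 stepping are stand-ins. Because the quad faces the viewer, one `izi` value serves every pixel, and the color-key-first test order from earlier in the thread is kept.

```c
#include <assert.h>
#include <stdint.h>

#define TRANSPARENT 255 /* Quake's sprite color key */

typedef struct { int x, y; } point_t; /* projected screen position */

/* Hypothetical helper: draw a view-parallel sprite as a scaled 2D image.
   Top-left, top-right and bottom-left fix the screen rectangle; the
   fourth corner is redundant. */
void blit_parallel_sprite(uint8_t *fb, int16_t *zb, int fbw, int fbh,
                          const uint8_t *tex, int tw, int th,
                          point_t tl, point_t tr, point_t bl, int16_t izi)
{
    assert(fb && zb && tex && tw > 0 && th > 0);
    int w = tr.x - tl.x, h = bl.y - tl.y;
    if (w <= 0 || h <= 0)
        return;

    int ustep = (tw << 16) / w; /* 16.16 texels per screen pixel */
    int vstep = (th << 16) / h;

    int v = 0;
    for (int y = tl.y; y < bl.y; y++, v += vstep)
    {
        if (y < 0 || y >= fbh)
            continue; /* off-screen row: the loop still advances v */
        const uint8_t *row = tex + (v >> 16) * tw;
        int u = 0;
        for (int x = tl.x; x < tr.x; x++, u += ustep)
        {
            if (x < 0 || x >= fbw)
                continue;
            uint8_t b = row[u >> 16];
            int i = y * fbw + x;
            if (b != TRANSPARENT && zb[i] <= izi) /* color key first, then Z */
            {
                zb[i] = izi;
                fb[i] = b;
            }
        }
    }
}
```

There are no per-span setup, divides or clamps here. Since the steps are `(size << 16) / w`, the last sample stays inside the texture.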
leileilol
Posts: 2783
Joined: Fri Oct 15, 2004 3:23 am

Re: Subdiv16 for sprites?

Post by leileilol »

At least half of my sprites are oriented, though.

[screenshot]
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Subdiv16 for sprites?

Post by mankrip »

I meant "throw away" only for parallel SPR models. Oriented models would still need to be rendered the old way.
Spirit
Posts: 1065
Joined: Sat Nov 20, 2004 9:00 pm

Re: Subdiv16 for sprites?

Post by Spirit »

That screenshot looks awesome. That's how blood stains should look in any Quake engine. Perfect!
Improve Quaddicted, send me a pull request: https://github.com/SpiritQuaddicted/Quaddicted-reviews
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Subdiv16 for sprites?

Post by mankrip »

Agreed. They would be much faster to render if they were drawn into the surface cache, though.
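Here is a minimal sketch of what drawing into the surface cache could look like. The stain is composited into a cached surface block once, when that block is built, and costs nothing per frame afterwards. `stamp_decal` is hypothetical, and it assumes the decal is already in final palette indices. Real Quake surface-cache blocks are lit when they are built, so a real version would also need to pass the decal through the lighting/colormap step.

```c
#include <assert.h>
#include <stdint.h>

#define TRANSPARENT 255 /* same color key as the sprites */

/* Hypothetical: composite a color-keyed decal into a cached surface block
   of sw x sh texels, with the decal's top-left corner at (dx, dy). */
void stamp_decal(uint8_t *surf, int sw, int sh,
                 const uint8_t *decal, int dw, int dh, int dx, int dy)
{
    assert(surf && decal);
    for (int y = 0; y < dh; y++)
    {
        int sy = dy + y;
        if (sy < 0 || sy >= sh)
            continue; /* clip rows outside the surface */
        for (int x = 0; x < dw; x++)
        {
            int sx = dx + x;
            if (sx < 0 || sx >= sw)
                continue; /* clip columns outside the surface */
            uint8_t b = decal[y * dw + x];
            if (b != TRANSPARENT)
                surf[sy * sw + sx] = b;
        }
    }
}
```

After this runs, the ordinary surface span drawers show the stain with no extra Z test or per-pixel sprite work. The cost only comes back when the cache block is rebuilt.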