[GLQuake] Improved Water Warp

mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Water warps in GLQuake suck. Here we're going to improve them quite dramatically, in a manner that doesn't require any functionality beyond the basic OpenGL 1.1 that GLQuake uses.

The trick here is to subdivide the water surfaces much more finely but draw the resulting polygons faster. This will be almost, but not quite, as rock-solid as software Quake.

To keep the code to a minimum I'm not going to use vertex arrays here. Migrating to an indexed vertex array is left as an exercise for the reader; it's the recommended approach and should get you a substantial performance increase.

So here we go. All changes are in gl_warp.c.

Go to the SubdividePolygon function and add this among the other variables at the start:

Code: Select all

float sdsize;
Just before the call to BoundPoly add this:

Code: Select all

	if (warpface->flags & SURF_DRAWTURB)
		sdsize = 24;
	else sdsize = gl_subdivide_size.value;
Now find this line:

Code: Select all

		m = gl_subdivide_size.value * floor (m / gl_subdivide_size.value + 0.5);
And change it to:

Code: Select all

		m = sdsize * floor (m / sdsize + 0.5);
Now for the renderer. Find your EmitWaterPolys function and change it to this:

Code: Select all

#define WARPCALC(s,t) (((s) + turbsin[(int)(((t)*2)+(cl.time*(128.0/M_PI))) & 255]) * (1.0/64)) // corrected warp from FitzQuake; macro args parenthesized for safety

void EmitWarpVert (float *v)
{
	glTexCoord2f (WARPCALC (v[3], v[4]), WARPCALC (v[4], v[3]));
	glVertex3fv (v);
}


void EmitWaterPolys (msurface_t *fa)
{
	glpoly_t	*p;
	int			i;

	glBegin (GL_TRIANGLES);

	for (p = fa->polys; p; p = p->next)
	{
		for (i = 2; i < p->numverts; i++)
		{
			EmitWarpVert (p->verts[0]);
			EmitWarpVert (p->verts[i - 1]);
			EmitWarpVert (p->verts[i]);
		}
	}

	glEnd ();
}
What's going on here is that we're batching up our vertex submissions better. It's a simple fact of life that bigger batches of vertices draw much faster than smaller ones, which means we can subdivide water very finely and still retain our speed. I haven't done a direct head-to-head, but I suspect that it may even be faster than GLQuake with a gl_subdivide_size of 128 or even 256.

Like I said, moving this to indexed vertex arrays would raise performance to another level again. That's for you to do. If you don't fancy doing that you can restructure things so that you're able to move your glBegin/glEnd to outside of the texture chain loop; then you'll really see some fireworks with it. Bonus marks for anyone who does something similar on world surfaces and lightmaps.
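
As a starting point for that exercise, here is a rough sketch of the index generation step. It assumes all water vertices have been copied into one shared array and that each polygon knows the index of its first vertex; the function name and parameters are invented for illustration and do not exist in GLQuake.

```c
#include <assert.h>
#include <stddef.h>

/* Write triangle-list indices for one convex polygon stored as a fan.
   first_vert is the index of the polygon's first vertex in the shared
   vertex array (an assumed layout, not GLQuake's). Returns the number
   of indices written, which is 3 * (numverts - 2). */
static size_t fan_to_triangle_indices(unsigned short first_vert, int numverts,
                                      unsigned short *out)
{
    size_t n = 0;
    int i;

    assert(numverts >= 3);
    assert(out != NULL);

    for (i = 2; i < numverts; i++)
    {
        out[n++] = first_vert;                           /* fan centre */
        out[n++] = (unsigned short)(first_vert + i - 1); /* previous edge vertex */
        out[n++] = (unsigned short)(first_vert + i);     /* current edge vertex */
    }
    return n;
}
```

The triangles come out in the same order as the glBegin loop above. The accumulated indices could then be submitted in one glDrawElements (GL_TRIANGLES, ...) call, which is available in OpenGL 1.1. Note that the warped texcoords depend on cl.time, so they would still need recalculating every frame.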

Credit to the mighty metlslime for the corrected warp define.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
leileilol
Posts: 2783
Joined: Fri Oct 15, 2004 3:23 am

Post by leileilol »

I will try this on my POS rigs to see if the improved warp affects fps positively.

You might think "2GHz onboard Intel". NO. Think 233MHz... on PowerVR (which does no triangle setup on the card and needs a powerful CPU to use).
i should not be here
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Post by Spike »

The ideal way to do it is with a fragment program. Then you don't need to subdivide at all, and still get warping to the pixel. Tbh, using vertex arrays that way isn't that much of an extra speedup - glBegin will typically give good performance with small vertex counts.
This also applies to skies. Cubemaps instead of 6 separate textures is also a nice way to reduce overdraw when using skyboxes.

leilei may disagree though, but if the hardware supports it, fragment programs are generally the way to go with these sw-mimicking effects.
metlslime
Posts: 316
Joined: Tue Feb 05, 2008 11:03 pm

Post by metlslime »

It's worth mentioning that the subdivision code is itself flawed, as it can create t-junctions. If two polygons share an edge, but one is smaller than the subdivide threshold, it won't get subdivided, but the other one still might be.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Spike wrote: The ideal way to do it is with a fragment program. Then you don't need to subdivide at all, and still get warping to the pixel. Tbh, using vertex arrays that way isn't that much of an extra speedup - glBegin will typically give good performance with small vertex counts.
This also applies to skies. Cubemaps instead of 6 separate textures is also a nice way to reduce overdraw when using skyboxes.

leilei may disagree though, but if the hardware supports it, fragment programs are generally the way to go with these sw-mimicking effects.
Yeah, that's what I do in DirectQ, but I also use something like the above as a fallback. Likewise with the standard skywarp, and likewise with a cubemap for the skybox (that last one's new and not yet released).

The neat thing is that certain operations needn't be per-pixel. The sin for water (and the sqrt for sky) don't linearly interpolate (which is one of the reasons why things go to shit in GLQuake) and need to be per-pixel, but all of the ops before them can be per-vertex.

glBegin vs vertex arrays is in my experience a mixed bag. I have seen genuine performance increases from moving to vertex arrays with nice big batch sizes, and when the triangle count gets even moderately high (like in some recent Quake maps) you can get it running almost 10 times faster without too much effort. But on the other hand a lot of OpenGL implementations seem to do their own batching in the driver as well, so there are more worthwhile gains to be had from basic stuff like not doing texture changes or lightmap uploads per-surface.
leileilol
Posts: 2783
Joined: Fri Oct 15, 2004 3:23 am

Post by leileilol »

timedemo demo1, bpp32, average fps:
Old code - 73fps
New code - 49fps

Much slower (really dips dm3.bsp's water area) but at least it looks nicer.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Hmmm - looks like it does need vertex arrays then. Oh well, easy enough to do.
metlslime
Posts: 316
Joined: Tue Feb 05, 2008 11:03 pm

Post by metlslime »

leileilol wrote: timedemo demo1, bpp32, average fps:
Old code - 73fps
New code - 49fps

Much slower (really dips dm3.bsp's water area) but at least it looks nicer.
It looks like the "new code" forces a subdivide size of 24. What gl_subdivide_size value did you use for the "old code" test?
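
For a feel of why the size matters so much, here is a back-of-the-envelope sketch. It assumes an axis-aligned rectangular surface that starts on a grid line and simply counts grid cells; the real SubdividePolygon splits recursively and skips very thin slivers, so actual counts will differ. The function name is made up for the example.

```c
#include <assert.h>

/* Rough estimate of how many pieces a width x height surface is cut
   into for a given subdivision size: one piece per grid cell touched.
   ASSUMPTION: axis-aligned rectangle starting on a grid line. */
static int estimate_pieces(int width, int height, int sdsize)
{
    assert(width > 0 && height > 0 && sdsize > 0);
    return ((width + sdsize - 1) / sdsize) * ((height + sdsize - 1) / sdsize);
}
```

For a 256x256 surface this gives 4 pieces at a size of 128 (GLQuake's stock default, as far as I know) and 121 pieces at a size of 24, so the two tests may be drawing very different amounts of geometry.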
leileilol
Posts: 2783
Joined: Fri Oct 15, 2004 3:23 am

Post by leileilol »

I didn't change that, so it's whatever GLQuake uses as the default. I didn't use a stock release; I just compiled it fresh, with the asm bits enabled too.