(Theory) Quake renderer efficiencies

Post by mh »

These are some of the things I've learned while working on DirectQ 1.8; most of this is actually not theory but is up and running in a real-life Quake engine that you can download from the link in my sig.

Difficulties with BSP

Received wisdom is that BSP cannot handle large open or complex areas well, and that the Quake engine therefore struggles with such areas. This seems to be borne out by practical experience with maps and engines. Following the release of the Winter MP Remake Quake demo, and discussions of the performance of the central arena area in the ctf1rq map, I decided to see if I could tackle this head-on.

The main problem with Quake BSPs in this context is that they fragment surfaces far too heavily. A surface that looks like a single continuous area on-screen may actually be composed of 20 or more tiny polygons. Ideally the BSP would also contain the original brush data as it was in the map file so that we could get less fragmentation at run time, but we're stuck with what we have.

Steps to tackle the problem

Most Quake technology engines render one polygon at a time. 3D graphics, on the other hand, is a lot like hard disks, memory and networks: submitting a small number of very large batches is always more efficient than submitting a large number of very small batches.

The approach here is to take the raw data for each polygon and group together surfaces which share the same texture and lightmap, then submit them all to the renderer in one go.

The first step is to ensure that surfaces sharing the same texture are also more likely to share the same lightmap. The chosen method was to sort surfaces by texture before generating lightmaps (I just reused the texturechains for this), together with a tall-but-narrow lightmap size. Width is the determining factor in dynamic lightmap update performance, and Quake needs it to be a minimum of 18, so the chosen lightmap width was 32. The lightmap height was just set to whatever your card's max texture size is.
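
For illustration, here's roughly what that sorting pass looks like. This sketch reuses stock GLQuake's texturechain fields and its GL_CreateSurfaceLightmap function (which already skips sky and water surfaces internally); the chain-building loop itself is just one way of arranging it, not the exact DirectQ code:

Code: Select all

void GL_BuildLightmapsSorted (model_t *m)
{
	int i;
	msurface_t *surf;
	texture_t *tex;

	// chain every surface to its texture
	for (i = 0, surf = m->surfaces; i < m->numsurfaces; i++, surf++)
	{
		tex = surf->texinfo->texture;
		surf->texturechain = tex->texturechain;
		tex->texturechain = surf;
	}

	// then walk the chains in texture order so that surfaces sharing
	// a texture tend to land in the same lightmap
	for (i = 0; i < m->numtextures; i++)
	{
		if (!(tex = m->textures[i])) continue;

		for (surf = tex->texturechain; surf; surf = surf->texturechain)
			GL_CreateSurfaceLightmap (surf);

		tex->texturechain = NULL;
	}
}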

For grouping surfaces it's necessary to break them down into individual triangles, as using GL_POLYGON/GL_TRIANGLE_FAN/GL_TRIANGLE_STRIP primitives would break the grouping. A GL_POLYGON or GL_TRIANGLE_FAN breaks down to triangles using the vertexes 0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, etc. (strips are more fiddly, reversing the winding order on every alternate triangle); a sketch of both decompositions follows.
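
To make the winding rules concrete, here's a minimal sketch of both decompositions (the function names are mine, for illustration only):

Code: Select all

// decompose an n-vertex fan (or GL_POLYGON) into GL_TRIANGLES indexes:
// 0,1,2  0,2,3  0,3,4 ...; returns the number of indexes written
int Fan_ToTriangles (unsigned short *out, int numverts)
{
	int i, n = 0;

	for (i = 2; i < numverts; i++)
	{
		out[n++] = 0;
		out[n++] = i - 1;
		out[n++] = i;
	}

	return n;	// always 3 * (numverts - 2)
}

// strips flip the winding on every second triangle, so flip it back
// to keep all the triangles facing the same way
int Strip_ToTriangles (unsigned short *out, int numverts)
{
	int i, n = 0;

	for (i = 2; i < numverts; i++)
	{
		if (i & 1)
		{
			out[n++] = i;
			out[n++] = i - 1;
			out[n++] = i - 2;
		}
		else
		{
			out[n++] = i - 2;
			out[n++] = i - 1;
			out[n++] = i;
		}
	}

	return n;
}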

Even just gathering surfaces into a fixed-size vertex array as they come through, then submitting them in a single batch using glDrawArrays and GL_TRIANGLES whenever the array fills or the texture changes, is far more efficient than drawing each surface individually; the flush is sketched below.
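
As a sketch, the flush for that non-indexed version is just a single call (numpolyverts here is the same running vertex counter used in the sample code further down):

Code: Select all

// submit the accumulated batch and reset for the next one
if (numpolyverts)
{
	glDrawArrays (GL_TRIANGLES, 0, numpolyverts);
	numpolyverts = 0;
}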

It can be improved further by using indexed arrays. For the example above we're submitting 12 vertexes; using the indexed approach we only need to submit 6 vertexes and 12 indexes. In general, a surface with n vertexes costs 3 × (n - 2) vertexes when non-indexed, but only n vertexes plus 3 × (n - 2) indexes when indexed.

Outcome

Using this approach we can potentially submit hundreds of surfaces to the 3D hardware at a time. This makes much better use of the 3D hardware's bandwidth, and gives a very substantial improvement: the ctf1rq map I mentioned at the start goes from framerates in the 20s up to 50, 60 or even the full 72.

If you can set things up to group your brushmodel surfaces together with your worldmodel surfaces it gets even better. You'll have to implement the matrix transforms in software to do this, but the trade-off is worth it; a sketch of the idea follows.
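
You don't need a full matrix library for that; Quake's own AngleVectors gives you the rotation basis. A minimal sketch of the idea (the function name is mine; note that GLQuake's R_RotateForEntity negates pitch, so depending on where you hook this in you may need to flip angles[0] to match):

Code: Select all

// transform a brushmodel-local vertex into world space in software,
// so its surfaces can go into the same batch as the world
void TransformBrushVert (entity_t *e, float *in, float *out)
{
	vec3_t forward, right, up;

	AngleVectors (e->angles, forward, right, up);

	// Quake's "right" vector points along -Y, hence the subtraction
	out[0] = e->origin[0] + in[0] * forward[0] - in[1] * right[0] + in[2] * up[0];
	out[1] = e->origin[1] + in[0] * forward[1] - in[1] * right[1] + in[2] * up[1];
	out[2] = e->origin[2] + in[0] * forward[2] - in[1] * right[2] + in[2] * up[2];
}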

Implementation

Even though my engine is D3D, these techniques can also be fully applied in OpenGL using vertex arrays. Vertex arrays have been supported in every OpenGL version since 1.1; you don't need any extension or function pointer trickery to use them, and even Microsoft's software OpenGL implementation supports them.

You can avoid the memory copy above by using VBOs (a sketch of that route follows), but even with the copy the end result is far more capable than GLQuake's default renderer.
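
For reference, the VBO route looks something like this. It uses GL_ARB_vertex_buffer_object, so unlike plain vertex arrays you do need the usual wglGetProcAddress dance for these entry points:

Code: Select all

GLuint vbo;

// create a buffer big enough for the whole batch
glGenBuffersARB (1, &vbo);
glBindBufferARB (GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB (GL_ARRAY_BUFFER_ARB, sizeof (polyverts), NULL, GL_STREAM_DRAW_ARB);

// per batch: copy in the verts, then point the arrays at offsets
// within the buffer rather than at client memory
glBufferSubDataARB (GL_ARRAY_BUFFER_ARB, 0, numpolyverts * sizeof (polyvert_t), polyverts);
glVertexPointer (3, GL_FLOAT, sizeof (polyvert_t), (void *) 0);
glTexCoordPointer (2, GL_FLOAT, sizeof (polyvert_t), (void *) (3 * sizeof (float)));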

Sample Code

The code given here can be plugged into id's GLQuake with a minimum of modification. It implements the rendering portion of this discussion using indexed vertex arrays, so you can see the improvement for yourself. For simplicity I've assumed that multitexture isn't used, and have just used large static arrays.

To get the best results you'll also need to sort your lightmaps so that surfaces sharing the same texture also share the same lightmap. Just reuse the texturechains for this, as described above (although without multitexture you don't even need this step).

Code: Select all

typedef struct polyvert_s
{
	float v[3];
	float st[2];
} polyvert_t;

// fixed-size batch storage; 60000 verts still fit in unsigned short indexes
polyvert_t polyverts[60000];
unsigned short polyindexes[60000];

int numpolyverts = 0;
int numpolyindexes = 0;

// set currentpolytexture instead of calling GL_Bind; DrawGLPoly binds lazily
int currentpolytexture = 0;
int cachedpolytexture = -1;

void GL_BeginPolys (void)
{
	numpolyverts = 0;
	numpolyindexes = 0;
	cachedpolytexture = -1;

	glEnableClientState (GL_VERTEX_ARRAY);
	glEnableClientState (GL_TEXTURE_COORD_ARRAY);

	glVertexPointer (3, GL_FLOAT, sizeof (polyvert_t), polyverts->v);
	glTexCoordPointer (2, GL_FLOAT, sizeof (polyvert_t), polyverts->st);
}


void GL_EndPolys (void)
{
	if (numpolyverts || numpolyindexes)
	{
		// draw anything left over
		glDrawElements (GL_TRIANGLES, numpolyindexes, GL_UNSIGNED_SHORT, polyindexes);
		c_brush_polys++;
		numpolyverts = 0;
		numpolyindexes = 0;
	}

	glDisableClientState (GL_VERTEX_ARRAY);
	glDisableClientState (GL_TEXTURE_COORD_ARRAY);
}


void DrawGLPoly (glpoly_t *p, int s, int t)
{
	int		i;
	float	*v;

	// flush the batch if it would overflow or if the texture changes
	// (a zero-count draw on the first call through here is harmless)
	if (numpolyverts + p->numverts >= 60000 ||
		numpolyindexes + (p->numverts - 2) * 3 >= 60000 ||
		cachedpolytexture != currentpolytexture)
	{
		glDrawElements (GL_TRIANGLES, numpolyindexes, GL_UNSIGNED_SHORT, polyindexes);
		c_brush_polys++;
		numpolyverts = 0;
		numpolyindexes = 0;

		GL_Bind (currentpolytexture);
		cachedpolytexture = currentpolytexture;
	}

	// emit indexes for this poly as a triangle fan, offset by the verts already in the batch
	for (i = 2; i < p->numverts; i++)
	{
		polyindexes[numpolyindexes++] = numpolyverts;
		polyindexes[numpolyindexes++] = numpolyverts + i - 1;
		polyindexes[numpolyindexes++] = numpolyverts + i;
	}

	// emit vertexes
	for (i = 0, v = p->verts[0]; i < p->numverts; i++, v += VERTEXSIZE, numpolyverts++)
	{
		polyverts[numpolyverts].v[0] = v[0];
		polyverts[numpolyverts].v[1] = v[1];
		polyverts[numpolyverts].v[2] = v[2];
		polyverts[numpolyverts].st[0] = v[s];
		polyverts[numpolyverts].st[1] = v[t];
	}
}

You would call GL_BeginPolys at the start of your DrawTextureChains function, GL_EndPolys at the end of it, and DrawGLPoly for each polygon. DrawGLPoly takes 2 extra parameters: the indexes of s and t in each polygon vertex, which should be 3 and 4 for textures or 5 and 6 for lightmaps.

You also need to set currentpolytexture to the gl_texturenum of the texture you're using instead of calling GL_Bind.
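
Putting it all together, a DrawTextureChains modified along these lines might look something like this (a sketch only: non-multitexture, no texture animation, and sky/water left to their usual drawing paths):

Code: Select all

void DrawTextureChains (void)
{
	int i;
	texture_t *t;
	msurface_t *s;
	glpoly_t *p;

	GL_BeginPolys ();

	for (i = 0; i < cl.worldmodel->numtextures; i++)
	{
		if (!(t = cl.worldmodel->textures[i])) continue;
		if (!(s = t->texturechain)) continue;

		// sky and water go through their own drawing paths
		if (s->flags & (SURF_DRAWSKY | SURF_DRAWTURB)) continue;

		// no GL_Bind here; DrawGLPoly binds lazily when it flushes
		currentpolytexture = t->gl_texturenum;

		for (; s; s = s->texturechain)
			for (p = s->polys; p; p = p->next)
				DrawGLPoly (p, 3, 4);	// 5, 6 for the lightmap pass

		t->texturechain = NULL;
	}

	GL_EndPolys ();
}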