Page 1 of 1

GLQuake lightmap loading speedup

Posted: Wed Jun 04, 2014 5:59 pm
by ericw
I found an interesting bottleneck while testing some huge maps for the upcoming Rubicon Rumble Pack... when profiling Quakespasm, I saw the majority of the map loading time was spent in gl_rsurf.c:AllocBlock. In one map, it was about 3 seconds in that function!

What is the function doing?
GL_BuildLightmaps loops over all surfaces of all models, packing their lightmaps into an array of OpenGL textures. Many surfaces' lightmaps are packed into a single texture. AllocBlock is called once per surface, and it searches for a free spot in the OpenGL lightmap textures to store the given surface's lightmap.

The problem is, each time AllocBlock is called it starts searching at lightmap 0. Intuitively, once the first few lightmaps are full, it's wasteful to keep trying to fit new surfaces in them. My solution was, don't restart at the first lightmap, resume searching at the last lightmap we inserted a surface in to.

Here's the diff:

Code: Select all

 int			allocated[MAX_LIGHTMAPS][BLOCK_WIDTH];
+int			last_lightmap_allocated; //ericw -- optimization: remember the index of the last lightmap AllocBlock stored a surf in

Code: Select all

-	for (texnum=0 ; texnum<MAX_LIGHTMAPS ; texnum++)
+	// ericw -- rather than searching starting at lightmap 0 every time,
+	// start at the last lightmap we allocated a surface in.
+	// This makes AllocBlock much faster on large levels (can shave off 3+ seconds
+	// of load time on a level with 180 lightmaps), at a cost of not quite packing
+	// lightmaps as tightly vs. not doing this (uses ~5% more lightmaps)
+	for (texnum=last_lightmap_allocated ; texnum<MAX_LIGHTMAPS ; texnum++, last_lightmap_allocated++)

Code: Select all

 	memset (allocated, 0, sizeof(allocated));
+	last_lightmap_allocated = 0;
Now a bit of analysis: say your map has R surfaces and requires M lightmap textures. With the original code, on average, AllocBlock will have to search M/2 lightmaps for each time it is called before it finds the next free one. The outer for() loop in AllocBlock should run a total of R*(M/2) times then.

With the modified code, the outer for() loop in AllocBlock should usually only execute once per AllocBlock call, since usually we'll start on the lightmap that still has free space. So the total number of executions of this loop should just be R.

When I profiled the giant map again with the patch applied, time spent in AllocBlock was down to around 40ms! This agrees with the analysis above - the map needs 180 lightmaps, so the time spent in AllocBlock should be 180/2 = 90x less, which is about what I observed (40ms * 90 = 3600ms)

The one disadvantage is that the packing is slightly less optimal. Imagine that near the end of the list of surfaces, you have a small one that fits in a hole in one of the first few lightmap textures - the original code would find this hole and fill it, but the modified code won't. I don't think this is a big deal; I tried running a few maps and the increase in number of lightmaps required was usually 5%.

Originally submitted to Quakespasm: