Faster Dynamic Light Updates

mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Faster Dynamic Light Updates

Post by mh »

You may wish to have a look here for code to select the fastest formats and types for lightmaps: http://forums.inside3d.com/viewtopic.php?t=2465 ;)

Also, GLQuake updates the full lightmap width, so let's restrict the update to the part of it that actually changed instead:

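// Source rows are BLOCK_WIDTH texels apart, so GL can read a
// sub-rectangle straight out of the full-width lightmap in memory.
glPixelStorei (GL_UNPACK_ROW_LENGTH, BLOCK_WIDTH);

glTexSubImage2D
(
	GL_TEXTURE_2D,
	0,
	theRect->l,	// x offset of the changed region
	theRect->t,	// y offset of the changed region
	theRect->w,
	theRect->h,
	gl_Lightmap_Format,
	gl_Lightmap_Type,
	// first texel of the changed region, at 4 bytes per texel
	lightmaps[lnum] + theRect->t * BLOCK_WIDTH * 4 + theRect->l * 4
);

// back to the default so later uploads are unaffected
glPixelStorei (GL_UNPACK_ROW_LENGTH, 0);
(Don't forget to restore GL_UNPACK_ROW_LENGTH to 0 when you're done!)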
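For anyone wondering where theRect comes from: a minimal sketch of tracking the changed region per lightmap, assuming 128x128 lightmap pages as in stock GLQuake. The names dirty_rect_t, rect_clear, rect_add and rect_offset are made up for illustration and are not from any particular engine.

```c
#include <assert.h>

#define BLOCK_WIDTH  128   /* assumed lightmap page size */
#define BLOCK_HEIGHT 128

/* Same field meanings as theRect above: left, top, width, height. */
typedef struct { int l, t, w, h; } dirty_rect_t;

/* Mark the rectangle as empty (zero width and height). */
static void rect_clear (dirty_rect_t *r)
{
	r->l = r->t = r->w = r->h = 0;
}

/* Grow the rectangle so it also covers a surface's block at (x, y), size w x h. */
static void rect_add (dirty_rect_t *r, int x, int y, int w, int h)
{
	int x0, y0, x1, y1;

	assert (x >= 0 && y >= 0 && w > 0 && h > 0);
	assert (x + w <= BLOCK_WIDTH && y + h <= BLOCK_HEIGHT);

	if (r->w == 0 || r->h == 0)
	{
		/* first dirty surface: the rectangle is just this block */
		r->l = x; r->t = y; r->w = w; r->h = h;
		return;
	}

	/* otherwise take the union of the old rectangle and the new block */
	x0 = (x < r->l) ? x : r->l;
	y0 = (y < r->t) ? y : r->t;
	x1 = (x + w > r->l + r->w) ? x + w : r->l + r->w;
	y1 = (y + h > r->t + r->h) ? y + h : r->t + r->h;

	r->l = x0; r->t = y0;
	r->w = x1 - x0; r->h = y1 - y0;
}

/* Byte offset of the rectangle's first texel, at 4 bytes per texel. */
static int rect_offset (const dirty_rect_t *r)
{
	return r->t * BLOCK_WIDTH * 4 + r->l * 4;
}
```

Call rect_add for each surface whose lightmap you rebuild, upload once per lightmap, then rect_clear. The offset helper is the same arithmetic as the last glTexSubImage2D argument above.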
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Re: Faster Dynamic Light Updates

Post by Baker »

Interesting and I'll do some experimentation with your code for sure.

One thing I will say, it is kind of a known issue that FitzQuake can suffer a bit from presumably dynamic lighting (in a way that, say, GLQuake or an engine like JoeQuake/Qrack doesn't suffer).

A number of people tried some experimental FitzQuake builds with ProQuake features that I made when I first started engine coding in 2007. During really rough fights with a lot of rockets and/or lightning, one second you are fighting ... then for a split second you lag ... and the next thing you know you are dead on the ground because some player killed you.

[This generally occurs when 5 players are really battling it out in the same room in an 8-10 player game. I'm assuming what you are addressing is the same issue or closely related, but it could be something else, like hardware palette flashes with gl_flashblend 1. It has been 3 1/3 years since I gave FitzQuake 0.80 some rough NetQuake multiplayer testing, so my memory could be off.]

And I could be entirely wrong about the specific details of the above, because I don't have time to reproduce it at the moment.
The night is young. How else can I annoy the world before sunsrise? 8) Inquisitive minds want to know ! And if they don't -- well like that ever has stopped me before ..
metlslime
Posts: 316
Joined: Tue Feb 05, 2008 11:03 pm

Post by metlslime »

fitzquake dynamic lighting is pretty much the same code as glquake.

However, lightmaps in both engines are uploaded "just in time" which means the order of polygon rendering determines the order of lightmap uploading.

I wonder if the changes in fitzquake world rendering have had the side effect of a less-optimal lightmap upload sequence.
metlslime
Posts: 316
Joined: Tue Feb 05, 2008 11:03 pm

Post by metlslime »

oh...

and of course the lightmaps in fitzquake are 24-bit instead of 8-bit (to support colored lighting) so even with the exact same behavior for everything else, there is 3x the data to upload.

Perhaps i need to put in conditional code so that maps without colored light can use an 8-bit format.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

To be honest, with the amount of bandwidth on today's hardware, uploading changes per surface can still give reasonable performance, and the format differences only really kick in when you encounter a troublesome driver like I had. (Aside: developing on bad hardware can sometimes be great for highlighting issues like this that might pass you by otherwise.)

The bigger difficulties come from use of GL_RGB format and from multiple texture changes.

GL_RGB is bad because no such format actually exists in hardware. Sending data down in GL_RGB format means that your driver has to make a copy of it, expand it to 4-component and most likely then swizzle it to GL_BGRA. You may as well just use GL_BGRA in your code instead and bypass those steps.
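To make that cost concrete, here is a rough sketch of the per-texel work described above, written as plain C. It is only an illustration of the expand-and-swizzle step, not what any particular driver actually does, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Expand tightly packed 3-byte RGB texels into 4-byte BGRA texels.
   dst must have room for count * 4 bytes. */
static void expand_rgb_to_bgra (const unsigned char *src, unsigned char *dst, size_t count)
{
	size_t i;

	assert (src != NULL && dst != NULL);

	for (i = 0; i < count; i++)
	{
		dst[i * 4 + 0] = src[i * 3 + 2];   /* blue */
		dst[i * 4 + 1] = src[i * 3 + 1];   /* green */
		dst[i * 4 + 2] = src[i * 3 + 0];   /* red */
		dst[i * 4 + 3] = 255;              /* opaque alpha fills the 4th component */
	}
}
```

If you build the lightmap in BGRA order yourself, this pass (and the temporary copy it implies) goes away.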

There's great info about this on OpenGL.org: http://www.opengl.org/wiki/Common_Mistakes

Multiple texture changes are not that bad in themselves as texture changes are fast these days. Where trouble kicks in is that your driver is unable to optimize your vertexes into bigger batches for you, so you end up doing lots and lots of itty bitty draw calls instead of very few big ones. Interestingly, with OpenGL this seems to be the case irrespective of whether you use glBegin/glEnd or vertex arrays so the driver must be doing some behind-the-scenes optimization of its own.

Sorting surfaces by texture, then by lightmap within that is the way to go. Building lightmaps in texture order also helps, and increasing the lightmap size to 512x512 (so you get more surfs per lightmap, and a better chance that all surfs with the same texture also have the same lightmap) improves things again.
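A minimal sketch of that sort using qsort, assuming each visible surface can be reduced to a texture number and a lightmap number. The sortsurf_t struct and function names are hypothetical; a real engine would carry a pointer to the surface as well.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sort key for one visible surface. */
typedef struct
{
	int texnum;        /* diffuse texture */
	int lightmapnum;   /* lightmap page */
} sortsurf_t;

/* Order by texture first, then by lightmap within the same texture. */
static int surf_compare (const void *a, const void *b)
{
	const sortsurf_t *sa = (const sortsurf_t *) a;
	const sortsurf_t *sb = (const sortsurf_t *) b;

	if (sa->texnum != sb->texnum)
		return (sa->texnum > sb->texnum) - (sa->texnum < sb->texnum);

	return (sa->lightmapnum > sb->lightmapnum) - (sa->lightmapnum < sb->lightmapnum);
}

static void sort_surfaces (sortsurf_t *surfs, size_t count)
{
	assert (surfs != NULL || count == 0);
	qsort (surfs, count, sizeof (*surfs), surf_compare);
}
```

After sorting, a texture or lightmap bind is only needed when the key changes from one surface to the next.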

Do a single bulk upload of all modified lightmaps after the sorting pass but before the drawing pass, and draw something else or do some other CPU work before you draw the lightmapped surfaces so that the glTexSubImage2D calls have time to finish updating the textures before you need to use them and you're in business.
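The bulk upload pass might look something like this sketch. Everything here is an assumption for illustration: the lightmap_state_t bookkeeping, the MAX_LIGHTMAPS value, and the upload callback, which in a real engine would bind lightmap lnum and call glTexSubImage2D for the rectangle. stub_upload only counts calls so the logic can be exercised without a GL context.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIGHTMAPS 64   /* assumed limit */

typedef struct { int l, t, w, h; } dirty_rect_t;

/* Hypothetical per-lightmap bookkeeping filled in during the sorting pass. */
typedef struct
{
	int modified;        /* nonzero if any surface in this lightmap changed */
	dirty_rect_t rect;   /* region that needs uploading */
} lightmap_state_t;

/* Hypothetical upload hook; a real one would do the GL calls. */
typedef void (*upload_fn) (int lnum, const dirty_rect_t *rect);

/* Stand-in for the GL upload: just counts how often it was called. */
static int g_upload_calls;
static void stub_upload (int lnum, const dirty_rect_t *rect)
{
	(void) lnum;
	(void) rect;
	g_upload_calls++;
}

/* Upload every modified lightmap once, then mark it clean.
   Returns the number of uploads issued. */
static int upload_modified_lightmaps (lightmap_state_t *lm, int count, upload_fn upload)
{
	int i, uploaded = 0;

	assert (lm != NULL && upload != NULL);
	assert (count >= 0 && count <= MAX_LIGHTMAPS);

	for (i = 0; i < count; i++)
	{
		if (!lm[i].modified) continue;

		upload (i, &lm[i].rect);

		lm[i].modified = 0;
		lm[i].rect.l = lm[i].rect.t = lm[i].rect.w = lm[i].rect.h = 0;
		uploaded++;
	}

	return uploaded;
}
```

Run this once per frame between the sorting pass and the drawing pass, with other work scheduled before the lightmapped surfaces are drawn.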

What really caught me by surprise was GL_UNSIGNED_BYTE versus GL_UNSIGNED_INT_8_8_8_8_REV. I'm guessing that on my bad driver GL_UNSIGNED_INT_8_8_8_8_REV is giving it a hint that "this data is already in the format you like best so there's no need to pull it back to system memory (or whatever it is you're doing) and have your evil way with it there". It was faster by a factor of 30 on this driver, and marginally edges out GL_UNSIGNED_BYTE on other, better drivers.
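For reference, with GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV each texel is a single 32-bit unsigned integer with blue in the lowest byte and alpha in the highest. A small sketch of packing one texel that way (the function name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Pack one texel for GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV:
   bits 0-7 blue, 8-15 green, 16-23 red, 24-31 alpha. */
static uint32_t pack_bgra_8888_rev (unsigned r, unsigned g, unsigned b, unsigned a)
{
	assert (r <= 255 && g <= 255 && b <= 255 && a <= 255);

	return ((uint32_t) a << 24) | ((uint32_t) r << 16) | ((uint32_t) g << 8) | (uint32_t) b;
}
```

On a little-endian machine those four bytes sit in memory as B, G, R, A, which is the same layout as GL_BGRA with GL_UNSIGNED_BYTE, so switching between the two types needs no change to how the lightmap data is built.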

By comparison, GL_BGRA was only about twice as fast as GL_RGBA on that driver: a smaller gain, but still significant.

Of course, the formats that were fastest on my driver and my platform aren't necessarily going to be the fastest elsewhere, which is why it's important to check a few of them.

The end result is able to handle scenes where wpoly counts go into the thousands, and where almost every surface has an animating lightmap, without dropping below 72 FPS. Which is nice. :D
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Post by Baker »

metlslime wrote:fitzquake dynamic lighting is pretty much the same code as glquake.

However, lightmaps in both engines are uploaded "just in time" which means the order of polygon rendering determines the order of lightmap uploading.

I wonder if the changes in fitzquake world rendering have had the side effect of a less-optimal lightmap upload sequence.
I hesitated posting that without "proof" but the European NetQuake players know what I am referring to. I was far less knowledgeable 3 years ago.

In the coming weeks, I'll see if I can create an example of a demo that causes FitzQuake grief.

Don't get me wrong, your work on the renderer is awesome and I've always loved the FitzQuake philosophy after absorbing enough Func_Msgboard and listening to Spirit and aguirRe to get the idea. That initially wasn't easy because I was a JoeQuake effects and high-res texture fanatic (part of me still is, but in time I began to understand the "old school" views and designer intent, and can see both ways ...)

Now I understand more than in the past, and I think I would be able to "nail down" the cause and circumstances more effectively.
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Post by Spike »

GL_RGB is good because it's emulated. :P
Back when quake was young, GL_RGB worked on cards where GL_RGBA did not.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Spike wrote:GL_RGB is good because it's emulated. :P
Back when quake was young, GL_RGB worked on cards where GL_RGBA did not.
That I suppose is a fair point, but it is a case of compromising performance on cards that were released this century in favour of retaining support for one person's mouldy old PowerVR. (Doesn't explain the existence of gl_alpha_format in the codebase either...)

No, there are other factors at work here. Back when Quake was young there really was no such thing as a compliant OpenGL driver so the codebase was full of hacks to work around specific vendor problems. GLQuake would have also been one of the first mass-market/consumer-oriented OpenGL applications, and it was definitely id Software's first such (they had internal tools that ran on workstation cards for sure, but nothing released to the general public). People were still learning at the time.

OpenGL was also young. GLQuake was written to target OpenGL 1.0, and the OpenGL 1.1 features (texture objects! vertex arrays!) were new, dangerous, edgy and exciting. The 3DFX mini GL came after GLQuake, not before it.

One mustn't assume therefore that anything done in GLQuake is in any way the correct way to do it, or even the most compatible way to do it.

GL_RGB is I suppose "good" in much the same way that beating a woolly mammoth over the head with a club to get meat for your dinner is "good". It worked for someone back in ancient prehistory but there's no way you'd want to do it today.
metlslime
Posts: 316
Joined: Tue Feb 05, 2008 11:03 pm

Post by metlslime »

mh wrote:Interestingly, with OpenGL this seems to be the case irrespective of whether you use glBegin/glEnd or vertex arrays so the driver must be doing some behind-the-scenes optimization of its own.
From what I've absorbed from talking to LordHavoc and from reading Brian Hook's and others' writings about the Quake 3 renderer, the "ideal" way to do rendering in OpenGL (circa 1999-2000 at least) is to just spit out large batches of GL_TRIANGLES, in triangle strip order where possible, and the driver would optimize that quite a bit. So theoretically, the duplicated verts aren't much of a burden compared to the benefit of very few draw calls.

I never got around to implementing it in fitzquake (0.85 was largely concerned with user requested features like alpha, interpolation, and higher map/protocol limits) but it's on my long-term list of things to do.
metlslime
Posts: 316
Joined: Tue Feb 05, 2008 11:03 pm

Post by metlslime »

Actually another thing i wanted to try: When i was working on Gods and Heroes (a cancelled MMO,) i noticed that the characters were rendered as one long triangle strip. The way they accomplished this was linking each unconnected strip to the previous one with one or two long, infinitely-thin triangles. So with wireframe mode on, characters looked like they were covered in cobwebs.
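A rough sketch of how strips can be stitched that way, working on index lists. This is the usual repeat-the-last-index, repeat-the-first-index trick as I understand it, not the code from that project, and the function name is made up. The repeated indexes produce zero-area triangles, and one extra repeat is added when needed so the next strip keeps its winding.

```c
#include <assert.h>
#include <stddef.h>

/* Append one triangle strip to a combined strip, linking it to what is
   already there with degenerate (zero-area) triangles.
   Returns the new index count. */
static size_t strip_append (unsigned short *out, size_t out_count, size_t out_max,
                            const unsigned short *strip, size_t n)
{
	size_t i;

	assert (out != NULL && strip != NULL && n >= 3);

	if (out_count > 0)
	{
		unsigned short last = out[out_count - 1];
		int odd = (int) (out_count & 1);

		assert (out_count + 2 + (size_t) odd + n <= out_max);

		out[out_count++] = last;            /* repeat last index of previous strip */
		if (odd) out[out_count++] = last;   /* keep the new strip on an even slot */
		out[out_count++] = strip[0];        /* repeat first index of new strip */
	}
	else
	{
		assert (n <= out_max);
	}

	for (i = 0; i < n; i++)
		out[out_count++] = strip[i];

	return out_count;
}
```

For example, stitching {0,1,2,3} and {4,5,6} gives 0,1,2,3,3,4,4,5,6, which can then be drawn as one GL_TRIANGLE_STRIP.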
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Post by Spike »

It's generally much easier to use index arrays as well as vertex arrays, in which case duplicate vertices aren't actually duplicates.
From what I remember, nvidia GPUs have a cache of 15 verts. Reuse the same index and it just grabs it from its cache.

Using 'fake' triangles in order to draw a model in a single strip is advantageous because it means you can draw the model with a single draw command. Batching is king above all else.
Whether it's faster to use a triangle strip or to just define individual triangles I'm not entirely sure.
But either way, the number of glDraw* calls is kept low (this applies to d3d too, and more so). Fewer draw calls is always good.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

You don't even need to use a strip, just great big soups of GL_TRIANGLES is enough (and will likely get you bigger batch sizes and with less fiddly stuff than using strips. No need for degenerate triangles - yayyyy!)

Triangle strip order is fine, best case is that each triangle will reuse 2 verts from the previous one, and you'll get better vertex cache efficiency. You can model that with GL_TRIANGLES and indexes, like Spike says. For the likes of Quake you probably won't really notice it though (I got maybe 1 FPS from putting it on MDLs), and the setup and ordering cost might be prohibitive.

Saving vertex bandwidth in Quake doesn't have much return on investment most of the time. You can just convert your GL_POLYGONs (or fans or strips) to GL_TRIANGLES and indexes at run time, add them to a list, and render that list when it fills up or when state changes. You'll also get a single rendering path for everything on-screen this way, the code will be simpler, and easier to maintain, debug and enhance.