Fast Dynamic Lighting

szo
Posts: 132
Joined: Mon Dec 06, 2010 4:42 pm

Re: Fast Dynamic Lighting

Post by szo »

mh wrote: Just resurrecting this old one regarding the endianness issue mentioned by szo.

The GL spec (page 97, 1.2.1 version) clearly states which bits are assigned to which components for the UNSIGNED_INT_8_8_8_8_REV type so endianness is basically not an issue - 4th component goes in bits 31-24, 3rd in 23-16, 2nd in 15-8 and 1st in 7-0 and if an implementation does otherwise then it's non-conformant.

Where it would be an issue is if you used unsigned int * for your source data type, but because Tex(Sub)Image takes a GLvoid * parameter for data you can still use byte * even with this type and satisfy all requirements.
Thanks for the additional info.
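
For illustration, here is a minimal sketch of the bit layout quoted above. The function names are made up for this example. The value is built with shifts rather than by reinterpreting bytes in memory, so the result is the same on any host byte order.

```c
#include <stdint.h>

/* Pack four 8-bit components into one 32-bit value using the layout
 * quoted above for UNSIGNED_INT_8_8_8_8_REV: 1st component in bits 7-0,
 * 2nd in 15-8, 3rd in 23-16, 4th in 31-24. Shifts operate on values,
 * not on bytes in memory, so host endianness does not matter here. */
uint32_t pack_8888_rev(uint8_t c1, uint8_t c2, uint8_t c3, uint8_t c4)
{
    return (uint32_t)c1
         | ((uint32_t)c2 << 8)
         | ((uint32_t)c3 << 16)
         | ((uint32_t)c4 << 24);
}

/* With a BGRA format the component order is B, G, R, A
 * (hypothetical helper, not part of any GL API). */
uint32_t pack_bgra(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return pack_8888_rev(b, g, r, a);
}
```

So blue lands in the low byte of the value and alpha in the high byte, whatever the machine.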
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

I would like to add "be sure to call glTexSubImage2D only once per frame"* to the discussion. In my glsl looks-like-sw renderer, I was calling glTexSubImage2D (glTSI for short) once per updated surface per frame (I have one giant 2k x 2k lightmap texture). I did some work to batch the updates into one big dirty rectangle, and even though I now update whole rows rather than just the sub-row, I got a 70% speedup on timedemo bigass1 (94->159 fps). My code was already separating lightmap updates and poly rendering, so the slowdown was all in the excessive calls to glTSI.

* Per texture, of course, but if you're updating many textures using glTSI, maybe it's time to rethink your design.
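
A rough sketch of the dirty-rectangle idea, assuming one square lightmap texture with a CPU-side copy. All names here (LM_SIZE, dirty_rows_t and friends) are hypothetical and are not QuakeForge's actual code. Because whole rows are uploaded, only the row range needs tracking.

```c
#include <stdbool.h>

#define LM_SIZE 2048  /* hypothetical: side of the single lightmap texture */

/* Hypothetical per-frame dirty state: the range of rows touched so far. */
typedef struct {
    int  min_y, max_y;  /* inclusive row range, meaningful only if dirty */
    bool dirty;
} dirty_rows_t;

void dirty_reset(dirty_rows_t *d)
{
    d->min_y = LM_SIZE;
    d->max_y = -1;
    d->dirty = false;
}

/* Record that rows [y, y + h) were modified in the CPU-side copy. */
void dirty_mark(dirty_rows_t *d, int y, int h)
{
    if (h <= 0)
        return;
    if (y < d->min_y)
        d->min_y = y;
    if (y + h - 1 > d->max_y)
        d->max_y = y + h - 1;
    d->dirty = true;
}

/* Number of rows the single per-frame upload has to cover. */
int dirty_height(const dirty_rows_t *d)
{
    return d->dirty ? d->max_y - d->min_y + 1 : 0;
}

/* Once per frame, if dirty_height() > 0, a renderer would make one call
 * along these lines (format, type and the pixel pointer depend on how
 * the texture was created) and then call dirty_reset():
 *
 *   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, d->min_y, LM_SIZE,
 *                   dirty_height(d), format, type, first_dirty_row);
 */
```

Each updated surface calls dirty_mark() instead of uploading on its own, so the upload count per frame drops from one per surface to one per texture.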
Leave others their otherness.
http://quakeforge.net/
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Fast Dynamic Lighting

Post by mh »

Can't second that enough - it's the big performance killer; BGRA/etc is just icing on the cake (but very flavoursome icing in the case of Intels). Each TexSubImage call can incur a pipeline stall (it's effectively equivalent to a glFinish call in this respect) so calling it once per surf will kill you. This wasn't (as much of) an issue on older hardware (otherwise neither Quake nor Q2 would have done it) owing to a much shallower pipeline (and possibly 3dfx pulled some tricks in their mini-driver to work around it too - saving out the calls and issuing them once only at the end of a frame would be one thing they might have done); on modern hardware when you get a stall everything must wait until the GPU and CPU sync up. Traditional Quake lightmap updating is very close to the worst possible thing you can do in every respect if you're concerned about performance - hundreds or thousands of stalls per frame, non-native formats, it's just bad piled upon bad.

I've experimented some with the single giant texture concept too (but dividing it into 128x128 tiles) - performance-wise it was roughly neck and neck with my current method, which uses a texture array. I also update per entity rather than once only, because I consider adding dynamic lights to BSP models to be important enough to incur the overhead of this (although I'm looking into ways to get that part going faster). There's obviously a tradeoff between updating the full width of the rect (== more data to send) vs multiple updates (== more stall potential) but I'm reasonably happy with 850 fps on bigass1.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

I think my current slowdown is actually my 2d code. I enable/disable the shader and parameters for every icon :/ I disabled icons entirely and got another ~30fps (~20%).

QF's glsl is a learning exercise, after all :) (I knew only the basics of GL before I started on it, forget about glsl, gl optimization, etc)
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Fast Dynamic Lighting

Post by mh »

2D can actually be a surprisingly huge slowdown - it's quite fillrate and overdraw intensive, and batching up calls can help a lot. I remember getting a fright when Draw_Character turned out to be my biggest bottleneck on a VMWare test machine - I had to start thinking of batching calls after that.

The way I handle it is to use a common input/vertex layout for everything (position/colour/texcoords, even if not needed), which helps keep buffer switches down. Instancing gets the vertex size down to under half that needed for a full quad, although in practice that's not a bottleneck - I just wanted to experiment with instancing.

I then sniff for state changes and issue the current batch if one happens; otherwise I just add the specified quad to a vertex buffer. The state changes are generally only texture (the scrap system helps a lot here) and shader, although I've also got the ability to set a new ortho matrix if needed, and D3D's separation of sampling parameters from the texture object means that I also need to watch out for some textures that need to clamp, some that need to wrap, etc.

I've a nice state filtering system that can take a callback to be executed before state changes, so I shove my Draw_Flush into that and everything happens automatically. One final flush of anything left over at the end of the frame and it's done.
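
A stripped-down sketch of that flush-on-state-change pattern, for anyone following along. Only the texture is tracked as state here, the flush just counts instead of drawing, and every name (batch_t, draw_quad, draw_flush, MAX_QUADS) is invented for the example rather than taken from any engine.

```c
#include <stddef.h>

#define MAX_QUADS 1024  /* hypothetical batch capacity */

typedef struct { float x, y, w, h; } quad_t;

typedef struct {
    quad_t   quads[MAX_QUADS];
    size_t   count;    /* quads waiting in the current batch */
    unsigned texture;  /* texture the current batch uses */
    unsigned flushes;  /* how many batches have been issued */
} batch_t;

void batch_init(batch_t *b)
{
    b->count = 0;
    b->texture = 0;
    b->flushes = 0;
}

/* Issue everything queued so far. A real renderer would upload the
 * vertices and make one draw call here; this sketch only counts. */
void draw_flush(batch_t *b)
{
    if (b->count == 0)
        return;
    b->flushes++;
    b->count = 0;
}

/* Queue one quad, flushing first if the texture changes or the batch
 * is full. */
void draw_quad(batch_t *b, unsigned texture, quad_t q)
{
    if (texture != b->texture || b->count == MAX_QUADS) {
        draw_flush(b);
        b->texture = texture;
    }
    b->quads[b->count++] = q;
}
```

Three characters from one texture followed by two icons from another end up as two batches instead of five draws, as long as draw_flush() is called once more at the end of the frame.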

Another surprisingly big bottleneck is drawing the gun model. Right now I've got assumptions in my code that it's going to be the last thing drawn in a frame (big mistake that) but as soon as I work them out of it I'm going to try moving it to first and see if the GPU's early-Z can help any (it should be able to quickly reject a lot of world and other polys before the PS runs). Shouldn't be an issue - Q2 just draws it mixed in with the regular ents and it works fine there.

Bleagh - verbal diarrhea. :lol:
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

Yeah, rewriting the 2d code got me another 9% :)

The rewrite involved using a scrap for all 2d (except conback), batching, and sharing vertex info/queues between text and icons.
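
For readers who haven't met a scrap: it packs many small 2D images into one texture so they can share a batch. Below is a minimal sketch of a column-height allocator, similar in spirit to the block allocator in the GLQuake source. The names and SCRAP_SIZE are assumptions for illustration and this is not QuakeForge's code.

```c
#define SCRAP_SIZE 256  /* hypothetical: side of the scrap texture */

/* Used height per column; everything at or below it is taken. */
typedef struct { int heights[SCRAP_SIZE]; } scrap_t;

void scrap_init(scrap_t *s)
{
    for (int i = 0; i < SCRAP_SIZE; i++)
        s->heights[i] = 0;
}

/* Find room for a w x h image. Returns 1 and stores the position in
 * *x, *y on success, 0 if the image does not fit. */
int scrap_alloc(scrap_t *s, int w, int h, int *x, int *y)
{
    int best = SCRAP_SIZE;  /* lowest usable y found so far */

    if (w <= 0 || h <= 0 || w > SCRAP_SIZE)
        return 0;

    for (int i = 0; i + w <= SCRAP_SIZE; i++) {
        int top = 0, j;
        /* tallest column under the span starting at column i */
        for (j = 0; j < w; j++) {
            if (s->heights[i + j] >= best)
                break;  /* cannot beat the best candidate */
            if (s->heights[i + j] > top)
                top = s->heights[i + j];
        }
        if (j == w) {  /* whole span is lower than the best so far */
            *x = i;
            *y = best = top;
        }
    }

    if (best + h > SCRAP_SIZE)
        return 0;

    for (int j = 0; j < w; j++)
        s->heights[*x + j] = best + h;
    return 1;
}
```

The caller then copies the image into the scrap at (x, y) and scales its texcoords to match. Anything that needs to wrap can't live in there without extra work.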
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Fast Dynamic Lighting

Post by mh »

Watch out for backtile - that needs to go outside of the scrap too cos it wraps (unless you're doing Custom Shader Funky Magic).
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

Yeah, I'd half forgotten about it: commented it out with FIXME, but then forgot to fix it. I am considering the shader magic, though (for bsp textures).
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Fast Dynamic Lighting

Post by mh »

Sounds interesting; let me know how you get on with that. :)

I have to say that I'm fascinated by your current work - doing a hardware accelerated version of software Quake's surface caching system always seemed a neat idea to me (I believe that the old VQuake did it but it didn't work out well there; more modern capabilities should suit it much better).
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

Heh, thanks, though... what exactly is surface caching? I'm not sure I ever understood that part of the sw renderer, and I certainly didn't set out to re-implement that part (as such). Are you referring to the loading of the entire bsp set into a vbo?
frag.machine
Posts: 2126
Joined: Sat Nov 25, 2006 1:49 pm

Re: Fast Dynamic Lighting

Post by frag.machine »

taniwha wrote: Heh, thanks, though... what exactly is surface caching? I'm not sure I ever understood that part of the sw renderer, and I certainly didn't set out to re-implement that part (as such). Are you referring to the loading of the entire bsp set into a vbo?
Maybe I'm not the best person to explain, but I'll try anyway: a surface is the product of the texture and the corresponding lightmap for a given polygon. In order to speed up rendering, SW Quake used a big cache of previously calculated surfaces. That's why dynamic lights used to be a performance hit back then BTW: they force the corresponding cache entry to be discarded and recalculated.
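
A toy sketch of the idea, heavily simplified: it multiplies 8-bit intensities directly and uses one light value per texel, whereas the real software renderer works with a palette and a much coarser lightmap. All names (surf_cache_t and so on) are made up for the example.

```c
#include <stdint.h>

#define SURF_TEXELS 16  /* hypothetical: tiny surface for illustration */

typedef struct {
    uint8_t lit[SURF_TEXELS];  /* cached texture * lightmap result */
    int     valid;             /* 0 means the entry must be rebuilt */
    int     builds;            /* how many times it was recalculated */
} surf_cache_t;

void surf_init(surf_cache_t *c)
{
    c->valid = 0;
    c->builds = 0;
}

/* The expensive step: combine texture and light for every texel. */
void surf_build(surf_cache_t *c, const uint8_t *texels, const uint8_t *light)
{
    for (int i = 0; i < SURF_TEXELS; i++)
        c->lit[i] = (uint8_t)((texels[i] * light[i]) / 255);
    c->valid = 1;
    c->builds++;
}

/* Return the lit surface, rebuilding only if the cache entry is stale. */
const uint8_t *surf_get(surf_cache_t *c, const uint8_t *texels,
                        const uint8_t *light)
{
    if (!c->valid)
        surf_build(c, texels, light);
    return c->lit;
}

/* A dynamic light touching the surface discards the cached result. */
void surf_invalidate(surf_cache_t *c)
{
    c->valid = 0;
}
```

As long as the lighting is static, surf_get() costs almost nothing after the first frame. A dynamic light calls surf_invalidate() every frame it touches the surface, so the rebuild cost comes back every frame.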

More from Abrash himself:

http://www.bluesnews.com/abrash/chap68.shtml

EDIT: to correct a typo.
I know FrikaC made a cgi-bin version of the quakec interpreter once and wrote part of his website in QuakeC :) (LordHavoc)
leileilol
Posts: 2783
Joined: Fri Oct 15, 2004 3:23 am

Re: Fast Dynamic Lighting

Post by leileilol »

Quake pretty much was a megatexture in software.
i should not be here
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

frag.machine: thank you for the link, it was very informative, and rather nostalgic: I remember reading all of Abrash's Quake articles in Dr Dobb's as they came out. Unfortunately, I gave my Dr Dobb's collection to the library in Wellington :(
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Fast Dynamic Lighting

Post by mh »

leileilol wrote: Quake pretty much was a megatexture in software.
This, more or less.

It was stored in memory rather than streamed from disk too, but otherwise megatexture's direct ancestor was software Quake.
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Fast Dynamic Lighting

Post by taniwha »

Btw, for those interested: my work on Monday and Tuesday produced these results for QF's glsl renderer (nouveau on nv50, 1680x1050):
  • bigass1: 92.8->189.3
  • demo1: 163.2->233
  • demo2: 173->217.1
  • demo3: 96.5->204.5
  • overkill: 59.8->157.3
(qw and nq binaries are still separate, but use the same renderer code)

There's still room for improvement because beams are mismanaged and often wind up with 2+ beams from the one LG, and the bsp mvp matrix gets uploaded for every used texture. However, the bulk of the improvement (92->~160 for bigass1) came from fixing the lightmap upload code.

One very important thing I discovered today, because my quake window is the size of my desktop but I wasn't running fullscreen: if the GL window is slightly off-screen, timedemo will lose about 30fps. I was getting 157 for overkill yesterday, but 127 today. When I checked bigass1 and it was the same, I noticed nq was fullscreen but qw was not: I ran qw again fullscreen, and overkill went back to 157fps.