mh wrote:
Just resurrecting this old one regarding the endianness issue mentioned by szo.
The GL spec (page 97 of the 1.2.1 version) clearly states which bits are assigned to which components for the UNSIGNED_INT_8_8_8_8_REV type, so endianness is basically not an issue: the 4th component goes in bits 31-24, the 3rd in 23-16, the 2nd in 15-8 and the 1st in 7-0, and if an implementation does otherwise then it's non-conformant.
Where it would be an issue is if you used unsigned int * for your source data type, but because Tex(Sub)Image takes a GLvoid * parameter for data, you can still use byte * even with this type and satisfy all requirements.

Thanks for the additional info.
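To make the bit layout above concrete, here is a minimal sketch assuming the upload uses format GL_BGRA with type GL_UNSIGNED_INT_8_8_8_8_REV, so component 1 is blue, 2 green, 3 red and 4 alpha. The spec defines the layout on the packed 32-bit value, so building each texel with shifts puts every component in the stated bits whatever the host byte order. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack one texel for GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV.
 * With the _REV packed type, component 1 occupies bits 7-0, component 2 bits
 * 15-8, component 3 bits 23-16 and component 4 bits 31-24. For GL_BGRA the
 * components are B, G, R, A in that order. */
static uint32_t pack_bgra_rev(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24)   /* component 4: alpha */
         | ((uint32_t)r << 16)   /* component 3: red   */
         | ((uint32_t)g << 8)    /* component 2: green */
         |  (uint32_t)b;         /* component 1: blue  */
}

/* Extract component n (1-based) from a packed _REV texel. */
static uint8_t rev_component(uint32_t texel, int n)
{
    assert(n >= 1 && n <= 4);
    return (uint8_t)((texel >> (8 * (n - 1))) & 0xffu);
}
```

Passing an array of these uint32_t values as the pixels argument keeps the layout tied to the packed value rather than to how individual bytes happen to sit in memory.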
Fast Dynamic Lighting
Re: Fast Dynamic Lighting
I would like to add "be sure to call glTexSubImage2D only once per frame"* to the discussion. In my glsl (looks-like-sw) renderer, I was calling glTSI once per updated surface per frame (I have one giant 2k x 2k lightmap texture). I did some work to batch the updates into one big dirty rectangle, and even though I upload whole rows rather than just the changed part of each row, I got a 70% speedup on timedemo bigass1 (94->159 fps). My code was already separating lightmap updates from poly rendering, so the slowdown was all in the excessive calls to glTSI.
* Per texture, of course, but if you're updating many textures using glTSI, maybe it's time to rethink your design.
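A minimal sketch of that batching, assuming a single lightmap atlas whose CPU-side copy is laid out row by row at the atlas width. Surfaces mark the rows they touch during the frame, and one glTexSubImage2D call then uploads the union as whole rows. The type and function names are hypothetical, and the GL call appears only in a comment so the sketch compiles without GL headers.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical per-atlas record of which rows changed this frame. */
typedef struct {
    int atlas_width;  /* texels per row, e.g. 2048 */
    int min_y;        /* first dirty row (inclusive) */
    int max_y;        /* last dirty row (exclusive) */
} dirty_rows_t;

/* Start a frame with nothing dirty. */
static void dirty_reset(dirty_rows_t *d, int atlas_width)
{
    d->atlas_width = atlas_width;
    d->min_y = INT_MAX;
    d->max_y = INT_MIN;
}

/* A surface rebuilt its lightmap into rows [y, y + h) of the CPU copy. */
static void dirty_mark(dirty_rows_t *d, int y, int h)
{
    assert(y >= 0 && h > 0);
    if (y < d->min_y)
        d->min_y = y;
    if (y + h > d->max_y)
        d->max_y = y + h;
}

/* Once per frame: compute the single full-width upload. Returns 0 if clean. */
static int dirty_upload_rect(const dirty_rows_t *d, int *y, int *height)
{
    if (d->max_y <= d->min_y)
        return 0;
    *y = d->min_y;
    *height = d->max_y - d->min_y;
    /* Real code would issue exactly one call here, e.g.
     *   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, *y, d->atlas_width, *height,
     *                   GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV,
     *                   lightmap_pixels + (size_t)*y * d->atlas_width);
     * where lightmap_pixels is a (hypothetical) uint32_t CPU copy. */
    return 1;
}
```

Uploading whole rows sends more texels than strictly changed, but the call count drops to one per atlas per frame, which is where the speedup comes from.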
Leave others their otherness.
http://quakeforge.net/
Re: Fast Dynamic Lighting
Can't second that enough - it's the big performance killer; BGRA/etc is just icing on the cake (but very flavoursome icing in the case of Intels). Each TexSubImage call can incur a pipeline stall (it's effectively equivalent to a glFinish call in this respect), so calling it once per surf will kill you. This wasn't as much of an issue on older hardware (otherwise neither Quake nor Q2 would have done it) owing to a much shallower pipeline, and possibly 3dfx pulled some tricks in their mini-driver to work around it too (saving out the calls and issuing them once only at the end of a frame would be one thing they might have done). On modern hardware, when you get a stall everything must wait until the GPU and CPU sync up. Traditional Quake lightmap updating is very close to the worst possible thing you can do in every respect if you're concerned about performance: hundreds or thousands of stalls per frame, non-native formats; it's just bad piled upon bad.
I've experimented some with the single giant texture concept too (but dividing it into 128x128 tiles); performance-wise it was roughly neck and neck with my current method, which uses a texture array. I also update once per entity rather than once per frame, because I consider adding dynamic lights to BSP models important enough to incur the overhead (although I'm looking into ways to get that part going faster). There's obviously a tradeoff between updating the full width of the rect (== more data to send) vs multiple updates (== more stall potential), but I'm reasonably happy with 850 fps in bigass1.
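The two layouts described above differ mainly in how a tile index maps to an upload destination. A minimal sketch, assuming 128x128 tiles in a hypothetical 2048x2048 atlas versus one tile per texture-array layer; all names and the atlas size are illustrative assumptions.

```c
#include <assert.h>

#define TILE_SIZE     128                      /* tile edge, as in the post */
#define ATLAS_SIZE    2048                     /* hypothetical atlas edge */
#define TILES_PER_ROW (ATLAS_SIZE / TILE_SIZE) /* 16 tiles per atlas row */

typedef struct {
    int x, y;   /* texel offset of the tile's top-left corner */
    int layer;  /* array layer; always 0 in the giant-texture layout */
} tile_dest_t;

/* Giant-texture layout: tiles fill the atlas left to right, top to bottom. */
static tile_dest_t tile_in_atlas(int tile)
{
    assert(tile >= 0 && tile < TILES_PER_ROW * TILES_PER_ROW);
    tile_dest_t d = { (tile % TILES_PER_ROW) * TILE_SIZE,
                      (tile / TILES_PER_ROW) * TILE_SIZE, 0 };
    return d;
}

/* Texture-array layout: each tile is its own layer at offset (0, 0). */
static tile_dest_t tile_in_array(int tile)
{
    assert(tile >= 0);
    tile_dest_t d = { 0, 0, tile };
    return d;
}
```

In GL terms, a dirty atlas tile is one glTexSubImage2D call at (x, y), while an array tile is one glTexSubImage3D call with zoffset set to the layer and a depth of 1. Either way, the stall savings come from cutting the number of such calls.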
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Re: Fast Dynamic Lighting
I think my current slowdown is actually my 2D code. I enable/disable the shader and parameters for every icon :/ I disabled icons entirely and got another ~30 fps (~20%).
QF's glsl is a learning exercise, after all (I knew only the basics of GL before I started on it, let alone GLSL, GL optimization, etc.).
Leave others their otherness.
http://quakeforge.net/
Re: Fast Dynamic Lighting
2D can actually be a surprisingly huge slowdown: it's quite fillrate- and overdraw-intensive, and batching up calls can help a lot. I remember getting a fright when Draw_Character turned out to be my biggest bottleneck on a VMware test machine; I had to start thinking about batching calls after that. The way I handle it is by using a common input/vertex layout for everything (position/colour/texcoords, even if not needed). That helps keep buffer switches down, and instancing gets the vertex size down to under half that needed for a full quad (in practice that's not a bottleneck; I just wanted to experiment with instancing). I then sniff for state changes and issue the current batch if one happens. These are generally only texture (the scrap system helps a lot here) and shader, although I've also got the ability to set a new ortho matrix if needed, and D3D's separation of sampling parameters from the texture object means that I also need to watch out for textures that need to clamp versus ones that need to wrap. Otherwise I just add the specified quad to a vertex buffer. I've a nice state-filtering system that can take a callback to be executed before state changes, so I shove my Draw_Flush into that and everything happens automatically. One final flush of anything left over at the end of the frame and it's done.
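A minimal C sketch of that batching scheme, under stated assumptions: every quad uses one shared vertex layout (position, texcoords, colour), the only state tracked is texture and shader, and the flush just counts draw calls where real code would upload the vertex buffer and issue one draw. All names are hypothetical; the state check is done inline here rather than through a state-change callback, which has the same effect.

```c
#include <assert.h>

#define MAX_QUADS 1024

/* One shared vertex layout for text, icons and pics. */
typedef struct { float x, y, s, t; unsigned rgba; } vert2d_t;

typedef struct {
    unsigned texture, shader;  /* state the current batch was built with */
    int      nquads;
    vert2d_t verts[MAX_QUADS * 4];
    int      draw_calls;       /* stands in for real draw calls */
} batch2d_t;

/* Issue whatever is queued (here: just count it) and empty the batch. */
static void batch_flush(batch2d_t *b)
{
    if (b->nquads == 0)
        return;
    /* Real code: upload b->verts, bind texture/shader, issue one draw. */
    b->draw_calls++;
    b->nquads = 0;
}

static void batch_begin_frame(batch2d_t *b)
{
    b->texture = b->shader = 0;
    b->nquads = 0;
    b->draw_calls = 0;
}

/* Queue one quad; flush first if the state changes or the buffer is full. */
static void batch_quad(batch2d_t *b, unsigned texture, unsigned shader,
                       float x, float y, float w, float h,
                       float s0, float t0, float s1, float t1, unsigned rgba)
{
    if (b->nquads > 0 && (texture != b->texture || shader != b->shader))
        batch_flush(b);
    if (b->nquads == MAX_QUADS)
        batch_flush(b);
    assert(b->nquads < MAX_QUADS);
    b->texture = texture;
    b->shader = shader;
    vert2d_t *v = &b->verts[b->nquads++ * 4];
    v[0] = (vert2d_t){ x,     y,     s0, t0, rgba };
    v[1] = (vert2d_t){ x + w, y,     s1, t0, rgba };
    v[2] = (vert2d_t){ x + w, y + h, s1, t1, rgba };
    v[3] = (vert2d_t){ x,     y + h, s0, t1, rgba };
}
```

Three quads on two textures cost two draws here instead of three; with a scrap packing most 2D images into one texture, long runs of HUD and console quads can share a single draw.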
Another surprisingly big bottleneck is drawing the gun model. Right now I've got assumptions in my code that it's going to be the last thing drawn in a frame (big mistake, that), but as soon as I work them out I'm going to try moving it to first and see if the GPU's early-Z can help any (it should be able to quickly reject a lot of world and other polys before the pixel shader runs). Shouldn't be an issue - Q2 just draws it mixed in with the regular ents and it works fine there.
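A minimal sketch of the reordering idea, assuming a hypothetical draw list with a flag marking the view model: a stable partition moves the gun to the front so its depth is written first and early-Z can reject world fragments behind it. Whether that pays off depends on the hardware and on how the view model's depth range is set up (GLQuake, for instance, draws it with a compressed glDepthRange), so it is something to measure rather than assume.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int id;
    int is_viewmodel;  /* hypothetical flag: 1 for the gun */
} drawitem_t;

/* Stable partition: view-model items first, everything else keeps its order.
 * Uses a caller-provided scratch buffer of the same length. */
static void viewmodel_first(drawitem_t *items, drawitem_t *scratch, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (items[i].is_viewmodel)
            scratch[out++] = items[i];
    for (size_t i = 0; i < n; i++)
        if (!items[i].is_viewmodel)
            scratch[out++] = items[i];
    assert(out == n);
    for (size_t i = 0; i < n; i++)
        items[i] = scratch[i];
}
```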
Bleagh - verbal diarrhea.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Re: Fast Dynamic Lighting
Yeah, rewriting the 2D code got me another 9%.
The rewrite involved using a scrap for all 2d (except conback), batching, and sharing vertex info/queues between text and icons.
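For reference, a scrap is a shared texture that packs many small 2D images so they can be drawn in the same batch. A minimal sketch of a "skyline" allocator in the style of GLQuake's Scrap_AllocBlock, assuming a single 256x256 scrap; the constants and function names here are illustrative, not QuakeForge's code.

```c
#include <assert.h>

#define SCRAP_SIZE 256  /* hypothetical scrap edge in texels */

/* scrap_allocated[x] = first free row in column x (the "skyline"). */
static int scrap_allocated[SCRAP_SIZE];

static void scrap_reset(void)
{
    for (int i = 0; i < SCRAP_SIZE; i++)
        scrap_allocated[i] = 0;
}

/* Find space for a w*h image; returns 1 and sets *x, *y on success. */
static int scrap_alloc(int w, int h, int *x, int *y)
{
    assert(w > 0 && h > 0 && w <= SCRAP_SIZE);
    int best = SCRAP_SIZE;
    for (int i = 0; i <= SCRAP_SIZE - w; i++) {
        int best2 = 0, j;
        for (j = 0; j < w; j++) {
            if (scrap_allocated[i + j] >= best)
                break;  /* this span can't beat the best position so far */
            if (scrap_allocated[i + j] > best2)
                best2 = scrap_allocated[i + j];
        }
        if (j == w) {  /* the whole span fits lower than the best so far */
            *x = i;
            *y = best = best2;
        }
    }
    if (best + h > SCRAP_SIZE)
        return 0;  /* full: caller falls back to a standalone texture */
    for (int i = 0; i < w; i++)
        scrap_allocated[*x + i] = best + h;
    return 1;
}
```

Images that fail to fit, or that must wrap or clamp differently from the rest, stay as standalone textures and simply break the batch when drawn.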
Leave others their otherness.
http://quakeforge.net/
Re: Fast Dynamic Lighting
Watch out for backtile - that needs to go outside of the scrap too cos it wraps (unless you're doing Custom Shader Funky Magic).
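One form the "Custom Shader Funky Magic" could take is emulating wrapping inside the image's scrap sub-rectangle: take the fractional part of the repeating texture coordinate and remap it into the sub-rectangle. A minimal sketch of that math, written in C for consistency (in practice it would live in the fragment shader, where fract() does the same job); the names are hypothetical. Bilinear filtering and mipmapping can still sample across the sub-rectangle edges, so padding or nearest filtering is usually needed as well.

```c
#include <assert.h>

typedef struct {
    float s0, t0;  /* corner of the image inside the scrap, normalized */
    float ds, dt;  /* extent of the image inside the scrap, normalized */
} subrect_t;

/* Fractional part, like GLSL fract(), for coordinates in a sane range. */
static float fract_f(float v)
{
    float f = v - (float)(long)v;    /* truncation toward zero */
    return f < 0.0f ? f + 1.0f : f;  /* fix up negative inputs */
}

/* Emulate GL_REPEAT inside a sub-rectangle of a shared texture. */
static void wrap_into_subrect(const subrect_t *r, float s, float t,
                              float *out_s, float *out_t)
{
    assert(r->ds > 0.0f && r->dt > 0.0f);
    *out_s = r->s0 + fract_f(s) * r->ds;
    *out_t = r->t0 + fract_f(t) * r->dt;
}
```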
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Re: Fast Dynamic Lighting
Yeah, I'd half forgotten about it: I commented it out with a FIXME, but then forgot to fix it. I am considering the shader magic, though (for BSP textures).
Leave others their otherness.
http://quakeforge.net/
Re: Fast Dynamic Lighting
Sounds interesting; let me know how you get on with that.
I have to say that I'm fascinated by your current work - doing a hardware accelerated version of software Quake's surface caching system always seemed a neat idea to me (I believe that the old VQuake did it but it didn't work out well there; more modern capabilities should suit it much better).
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Re: Fast Dynamic Lighting
Heh, thanks, though... what exactly is surface caching? I'm not sure I ever understood that part of the sw renderer, and I certainly didn't set out to re-implement that part (as such). Are you referring to the loading of the entire bsp set into a vbo?
Leave others their otherness.
http://quakeforge.net/
Re: Fast Dynamic Lighting
taniwha wrote:
Heh, thanks, though... what exactly is surface caching? I'm not sure I ever understood that part of the sw renderer, and I certainly didn't set out to re-implement that part (as such). Are you referring to the loading of the entire bsp set into a vbo?

Maybe I'm not the best person to explain, but I'll try anyway: a surface is the product of the texture and the corresponding lightmap for a given polygon. In order to speed up rendering, SW Quake used a big cache of previously calculated surfaces. That's also why dynamic lights were a performance hit back then: a dynamic light forces the corresponding cache entry to be discarded and recalculated.
More from Abrash himself:
http://www.bluesnews.com/abrash/chap68.shtml
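A much-simplified sketch of what building one cached surface involves, under stated assumptions: 8-bit greyscale texels instead of palette indices, one light sample per 16x16 block of texels used as-is, and a plain multiply instead of software Quake's colormap lookup and its interpolation of light across each block. The names are hypothetical; the Abrash chapter linked above has the real details.

```c
#include <assert.h>
#include <stdint.h>

#define LIGHT_STEP 16  /* texels covered by one lightmap sample per axis */

/* Build a lit surface of w*h texels by tiling the texture across it and
 * modulating by the lightmap. light holds ((w-1)/16+1) * ((h-1)/16+1)
 * samples, 255 meaning full brightness. A dynamic light changes `light`,
 * which is why it forced the cached surface to be rebuilt. */
static void build_surface(uint8_t *out, int w, int h,
                          const uint8_t *tex, int tw, int th,
                          const uint8_t *light)
{
    assert(w > 0 && h > 0 && tw > 0 && th > 0);
    int lw = (w - 1) / LIGHT_STEP + 1;  /* light samples per row */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            unsigned texel = tex[(y % th) * tw + (x % tw)];
            unsigned l = light[(y / LIGHT_STEP) * lw + (x / LIGHT_STEP)];
            out[y * w + x] = (uint8_t)(texel * l / 255u);
        }
    }
}
```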
EDIT: to correct a typo.
I know FrikaC made a cgi-bin version of the quakec interpreter once and wrote part of his website in QuakeC (LordHavoc)
Re: Fast Dynamic Lighting
frag.machine: thank you for the link; it was very informative, and rather nostalgic: I remember reading all of Abrash's Quake articles in Dr Dobb's as they came out. Unfortunately, I gave my Dr Dobb's collection to the library in Wellington.
Leave others their otherness.
http://quakeforge.net/
Re: Fast Dynamic Lighting
leileilol wrote:
Quake pretty much was a megatexture in software.

This, more or less.
It was stored in memory rather than streamed from disk too, but otherwise megatexture's direct ancestor was software Quake.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Re: Fast Dynamic Lighting
Btw, for those interested: my work on Monday and Tuesday produced these results for QF's glsl renderer (nouveau on nv50, 1680x1050):
- bigass1: 92.8->189.3
- demo1: 163.2->233
- demo2: 173->217.1
- demo3: 96.5->204.5
- overkill: 59.8->157.3
There's still room for improvement because beams are mismanaged and often wind up with 2+ beams from the one LG, and the bsp mvp matrix gets uploaded for every used texture. However, the bulk of the improvement (92 to ~160 fps for bigass1) came from fixing the lightmap upload code.
One very important thing I discovered today, because my Quake window is the size of my desktop but I wasn't running fullscreen: if the GL window is slightly off-screen, timedemo will lose about 30 fps. I was getting 157 fps for overkill yesterday, but 127 today. When I checked bigass1 and it was the same, I noticed nq was fullscreen but qw was not: I ran qw again fullscreen, and overkill went back to 157 fps.
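One way to stop re-uploading an unchanged matrix for every texture is to keep a CPU-side copy of the last value sent and skip the upload when it matches. A minimal sketch, with the real glUniformMatrix4fv call left as a comment and a counter standing in for it; the names are hypothetical, and since uniform values live per program object, one such cache is needed per program. Simply setting the matrix once before the per-texture loop achieves the same thing when the program stays bound.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int   location;  /* uniform location, e.g. from glGetUniformLocation */
    float last[16];  /* last matrix actually uploaded */
    int   valid;     /* 0 until the first upload */
    int   uploads;   /* counts real uploads, for illustration */
} mat4_uniform_t;

/* Upload m only if it differs from what the program already holds. */
static void set_mat4(mat4_uniform_t *u, const float m[16])
{
    assert(u != NULL && m != NULL);
    if (u->valid && memcmp(u->last, m, sizeof u->last) == 0)
        return;  /* unchanged: skip the redundant upload */
    /* Real code: glUniformMatrix4fv(u->location, 1, GL_FALSE, m); */
    memcpy(u->last, m, sizeof u->last);
    u->valid = 1;
    u->uploads++;
}
```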
Leave others their otherness.
http://quakeforge.net/