Doom 3 engine release and game code
Re: Doom 3 engine release and game code
mh wrote: What the glMapBufferRange stuff does is allow you to take advantage of a VBO streaming pattern that D3D has enjoyed since at least version 7 - in D3D terms it's known as the discard/no-overwrite pattern.
A VBO is a GPU resource, and normally, if you try to update a GPU resource that is currently in use for drawing (entirely possible because of the asynchronous nature of CPU/GPU operation), everything must stall and wait for drawing to complete before the update can happen. The stock Doom 3 code actually double-buffers its streaming VBOs to try to avoid this (in a slightly obfuscated way), but glMapBufferRange is a more robust way.
So, I mentioned discard/no-overwrite above. Here's what they do.
The buffer is filled in a linear manner. You've got 2 MB (or whatever) of space; vertexes are added beginning at position 0, and as new vertexes are added they get appended until the buffer fills. Then magic happens.
This standard update is no-overwrite; your code makes a promise to GL that it's not going to overwrite any region of the buffer that may be currently in use for drawing, and in return GL will let you update the buffer without blocking. In order to be able to keep this promise your code must maintain a counter indicating how much space in the buffer it has previously used, and add new verts to the buffer at this counter position.
When the buffer becomes full you "discard". This doesn't throw away anything previously added; instead, GL will keep the previous block of buffer memory around for as long as is needed to satisfy any pending draw calls, but will give you a new, fresh block for any further updates. That's the "magic" I mentioned above, and it's what lets you use a streaming VBO without any blocking.
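A minimal sketch of the CPU-side bookkeeping this pattern needs, with the GL calls left out. The names `stream_buffer_t` and `stream_reserve` are hypothetical and not from Doom 3 or the post; the sketch also assumes a single batch never exceeds the buffer capacity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side state for one streaming VBO. */
typedef struct {
    size_t capacity;   /* total bytes in the buffer object */
    size_t cursor;     /* bytes handed out since the last discard */
    unsigned discards; /* how many times the buffer was discarded */
} stream_buffer_t;

/*
 * Reserve `size` bytes for the next batch and return the byte offset to
 * write at. Sets *discard to 1 when the caller must first ask GL for a
 * fresh block (discard), and to 0 when it can map the returned range
 * with no-overwrite semantics.
 */
static size_t stream_reserve(stream_buffer_t *sb, size_t size, int *discard)
{
    size_t offset;

    assert(sb != NULL && discard != NULL);
    assert(size <= sb->capacity); /* assumption: one batch always fits */

    *discard = 0;
    if (sb->cursor + size > sb->capacity) {
        /* Buffer is full: discard and start again from position 0. */
        *discard = 1;
        sb->cursor = 0;
        sb->discards++;
    }

    /* No-overwrite: only ever hand out space past what was already used. */
    offset = sb->cursor;
    sb->cursor += size;
    return offset;
}
```

With a 1024-byte buffer and 400-byte batches, the returned offsets go 0, 400, then 0 again with the discard flag set. The counter is what keeps the no-overwrite promise: a range is never handed out twice between two discards.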
This pattern will also let you get rid of Doom 3's double buffering, thus saving you some GPU memory (I haven't yet done this in my code). Because there's no more blocking it will run faster in cases where there is a lot of dynamic buffer usage, but because Doom 3 locks at 60fps it may not be as directly measurable as if the engine was unlocked. Hence the "it feels more responsive but I can't quite put my finger on it" result.
There's another chunk of code in the standard Alloc call which deals with updates of non-streaming VBOs and which is implemented in quite an evil manner by the stock Doom 3 code. When updating such a VBO you can get a faster update if the glBufferData params are the same as were previously used for that VBO (the driver can just reuse the previous block of buffer memory instead of needing to fully reallocate). Doom 3 doesn't do that, so it doesn't get these faster updates, but by searching the free static headers list for a VBO that matches and using that instead of just taking the first one from it, it can. Obviously it sucks that you need to search the list in this way, and a better implementation would just store the VBO with the object that uses it, and reuse the same VBO each time. Since this mainly happens with model animations, an even better implementation would use transform feedback to animate the model instead of animating it on the CPU and needing to re-upload verts each frame, but I haven't even looked at that yet.
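The free-list search described above could look roughly like this. It is a sketch only: `vbo_header_t` and `take_free_header` are hypothetical stand-ins rather than Doom 3's actual types, and only the size is compared here, although a fuller version would presumably compare the usage hint as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry in the free static headers list. */
typedef struct vbo_header_s {
    unsigned vbo;              /* GL buffer object name */
    size_t size;               /* size passed to the last glBufferData call */
    struct vbo_header_s *next; /* next free header */
} vbo_header_t;

/*
 * Unlink and return a free header for an upload of `size` bytes.
 * Prefers the first header whose previous size matches, so the driver
 * may be able to reuse its existing storage; falls back to the head of
 * the list (the stock behaviour) when nothing matches.
 */
static vbo_header_t *take_free_header(vbo_header_t **free_list, size_t size)
{
    vbo_header_t **link;
    vbo_header_t **pick = free_list; /* default: first entry */
    vbo_header_t *found;

    assert(free_list != NULL);
    if (*free_list == NULL)
        return NULL; /* nothing free */

    for (link = free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->size == size) {
            pick = link;
            break;
        }
    }

    found = *pick;
    *pick = found->next; /* unlink from the free list */
    found->next = NULL;
    return found;
}
```

The search is linear in the length of the free list, which is the cost complained about above; storing the VBO with the object that uses it would avoid the search entirely.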
So all in all the stock VBO implementation is an unholy mess that needs serious work to get it functioning right, much the same way as Quake 1 lightmap updates were a mess. That code just represents the start of a process, but I personally don't think it's worth continuing with. I say - wait for the BFG edition, wait and see if that's going to get a source release (Carmack seems keen), and use that as a base for further work instead - chances are that all of this stuff will be fixed in that.
Out of curiosity, have you tested your changes via the Timedemo feature?
The FPS is unbounded there so you should be able to see the actual engine improvements better.
I have been urging the TDM team to test out your VBO code. TDM missions are much more demanding than vanilla Doom 3 due to higher-poly environments, models, and more light sources (and dynamic lights).
- nbohr1more
- Posts: 54
- Joined: Fri Dec 09, 2011 7:04 am
Re: Doom 3 engine release and game code
nbohr1more wrote: Out of curiosity, have you tested your changes via the Timedemo feature?
Not in Doom 3, because I've made other changes that would invalidate the result. I could, I suppose, port them to vanilla.
I have tested the glMapBufferRange stuff in other programs; in my test Quake 2 engine I can easily set the particle system up for a head-to-head test between updating VBOs the way Doom 3 does it:
Code:
/* Doom 3-style update: re-specify the whole buffer with this frame's particles */
glNamedBufferDataEXT (gl_particlevbo, r_newrefdef.num_particles * sizeof (particle_t), r_newrefdef.particles, GL_STREAM_DRAW);
GL_UseProgramWithUBOs (gl_particleprog, &ubodef, 1);
GL_Enable ((BLEND_BIT | DEPTHTEST_BIT) | (gl_cull->value ? CULLFACE_BIT : 0));
GL_BindVertexArray (gl_particlevao);
/* one 4-vertex fan per particle, instanced */
glDrawArraysInstancedBaseInstance (GL_TRIANGLE_FAN, 0, 4, r_newrefdef.num_particles, 0);
And using glMapBufferRange:
Code:
/* discard: not enough room left, so get a fresh block and restart at position 0 */
if (r_firstparticle + r_newrefdef.num_particles >= MAX_PARTICLES)
{
    glNamedBufferDataEXT (gl_particlevbo, MAX_PARTICLES * sizeof (particle_t), NULL, GL_STREAM_DRAW);
    r_firstparticle = 0;
}

/* no-overwrite: map only the unused region that follows the previous batch */
offset = r_firstparticle * sizeof (particle_t);
size = r_newrefdef.num_particles * sizeof (particle_t);

/* BUFFER_NO_OVERWRITE is defined elsewhere in the engine (not shown here) */
if ((dst = glMapNamedBufferRangeEXT (gl_particlevbo, offset, size, BUFFER_NO_OVERWRITE)) != NULL)
{
    memcpy (dst, r_newrefdef.particles, size);
    glUnmapNamedBufferEXT (gl_particlevbo);

    GL_UseProgramWithUBOs (gl_particleprog, &ubodef, 1);
    GL_Enable ((BLEND_BIT | DEPTHTEST_BIT) | (gl_cull->value ? CULLFACE_BIT : 0));
    GL_BindVertexArray (gl_particlevao);

    /* base instance selects this batch's particles within the buffer */
    glDrawArraysInstancedBaseInstance (GL_TRIANGLE_FAN, 0, 4, r_newrefdef.num_particles, r_firstparticle);

    /* advance the counter so the next batch never overwrites this one */
    r_firstparticle += r_newrefdef.num_particles;
}
With everything else being equal the results should be expected to reflect a performance difference isolated to just the glMapBufferRange calls. Bear in mind that this is just for the particle system and the particle system alone; everything else is the very same between both tests. Quake 2 is also sufficiently lightweight that we can be certain that no other work is masking the difference, and I've used state filtering and fast-path calls everywhere to ensure that we're definitely homing in on a very specific difference.
So, using the Doom 3-style update method we get 522fps; with glMapBufferRange it's 572, consistent across multiple timedemos.
Doom 3 is going to be different of course as it has heavier bottlenecks elsewhere throughout the engine, but otherwise this one is definitely in happy-land.
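For comparing results like these it can help to convert frame rates to per-frame times, since fps differences are not linear. The helper below is a small sketch with hypothetical names (`frame_ms`, `saved_ms`); only the 522 and 572 fps figures come from the test above.

```c
#include <assert.h>

/* Milliseconds spent on one frame at the given frame rate. */
static double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Per-frame time saved when the frame rate rises from fps_before to fps_after. */
static double saved_ms(double fps_before, double fps_after)
{
    return frame_ms(fps_before) - frame_ms(fps_after);
}
```

For 522 and 572 fps this gives roughly 1.916 ms and 1.748 ms per frame, a saving of about 0.167 ms per frame (a frame-rate gain of about 9.6%). How much of that would carry over to an engine with heavier per-frame work is not something these numbers can answer.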
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Doom 3 engine release and game code
WOW
Thanks for the prompt reply.
I'll try to stir the pot over there some more. The (remaining) coding team has been nervous to touch renderer code, save one developer who's been on hiatus a while. He had apparently made a functional LATC parser and RGTC parser but hasn't patched them in yet because he wants to also include an on-the-fly converter to those formats. I was really hoping for high quality normal map compression in the next release but maybe this will sorta make up for that. (Load times will still suck though.)
- nbohr1more
- Posts: 54
- Joined: Fri Dec 09, 2011 7:04 am
Re: Doom 3 engine release and game code
The CPU bottleneck is starting to be a real problem with higher quality assets :S
Shows the need for moving more functions to the GPU side, but it's a huge undertaking.
Productivity is a state of mind.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Doom 3 engine release and game code
https://twitter.com/ID_AA_Carmack
Got approval for GPL release of Doom 3 BFG code (minus third party bits)! @idBrianHarris has already done most of the work.
Happy days.
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Doom 3 engine release and game code
mh wrote: https://twitter.com/ID_AA_Carmack
Got approval for GPL release of Doom 3 BFG code (minus third party bits)! @idBrianHarris has already done most of the work.
Happy days.
From what I read, this BFG Edition actually uses portions of idtech5 (Rage), hence the lack of compatibility with regular Doom 3 mods. It will be quite interesting to peek at this code...
I know FrikaC made a cgi-bin version of the quakec interpreter once and wrote part of his website in QuakeC
(LordHavoc)
- frag.machine
- Posts: 2090
- Joined: Sat Nov 25, 2006 1:49 pm
Re: Doom 3 engine release and game code
Yup, it uses the Rage format, but those parts can be skipped.
Still a lot of idtech4 in there, but I'm interested in seeing what stuff they changed to better match today's PCs.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Doom 3 engine release and game code
http://codeflow.org/entries/2010/nov/07/opengl-4-tessellation/
Interesting example of OpenGL hardware tessellation.
Sadly it needs a fully working GLSL backend, but if we get that far the example could even be used for hardware-generated terrains.
Not sure if it could be done with the old ARB2 backend, hmm? It might also need support engine side.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Doom 3 engine release and game code
Tessellation is overrated.
1: It's faster to precompute it on the CPU than to spend GPU resources on generating new polygons (remember that modern GPUs have a unified shader architecture: time spent on a tessellation shader is time no longer spent on the vertex/fragment shader). Precompute it on an alternate CPU core instead for extra points.
2: Knowing the exact position of the vertices is required anyway for things like decals. Guaranteed no swimming, and physics impacts the actual geometry.
3: Frustum culling is much easier using a divide-and-conquer algo on the CPU, instead of checking each vertex to see if it's onscreen. If the GPU never sees it, the GPU will never waste time on it.
4: A 4096*4096 heightmap is huge. If you have that big a heightmap, your texture quality upon that heightmap is going to be rather lackluster.
5: If you're doing any shadows or anything that requires drawing the terrain multiple times a frame, doing it on the CPU means it's only done once.
6: The CPU is more efficient/better at branching.
IMHO.
For a flight sim with a huge view distance a geometry shader might be simpler, but in that case you're probably going to want more than 4096 sample points in each axis.
- Spike
- Posts: 2892
- Joined: Fri Nov 05, 2004 3:12 am
- Location: UK
Re: Doom 3 engine release and game code
Just exploring possibilities.
It might not be faster, as you point out, but it could have its uses in case the hit is not too big.
Though I'm a bit puzzled about modern-day CPUs being god-awful at stuff that previous generations handled pretty well, back when a lot of that stuff actually had to be done CPU side. Hmm?
One appalling example is that my old P4 actually runs Doom 3 better than my current i7.
Maybe that's related to something I heard Carmack talking about: that Doom 3 is actually multithreaded but that the code never worked optimally. I have a weird feeling I'm onto something, as the problem is also there with the latest Q4 patch, where you can turn off multithreading, and when I do the game actually runs far, far better.
Makes one wonder if there's a hint from PVS-Studio when it warns about replacing the thread functions with _beginthreadex/_endthreadex calls.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Doom 3 engine release and game code
If you're tessellating dynamically then you'd better do it on the GPU, otherwise buffer uploads and GPU latency/synchronization are going to kill you. A point often forgotten, probably because it's not directly measurable in the amount of code you write yourself (and because it looks like less code and it's nice safe CPU code rather than scary GPU code, so maybe there's a comfort zone attached to it). Likewise if you're tessellating very fine then doing the extra work may be a better balance than spending so much more storage. The only rule is that there is no hard-and-fast rule and you need to adapt to your own use case.
Geometry shaders are definitely overrated; just having the GS stage enabled (even just for pass-through) will burn maybe 10% of your performance so you need to be certain that the gain you're getting back will exceed that. Even the simple classic case of expanding points to quads for a particle system is slower than sending all 4 verts (and if you use instancing you only need to send one vert per particle anyway; even then doing the billboarding calculations 4 times per particle in a VS is faster than doing them once per particle in a GS). What they are good for is generating new per-vertex data on the fly (such as normals for a triangle) but as a general rule using them to add extra verts/tris is a performance killer.
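To make the "billboarding calculations 4 times per particle" concrete, here is a CPU-side sketch in C of what a vertex shader might compute for each of the 4 fan vertices of one instanced particle. It is an illustration under assumptions, not the shader from any engine: `vec3_t`, `billboard_corner`, and the corner ordering are hypothetical, and `right`/`up` are assumed to be the camera's unit basis vectors.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3_t;

/*
 * Position of corner `corner` (0..3, triangle-fan order) of a
 * camera-facing quad centred on one particle. In a vertex shader the
 * corner index would come from the vertex ID and the centre from the
 * per-instance particle data.
 */
static vec3_t billboard_corner(vec3_t center, vec3_t right, vec3_t up,
                               float half_size, int corner)
{
    /* Fan order: bottom-left, bottom-right, top-right, top-left. */
    static const float sx[4] = { -1.0f,  1.0f, 1.0f, -1.0f };
    static const float sy[4] = { -1.0f, -1.0f, 1.0f,  1.0f };
    vec3_t out;
    float r, u;

    assert(corner >= 0 && corner < 4);

    r = sx[corner] * half_size; /* offset along the camera's right vector */
    u = sy[corner] * half_size; /* offset along the camera's up vector */

    out.x = center.x + right.x * r + up.x * u;
    out.y = center.y + right.y * r + up.y * u;
    out.z = center.z + right.z * r + up.z * u;
    return out;
}
```

The work per corner is a handful of multiply-adds, which gives some intuition for why repeating it 4 times in the vertex stage can still be cheaper than enabling a geometry shader stage to do it once.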
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Doom 3 engine release and game code
Incidentally, here's what a frame of BFG Edition looks like: http://pastebin.com/pxKnZhLD
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Doom 3 engine release and game code
I did hear rumours that the BFG edition used GLSL; seems they were right.
Well, that bodes well for those who wanted a GLSL backend.
-

revelator - Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Doom 3 engine release and game code
I'd guess it uses Cg - Rage certainly did, the shaders (example: http://pastebin.com/6CukSzaj) look like auto-generated code (and similar in style to Rage's: http://pastebin.com/yJ644iPK) and it would make sense as they could compile it down to HLSL/etc for the 360/etc too.
That would make a D3D port viable too.
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am