Hardware Occlusion queries.
16 posts
• Page 1 of 2 • 1, 2
Started toying a bit with these and have had at least some success.
- Code:
qboolean GL_Occlusion(GLfloat width, GLfloat height)
{
    float w = 640.0f / (float)width;
    float h = 480.0f / (float)height;
    float cornerFactor = 2.0f;
    double corner1 = realtime*2;
    double corner2 = realtime*3;
    double corner3 = realtime*4;
    double corner4 = realtime*5;
    qboolean Occluded;
    GLuint occQuery;
    // GLint to match the GLint* parameter of glGetQueryObjectiv.
    GLint occSamples = 0;
    GLint occAvailable = 0;

    if(GL_ExtensionBits & HAS_OCCLUSION)
    {
        // create a query object for the occlusion test.
        glGenQueries(1, &occQuery);
        // disable color and depth writes, we do not want to draw anything.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        // begin counting the samples that pass the depth test.
        glBeginQuery(GL_SAMPLES_PASSED, occQuery);
        // draw a fullscreen quad to feed the query;
        // nothing is rendered, since color and depth writes are disabled above.
        glBegin(GL_QUADS);
        glVertex2f(-(w * 0.5f) + (sinf(corner1) * cornerFactor), -(h * 0.5f) + (cosf(corner1) * cornerFactor));
        glVertex2f(-(w * 0.5f) + (sinf(corner2) * cornerFactor), (h * 0.5f) + (cosf(corner2) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner3) * cornerFactor), (h * 0.5f) + (cosf(corner3) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner4) * cornerFactor), -(h * 0.5f) + (cosf(corner4) * cornerFactor));
        glEnd();
        // occlusion test done.
        glEndQuery(GL_SAMPLES_PASSED);
        // flush so the driver submits the query.
        glFlush();
        do
        {
            // poll until the query result becomes available.
            glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
        } while(!occAvailable);
        // go back to normal rendering.
        glDepthMask(GL_TRUE);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        // the loop above only exits once the result is available.
        if (occAvailable > 0)
        {
            // read back the number of samples that passed.
            glGetQueryObjectiv(occQuery, GL_QUERY_RESULT, &occSamples);
            // get occlusion state (false = visible - true = occluded).
            Occluded = (occSamples > 0) ? false : true;
        }
    }
    else
    {
        // if we have an ancient card, let things pass.
        Occluded = false;
    }
    return Occluded;
}
The above code was ported out of quake royale, where it was used for occluding lens flares and bloom.
It does work and is even reasonably fast, but it tends to be a little too effective on objects that are small in screen space.
Word is that this is just how it is, so to get some of the benefits you need to use it in huge, complex scenes.
Productivity is a state of mind.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Hardware Occlusion queries.
yeah... that's not how you're meant to do them.
firstly, you're leaking occlusion query handles: glGenQueries runs on every call and nothing ever deletes the query.
secondly, you're busylooping the cpu while the driver is still busy feeding the gpu and the gpu itself is still idle. a general rule of thumb is that you should only check the result of an occlusion query on the _following_ frame (make the query geometry slightly larger/nearer than the real thing so you won't get occasional flickering). at a minimum you should draw something else unrelated between the endquery and the getquery, to at least give the gpu/driver a chance to catch up with the cpu/app. yes, this something else will not be detected by the occlusion query, hence the whole next-frame thing.
drawing a fullscreen quad for your occlusion query is really quite pointless too, of course...
GPUs are frikkin fast nowadays, so really think of occlusion queries as just an optimisation to reduce the cpu overhead of sending lots of invisible drawcalls at the driver. if you're submitting one drawcall to avoid a single other (and probably needing to submit BOTH draw calls anyway), there had better be a GOOD reason for that...
They're useful for doorways, so that you can cull entire rooms, or maybe for forward-rendered rtlights, but they're totally pointless for your average quake mdl.
They may also be useful for cheat detection, but hey...
at least that's how I see them: as a cpu/gpu sync nightmare.
- Spike
- Posts: 2892
- Joined: Fri Nov 05, 2004 3:12 am
- Location: UK
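A minimal sketch of what Spike describes above: generate the query once and reuse it, poll for the result without blocking, and keep the previous answer until a new one arrives. Everything here is an assumption for illustration: `query_api_t` and its callbacks are hypothetical thin wrappers (in an engine they would forward to glBeginQuery/glEndQuery and glGetQueryObjectiv), and the fake GPU exists only so the sketch runs without a GL context.

```c
#include <stdbool.h>

/* Hypothetical wrappers around the GL query calls (not a real API). */
typedef struct {
    void (*issue)(void *gpu, unsigned id);            /* begin query, draw proxy, end query */
    bool (*result_available)(void *gpu, unsigned id); /* GL_QUERY_RESULT_AVAILABLE check */
    int  (*samples_passed)(void *gpu, unsigned id);   /* GL_QUERY_RESULT readback */
    void *gpu;
} query_api_t;

typedef struct {
    unsigned id;      /* generated once and reused every frame, so nothing leaks */
    bool     pending; /* a query is currently in flight */
    bool     visible; /* last known answer; start as visible to be safe */
} occ_query_t;

/* Call once per frame. Never blocks: if the result is not ready yet,
   the previous answer is returned and no new query is issued. */
static bool Occ_Update(const query_api_t *api, occ_query_t *q)
{
    if (q->pending && api->result_available(api->gpu, q->id)) {
        q->visible = api->samples_passed(api->gpu, q->id) > 0;
        q->pending = false;
    }
    if (!q->pending) {
        api->issue(api->gpu, q->id);
        q->pending = true;
    }
    return q->visible;
}

/* Fake GPU for demonstration only: the result is not ready on the
   first poll after a query has been issued. */
typedef struct { int polls; int samples; int issued; } fake_gpu_t;

static void Fake_Issue(void *gpu, unsigned id)
{
    fake_gpu_t *f = (fake_gpu_t *)gpu;
    (void)id;
    f->polls = 0;
    f->issued++;
}

static bool Fake_Available(void *gpu, unsigned id)
{
    fake_gpu_t *f = (fake_gpu_t *)gpu;
    (void)id;
    return f->polls++ >= 1;
}

static int Fake_Samples(void *gpu, unsigned id)
{
    fake_gpu_t *f = (fake_gpu_t *)gpu;
    (void)id;
    return f->samples;
}
```

With this shape the cpu never waits on the gpu. The cost is that the visibility answer is a frame or more old, which is why the query geometry should be slightly larger than the real object.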
Re: Hardware Occlusion queries.
Refined it a bit in the meantime, but the code itself seems to have originated from nvidia's code sample.
this part: glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
is actually there to avoid hogging the gpu, as occAvailable will only be true once the query has finished. the way it was handled is another matter though; the correct way according to all the sources i can find is to just do it like this:
glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
if (occAvailable > 0)
    glGetQueryObjectiv(occQuery, GL_QUERY_RESULT, &occSamples);
the last call won't run unless GL_QUERY_RESULT_AVAILABLE has reported that the result is ready.
And sure, you don't have to use a fullscreen quad for the test, you can also use triangles or whatever.
I'm using it for bloom occlusion now and it seems to work rather well for that.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Hardware Occlusion queries.
Try using conditional rendering:
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_conditional_render.txt
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glBeginConditionalRender.xhtml (supported in gl 3.0+)
- Barnes
- Posts: 226
- Joined: Thu Dec 24, 2009 2:26 pm
- Location: Russia, Moscow
Re: Hardware Occlusion queries.
Already did,
works ok now, not noticing any slowdowns.
But Spike is correct, it works better on complex stuff.
Also, it was mostly an experiment to see how well it works, or doesn't.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Hardware Occlusion queries.
With occlusion queries you're meant to issue the query, then come back a frame or two later and fetch the results, otherwise you've just broken CPU/GPU pipelining and you've done the equivalent of a great big glFinish call in the middle of your code.
If fetching the results immediately doesn't reduce your framerate to at least half what it was (and it should, even for just one query, even for a simple single quad) then your CPU/GPU pipelining is probably already broken elsewhere.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Hardware Occlusion queries.
The first version did indeed cause some slowdowns; it was probably also a bad idea to try to use it as a hardware version of R_CullBox.
The new version uses a low-poly version of the bbox to fill the queries for 3 frames before drawing the real deal.
The fullscreen quad part in version one was just copied from quake royale;
at the time i was not sure precisely how the queries worked, and it seems the original author was not either.
It's an interesting technique, but software occlusion still does the job better, so it's a bit pointless.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
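The low-poly bbox proxy mentioned above could be built roughly like this. This is a sketch, not the code from the post: `Occ_BoxCorners`, `occ_box_indices` and the `pad` parameter are hypothetical names, and the padding follows Spike's earlier advice to make the query geometry slightly larger than the real object.

```c
/* Write the 8 corners of an axis-aligned box, grown by `pad` on every
   side, into out[8][3]. Corner i uses maxs on axis k when bit k of i
   is set, and mins otherwise. */
static void Occ_BoxCorners(const float mins[3], const float maxs[3],
                           float pad, float out[8][3])
{
    int i, k;

    for (i = 0; i < 8; i++)
        for (k = 0; k < 3; k++)
            out[i][k] = ((i >> k) & 1) ? maxs[k] + pad : mins[k] - pad;
}

/* 12 triangles (36 indices) covering the six faces of the box above. */
static const unsigned char occ_box_indices[36] = {
    0, 2, 1,  1, 2, 3,   /* z = min face */
    4, 5, 6,  5, 7, 6,   /* z = max face */
    0, 1, 4,  1, 5, 4,   /* y = min face */
    2, 6, 3,  3, 6, 7,   /* y = max face */
    0, 4, 2,  2, 4, 6,   /* x = min face */
    1, 3, 5,  3, 7, 5    /* x = max face */
};
```

The 36 indices can be drawn as GL_TRIANGLES inside the query with color and depth writes disabled. Face culling is best turned off for the proxy, since the winding here is not guaranteed to be consistent and the camera may end up inside the box.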
Re: Hardware Occlusion queries.
Occlusion queries are very strongly tied to rasterization, so whatever you want to cull has to be really heavy for the query to pay off. The overhead at high screen resolutions is huge. Using GL_ARB_occlusion_query2 (ANY_SAMPLES_PASSED) saves a bit, but the synchronization problem remains. It can be solved in two ways:
1 - use the visibility result from the previous frame
2 - get the result asynchronously through conditional rendering
- Barnes
- Posts: 226
- Joined: Thu Dec 24, 2009 2:26 pm
- Location: Russia, Moscow
Re: Hardware Occlusion queries.
It also breaks your ability to do batching/instancing, so you really need to evaluate performance both with and without rather than just assume that it will be faster.
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Hardware Occlusion queries.
Aye, it's not exactly easy to get this one done right, and it comes with some downsides too.
One reason i was exploring it was particles such as rocket explosions bleeding through solids.
i tried various methods to get rid of that, but even the best fixes still let some of the explosion bleed through.
Still looking for a reliable way to do this.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Hardware Occlusion queries.
Soft particles is the term you're looking for: http://blog.wolfire.com/2010/04/Soft-Particles
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Hardware Occlusion queries.
Doh... you're right, i should have thought of that one, since i helped get this working in the darkmod engine.
Hrrr, i guess getting at the depth buffer will be just as fun in quake...
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: Hardware Occlusion queries.
soft particles shader
- Code:
// ---- vertex shader ----
out vec2 v_texCoord0;
out float v_depth;
out vec4 v_color;

uniform mat4 u_modelViewProjectionMatrix, u_modelViewMatrix;

layout(location = 0) in vec3 att_position;
layout(location = 4) in vec4 att_color4f;
layout(location = 5) in vec2 att_texCoordDiffuse;

void main (void) {
    v_texCoord0 = att_texCoordDiffuse;
    v_color = att_color4f;
    // eye-space distance of the particle (eye space looks down -z)
    v_depth = -(u_modelViewMatrix * vec4(att_position, 1.0)).z;
    gl_Position = u_modelViewProjectionMatrix * vec4(att_position, 1.0);
}

// ---- fragment shader ----
in float v_depth;
in vec4 v_color;
in vec2 v_texCoord0;

// fragData was not declared in the original post; declared here on the
// assumption that the engine normally provides it in a shared header.
out vec4 fragData;

uniform vec2 u_depthParms;
uniform vec2 u_mask;
uniform float u_thickness;
uniform float u_colorScale;

layout (binding = 0) uniform sampler2D u_map0;
layout (binding = 1) uniform sampler2DRect u_depthBufferMap;

// convert a depth-buffer value back to eye-space distance
float DecodeDepth (const in float x, const in vec2 parms) {
    return parms.x / (parms.y - x);
}

void main (void) {
    vec4 color = texture(u_map0, v_texCoord0);

    if (u_thickness > 0.0) {
        // Z-feather: fade the particle out as it approaches the scene depth
        float depth = DecodeDepth(texture(u_depthBufferMap, gl_FragCoord.xy).x, u_depthParms);
        float softness = clamp((depth - v_depth) / u_thickness, 0.0, 1.0);
        fragData = color * v_color * u_colorScale;
        fragData *= mix(vec4(1.0), vec4(softness), u_mask.xxxy);
    } else {
        fragData = color * v_color;
    }
}
- Barnes
- Posts: 226
- Joined: Thu Dec 24, 2009 2:26 pm
- Location: Russia, Moscow
Re: Hardware Occlusion queries.
revelator wrote: That's a start, thanks barnes.
Ah, yes... Some explanations:
1 - u_depthParms
for an infinite far plane:
    depthParms[0] = r_zNear->value; // 3.0 by default
    depthParms[1] = 0.9995f;
or for a standard projection matrix:
    scale = 1.f / (1.f - r_zNear->value / r_zFar->value);
    depthParms[0] = r_zNear->value * scale;
    depthParms[1] = scale;
2 - u_mask
the blending mask:
    if (p->sFactor == GL_ONE && p->dFactor == GL_ONE)
        qglUniform2f (particle_mask, 1.0, 0.0); // color
    else
        qglUniform2f (particle_mask, 0.0, 1.0); // alpha
- Barnes
- Posts: 226
- Joined: Thu Dec 24, 2009 2:26 pm
- Location: Russia, Moscow
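To sanity-check the standard-projection parameters above, here is a C mirror of the shader's DecodeDepth and the Z-feather factor. It is a sketch under assumptions: `depth_parms_t`, `DepthParms_Standard` and `Softness` are hypothetical names, doubles are used instead of shader floats, and the far plane of 4096 in the note below is made up. Only the formulas come from the posts above.

```c
/* The two values uploaded as u_depthParms. */
typedef struct { double x, y; } depth_parms_t;

/* Parameters for a standard (finite far plane) projection matrix,
   following the formula given in the post. */
static depth_parms_t DepthParms_Standard(double zNear, double zFar)
{
    depth_parms_t p;
    double scale = 1.0 / (1.0 - zNear / zFar);

    p.x = zNear * scale;
    p.y = scale;
    return p;
}

/* Same expression as the shader: maps a depth-buffer value in [0, 1]
   back to eye-space distance. */
static double DecodeDepth(double d, depth_parms_t p)
{
    return p.x / (p.y - d);
}

/* Z-feather factor: 0 where the particle touches the scene geometry,
   rising to 1 once it is `thickness` units in front of it. */
static double Softness(double sceneDepth, double particleDepth, double thickness)
{
    double s = (sceneDepth - particleDepth) / thickness;

    return s < 0.0 ? 0.0 : (s > 1.0 ? 1.0 : s);
}
```

With zNear = 3 and zFar = 4096, a depth-buffer value of 0 decodes to the near plane and 1 decodes to the far plane, so the decoded value is directly comparable with v_depth from the vertex shader.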