Hardware Occlusion queries.

Postby revelator » Mon Apr 10, 2017 12:11 am

Started toying a bit with those, and at least had some success.

Code: Select all
qboolean GL_Occlusion(GLfloat width, GLfloat height)
{
    float       w = 640.0f / (float)width;
    float       h = 480.0f / (float)height;
    float       cornerFactor = 2.0f;
    double      corner1 = realtime*2;
    double      corner2 = realtime*3;
    double      corner3 = realtime*4;
    double      corner4 = realtime*5;
    qboolean    Occluded = false;   // stays "visible" if no result is read back
    GLuint      occQuery;
    GLuint      occSamples = 0;
    GLuint      occAvailable = 0;

    if(GL_ExtensionBits & HAS_OCCLUSION)
    {
        // start up queries for occlusion.
        glGenQueries(1, &occQuery);

        // take down color and depthmask, we do not want to draw anything.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);

        // Load queries for Occlusion.
        glBeginQuery(GL_SAMPLES_PASSED, occQuery);

        // draw a fullscreen quad to feed the query,
        // do not render it, hence taking down color and depthmask above.
        glBegin(GL_QUADS);
        glVertex2f(-(w * 0.5f) + (sinf(corner1) * cornerFactor), -(h * 0.5f) + (cosf(corner1) * cornerFactor));
        glVertex2f(-(w * 0.5f) + (sinf(corner2) * cornerFactor), (h * 0.5f) + (cosf(corner2) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner3) * cornerFactor), (h * 0.5f) + (cosf(corner3) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner4) * cornerFactor), -(h * 0.5f) + (cosf(corner4) * cornerFactor));
        glEnd();

        // Occlusion test done
        glEndQuery(GL_SAMPLES_PASSED);

        // flush queries
        glFlush();

        do
        {
            // Poll until the query result is available.
            glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
        } while(!occAvailable);

        // go back to normal rendering.
        glDepthMask(GL_TRUE);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

        // fresh in from above,
        // and tested in a way that does not stall the pipeline.
        if (occAvailable > 0)
        {
            // Test again and output samples that passed.
            glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT, &occSamples);

            // get occlusion state (false = visible - true = occluded).
            Occluded = (occSamples > 0) ? false : true;
        }
    }
    else
    {
        // if we have an ancient card, let things pass.
        Occluded = false;
    }
    return Occluded;
}


The above code was ported from quake royale, where it was used for occluding lens flares and bloom.

It does work and is even reasonably fast, but it tends to get a little too effective on objects that are small in screen space.
From what I hear that is just how it is, so to get some of the benefits you need to use it in huge, complex scenes.
Productivity is a state of mind.
revelator
 
Posts: 2540
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Hardware Occlusion queries.

Postby Spike » Mon Apr 10, 2017 4:15 am

yeah... that's not how you're meant to do them.

firstly you're leaking occlusion query handles.
secondly, you're busy-looping the cpu while the driver is still busy feeding the gpu and the gpu itself is still idle. a general rule of thumb is that you should only check the result of an occlusion query on the _following_ frame (make the query geometry slightly larger/nearer so you won't get occasional flickering). at a minimum you should draw something else unrelated between the endquery and the getquery, to at least give the gpu/driver a chance to catch up with the cpu/app. yes, this something else will not be detected by the occlusion query, hence the whole next-frame thing.
drawing a fullscreen quad for your occlusion query is really quite pointless too of course...

GPUs are frikkin fast nowadays, so really think of occlusion queries as just an optimisation to reduce the cpu overhead of sending lots of invisible drawcalls at the driver. if you're submitting one drawcall to avoid a single other (and probably needing to submit BOTH draw calls anyway), there had better be a GOOD reason for that...
They're useful for doorways so that you cull entire rooms, or for forward-rendered rtlights maybe, but totally pointless for your average quake mdl.
They may also be useful for cheat detection, but hey...

at least that's how I see them - as a cpu/gpu sync nightmare. :s
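
A rough sketch of the "check it on the following frame" idea, for anyone reading along. To keep it runnable without a GL context, the query object is replaced by a small simulated struct; every name here (FakeQuery, OcclusionState, occ_init, occ_issue, occ_advance_frame, occ_poll) is hypothetical and not from the code above. In a real renderer the query would presumably be created once with glGenQueries and released with glDeleteQueries at shutdown, issuing would be glBeginQuery / draw / glEndQuery, and the poll would be a single glGetQueryObjectuiv call with GL_QUERY_RESULT_AVAILABLE followed by GL_QUERY_RESULT.

```c
#include <stdbool.h>

/* Stand-in for one GL query object so the sketch runs without a GL context. */
typedef struct {
    bool     in_flight;    /* issued, result not collected yet */
    int      frames_left;  /* simulated GPU latency, in frames */
    unsigned samples;      /* simulated GL_SAMPLES_PASSED result */
} FakeQuery;

typedef struct {
    FakeQuery query;       /* created once and reused, never per call */
    bool      visible;     /* last known answer; starts out visible */
} OcclusionState;

void occ_init(OcclusionState *s)
{
    s->query.in_flight = false;
    s->query.frames_left = 0;
    s->query.samples = 0;
    s->visible = true;
}

/* Frame N: issue the test (begin query, draw the proxy geometry, end query). */
void occ_issue(OcclusionState *s, unsigned samples, int latency_frames)
{
    if (s->query.in_flight)
        return;            /* previous result still pending, do not reissue */
    s->query.in_flight = true;
    s->query.frames_left = latency_frames;
    s->query.samples = samples;
}

/* Simulates the GPU making one frame of progress. */
void occ_advance_frame(OcclusionState *s)
{
    if (s->query.in_flight && s->query.frames_left > 0)
        s->query.frames_left--;
}

/* A later frame: poll once and never spin. If the result is not there yet,
   keep using the last known visibility. */
bool occ_poll(OcclusionState *s)
{
    if (s->query.in_flight && s->query.frames_left == 0) {
        s->visible = (s->query.samples > 0);
        s->query.in_flight = false;
    }
    return s->visible;
}
```

The point of the design is that occ_poll never blocks: an unfinished query just means the object is drawn (or skipped) according to the previous answer, which is why the test volume should be a bit generous.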
Spike
 
Posts: 2883
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Hardware Occlusion queries.

Postby revelator » Mon Apr 10, 2017 8:27 am

Refined it a bit in the meantime, but the code itself seems to have originated from nvidia's code sample.

This part:

glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);

is actually there to avoid hogging the gpu, since occAvailable will only be true once the query is done. The way it was handled is another matter, though. The correct way, according to all the sources I can find, is to just do it like this:

glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);

if (occAvailable > 0)
    glGetQueryObjectuiv(occQuery, GL_QUERY_RESULT, &occSamples);

The last call won't run unless GL_QUERY_RESULT_AVAILABLE reports that the query result is ready.

And sure, you don't have to use a fullscreen quad for testing, you can also use triangle mode or whatever :)

I'm using it for bloom occlusion now and it seems to work rather well for that.

Re: Hardware Occlusion queries.

Postby Barnes » Mon Apr 10, 2017 3:44 pm

Barnes
 
Posts: 223
Joined: Thu Dec 24, 2009 2:26 pm
Location: Russia, Moscow

Re: Hardware Occlusion queries.

Postby revelator » Tue Apr 11, 2017 4:47 am

Already did ;) Works OK now, not noticing any slowdowns.

But Spike is correct, it works better on complex stuff.

Also it was mostly an experiment to see how well or not it works.

Re: Hardware Occlusion queries.

Postby mh » Tue Apr 11, 2017 3:43 pm

With occlusion queries you're meant to issue the query, then come back a frame or two later and fetch the results, otherwise you've just broken CPU/GPU pipelining and you've done the equivalent of a great big glFinish call in the middle of your code.

If fetching the results immediately doesn't reduce your framerate to at least half what it was (and it should, even for just one query, even for a simple single quad) then your CPU/GPU pipelining is probably already broken elsewhere.
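
One way to arrange "come back a frame or two later" is a small ring of query objects: each frame issues into one slot and reads the slot that was issued a fixed number of frames ago. The sketch below only shows the slot arithmetic; QUERY_LAG, QUERY_SLOTS and the three function names are made up for illustration, and the lag of 2 is an assumption rather than a recommendation from this thread. Each slot would hold one GLuint obtained from glGenQueries.

```c
/* Read results QUERY_LAG frames after issuing them (2 is an assumed value). */
#define QUERY_LAG   2
#define QUERY_SLOTS (QUERY_LAG + 1)

/* Slot whose query object gets (re)issued during this frame. */
unsigned issue_slot(unsigned frame)
{
    return frame % QUERY_SLOTS;
}

/* Slot holding the query issued QUERY_LAG frames ago: the oldest in the ring. */
unsigned read_slot(unsigned frame)
{
    return (frame + 1) % QUERY_SLOTS;
}

/* Nonzero once a query has actually been issued into the slot read this frame. */
int read_slot_ready(unsigned frame)
{
    return frame >= QUERY_LAG;
}
```

Because the read slot and the issue slot are always different, the CPU never asks for a result it submitted in the same frame, so the fetch should not force the GPU to drain.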
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
mh
 
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Hardware Occlusion queries.

Postby revelator » Wed Apr 12, 2017 12:05 am

The first version did indeed cause some slowdowns; it was probably also a bad idea to try to use it as a hardware version of R_CullBox.

New version uses a low poly version of the bbox to fill the queries for 3 frames before drawing the real deal.

The fullscreen quad part in version one was just copied from quake royale,
at the time I was not sure precisely how the queries worked, and it seems the original author was not either.

It's an interesting technique, but software occlusion still does the job better, so it's a bit pointless.

Re: Hardware Occlusion queries.

Postby Barnes » Wed Apr 12, 2017 5:15 pm

The occlusion query is very strongly tied to rasterization, so whatever you want to cull with it should be very heavy to draw. The overhead at high screen resolutions is huge. Using GL_ARB_occlusion_query2 (ANY_SAMPLES_PASSED) saves a bit, but there is still a synchronization problem. We can solve it in two ways:
1 - use the visibility result from the previous frame
2 - get the result asynchronously through conditional rendering

Re: Hardware Occlusion queries.

Postby mh » Wed Apr 12, 2017 7:08 pm

It also breaks your ability to do batching/instancing, so you really need to evaluate performance both with and without rather than just assume that it will be faster.

Re: Hardware Occlusion queries.

Postby revelator » Thu Apr 13, 2017 10:55 am

Aye, it's not exactly easy to get this one done right, and it comes with some downsides too.

One reason I was exploring it was because of particles such as rocket explosions bleeding through solids.
I tried various methods to get rid of them, but even the best fixes still let some of the explosion bleed through.

Still looking for a reliable way to do this.

Re: Hardware Occlusion queries.

Postby mh » Thu Apr 13, 2017 4:04 pm

Soft particles is the term you're looking for: http://blog.wolfire.com/2010/04/Soft-Particles

Re: Hardware Occlusion queries.

Postby revelator » Thu Apr 13, 2017 6:56 pm

Dooh ... you're right, I should have thought of that one since I helped get this working in the darkmod engine :oops:
Hrrr, I guess getting to the depthbuffer will be just as fun in quake...

Re: Hardware Occlusion queries.

Postby Barnes » Fri Apr 14, 2017 9:45 am

soft particles shader

Code: Select all
// ---- vertex shader ----
out vec2  v_texCoord0;
out float v_depth;
out vec4  v_color;
uniform mat4 u_modelViewProjectionMatrix, u_modelViewMatrix;

layout(location = 0) in vec3 att_position;
layout(location = 4) in vec4 att_color4f;
layout(location = 5) in vec2 att_texCoordDiffuse;

void main (void) {
   v_texCoord0 = att_texCoordDiffuse;
   v_color = att_color4f;
   // positive eye-space distance (the camera looks down -Z)
   v_depth = -(u_modelViewMatrix * vec4(att_position, 1.0)).z;
   gl_Position = u_modelViewProjectionMatrix * vec4(att_position, 1.0);
}

// ---- fragment shader ----
// note: the output variable fragData is not declared in this snippet
in float v_depth;
in vec4  v_color;
in vec2  v_texCoord0;

uniform vec2  u_depthParms;
uniform vec2  u_mask;
uniform float u_thickness;
uniform float u_colorScale;

// convert a depth-buffer value back to a linear eye-space distance
float DecodeDepth (const in float x, const in vec2 parms) {
   return parms.x / (parms.y - x);
}

layout (binding = 0) uniform sampler2D     u_map0;
layout (binding = 1) uniform sampler2DRect u_depthBufferMap;

void main (void) {
   vec4 color = texture(u_map0, v_texCoord0);

   if (u_thickness > 0.0) {
      // Z-feather: fade the particle out as it approaches the scene depth
      float depth = DecodeDepth(texture2DRect(u_depthBufferMap, gl_FragCoord.xy).x, u_depthParms);
      float softness = clamp((depth - v_depth) / u_thickness, 0.0, 1.0);

      fragData = color * v_color * u_colorScale;
      fragData *= mix(vec4(1.0), vec4(softness), u_mask.xxxy);
   }
   else
      fragData = color * v_color;
}
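
For checking the numbers on the CPU side, the Z-feather term and the mask step of the fragment shader above can be mirrored in plain C. This is only a sketch: clampf, soft_particle_factor and apply_soft_mask are hypothetical helper names, and both depths are assumed to be linear eye-space distances in the same units as u_thickness.

```c
/* Clamp x to [lo, hi]; mirrors GLSL clamp() for scalars. */
static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* scene_depth:    linear depth of the opaque scene behind the particle
   particle_depth: linear depth of the particle fragment (v_depth)
   thickness:      feather distance (u_thickness), assumed > 0 */
float soft_particle_factor(float scene_depth, float particle_depth, float thickness)
{
    return clampf((scene_depth - particle_depth) / thickness, 0.0f, 1.0f);
}

/* Mirrors: fragData *= mix(vec4(1.0), vec4(softness), u_mask.xxxy);
   the .xxxy swizzle applies mask_x to r, g, b and mask_y to alpha. */
void apply_soft_mask(float rgba[4], float softness, float mask_x, float mask_y)
{
    const float m[4] = { mask_x, mask_x, mask_x, mask_y };
    for (int i = 0; i < 4; i++)
        rgba[i] *= 1.0f * (1.0f - m[i]) + softness * m[i];  /* GLSL mix(a, b, t) */
}
```

With a mask of (1, 0) only the colour channels are scaled by the softness and alpha is left alone; with (0, 1) it is the other way round.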

Re: Hardware Occlusion queries.

Postby revelator » Sun Apr 16, 2017 9:53 am

That's a start :) thanks barnes.

Re: Hardware Occlusion queries.

Postby Barnes » Tue Apr 18, 2017 2:59 pm

revelator wrote: That's a start :) thanks barnes.

Ah, yes... Some explanations:

1 - u_depthParms -

for an infinite far plane

depthParms[0] = r_zNear->value; // 3.0 by default
depthParms[1] = 0.9995f;

or for a standard projection matrix

scale = 1.f / (1.f - r_zNear->value / r_zFar->value);

depthParms[0] = r_zNear->value * scale;
depthParms[1] = scale;

2 - u_mask

blending mask

if (p->sFactor == GL_ONE && p->dFactor == GL_ONE)
    qglUniform2f (particle_mask, 1.0, 0.0); //color
else
    qglUniform2f (particle_mask, 0.0, 1.0); //alpha
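
A quick way to convince yourself that the standard-projection depthParms match DecodeDepth in the shader earlier in the thread is a round trip in C. This is a sketch under stated assumptions: a conventional OpenGL perspective projection and the default depth range of 0..1. The function names are hypothetical, and the far plane value used when calling them is arbitrary.

```c
/* Window-space depth in [0,1] that a standard perspective projection
   (assumed, with the default depth range) produces for a point at
   eye-space distance z, where znear <= z <= zfar. */
double encode_depth(double z, double znear, double zfar)
{
    return (zfar / (zfar - znear)) * (1.0 - znear / z);
}

/* Same arithmetic as DecodeDepth() in the shader. */
double decode_depth(double d, double parm_x, double parm_y)
{
    return parm_x / (parm_y - d);
}

/* Fills depthParms the way the post does for a standard projection matrix. */
void depth_parms_standard(double znear, double zfar, double parms[2])
{
    double scale = 1.0 / (1.0 - znear / zfar);
    parms[0] = znear * scale;
    parms[1] = scale;
}
```

Solving d = scale * (1 - znear / z) for z gives z = znear * scale / (scale - d), which is exactly parms.x / (parms.y - x), so decoding an encoded depth should return the original distance up to floating-point error.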
