Hardware Occlusion queries.

revelator
Posts: 2621
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Hardware Occlusion queries.

Post by revelator »

Started toying a bit with those, and at least had some success.

Code: Select all

qboolean GL_Occlusion(GLfloat width, GLfloat height)
{
	float       w = 640.0f / (float)width;
	float       h = 480.0f / (float)height;
	float       cornerFactor = 2.0f;
	double      corner1 = realtime*2;
	double      corner2 = realtime*3;
	double      corner3 = realtime*4;
	double      corner4 = realtime*5;
	qboolean    Occluded = false;   // default to visible if no result is read
	GLuint      occQuery;
	GLint       occSamples = 0;     // glGetQueryObjectiv writes GLint values
	GLint       occAvailable = 0;

    if(GL_ExtensionBits & HAS_OCCLUSION)
    {
        // start up queries for occlusion.
        glGenQueries(1, &occQuery);

        // take down color and depthmask, we do not want to draw anything.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);

        // Load queries for Occlusion.
        glBeginQuery(GL_SAMPLES_PASSED, occQuery);

        // draw a full screen quad to give the query something to test,
        // but do not render it, hence taking down color and depthmask above.
        glBegin(GL_QUADS);
        glVertex2f(-(w * 0.5f) + (sinf(corner1) * cornerFactor), -(h * 0.5f) + (cosf(corner1) * cornerFactor));
        glVertex2f(-(w * 0.5f) + (sinf(corner2) * cornerFactor), (h * 0.5f) + (cosf(corner2) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner3) * cornerFactor), (h * 0.5f) + (cosf(corner3) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner4) * cornerFactor), -(h * 0.5f) + (cosf(corner4) * cornerFactor));
        glEnd();

        // Occlusion test done
        glEndQuery(GL_SAMPLES_PASSED);

        // flush queries
        glFlush();

        do
        {
            // Spin until the query result becomes available.
            glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
        } while(!occAvailable);

        // go back to normal rendering.
        glDepthMask(GL_TRUE);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

        // fresh in from above,
        // and tested in a way that does not stall the pipeline.
        if (occAvailable > 0)
        {
            // Test again and output samples that passed.
            glGetQueryObjectiv(occQuery, GL_QUERY_RESULT, &occSamples);

            // get occlusion state (false = visible - true = occluded).
            Occluded = (occSamples > 0) ? false : true;
        }
    }
    else
    {
        // if we have an ancient card let things pass.
        Occluded = false;
    }
    return Occluded;
}
The above code was ported out of quake royale, where it was used for occluding lensflare and bloom.

It does work and is even reasonably fast, but it does tend to get a little too effective on smaller screen space objects.
From what i hear that is just how it is, so to get some of the benefits you need to use it in huge complex scenes.
Productivity is a state of mind.
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Re: Hardware Occlusion queries.

Post by Spike »

yeah... that's not how you're meant to do them.

firstly you're leaking occlusion query handles.
secondly, you're busy-looping the cpu while the driver is still busy feeding the gpu and the gpu itself is still idle. a general rule of thumb is that you should only check the result of an occlusion query on the _following_ frame (make the query volume slightly larger/nearer so you won't get occasional flickering). at a minimum you should draw something else unrelated between the endquery and the getquery, to at least give the gpu/driver a chance to catch up with the cpu/app. yes, that something else will not be covered by the occlusion query, hence the whole next-frame thing.
drawing a fullscreen quad for your occlusion query is really quite pointless too of course...

GPUs are frikkin fast nowadays, so really think of occlusion queries as just an optimisation to reduce the cpu overhead of sending lots of invisible drawcalls at the driver. if you're submitting one drawcall to avoid a single other (and probably needing to submit BOTH draw calls anyway), there had better be a GOOD reason for that...
They're useful for doorways so that you cull entire rooms, or for forward-rendered rtlights maybe, but totally pointless for your average quake mdl.
They may also be useful for cheat detection, but hey...

at least that's how I see them - as a cpu/gpu sync nightmare. :s
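
For illustration, a minimal sketch of the "check it on the following frame" idea in C. It only models the bookkeeping; the GL calls are left as comments so the logic can be read (and tested) without a context. The names occ_state, occ_resolve and occ_issued are hypothetical and not from any engine. The query handle itself is assumed to be created once with glGenQueries and released with glDeleteQueries at shutdown, rather than generated on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-object bookkeeping for one reusable query handle. */
typedef struct {
    bool pending;   /* a query was issued and its result has not been read yet */
    bool visible;   /* last known answer; start as visible so nothing pops out */
} occ_state;

/* Call once per frame BEFORE issuing this frame's query.
 * `available` and `samples` are whatever the caller polled from the driver:
 *   glGetQueryObjectuiv(id, GL_QUERY_RESULT_AVAILABLE, &available);
 *   if (available) glGetQueryObjectuiv(id, GL_QUERY_RESULT, &samples);
 * The function never waits: if the result is not ready, the previous
 * answer is kept. Returns true when the handle is free for a new query. */
bool occ_resolve(occ_state *s, bool available, unsigned samples)
{
    assert(s != NULL);
    if (s->pending && available) {
        s->visible = samples > 0;
        s->pending = false;
    }
    return !s->pending;
}

/* Call right after glBeginQuery / draw test volume / glEndQuery. */
void occ_issued(occ_state *s)
{
    assert(s != NULL);
    s->pending = true;
}
```

Per frame the caller would poll, call occ_resolve, draw or skip the object based on s.visible, and issue a new query only when occ_resolve returned true.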

Re: Hardware Occlusion queries.

Post by revelator »

Refined it a bit in the meantime, but the code itself seems to have originated from an NVIDIA code sample.

This part: glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
is actually there to avoid hogging the gpu, as occAvailable will only be true once the query has finished. The way it was handled is another matter though; the correct way according to all sources i can find is to just do it like this:

glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);

if (occAvailable > 0)
glGetQueryObjectiv(occQuery, GL_QUERY_RESULT, &occSamples);

The last query won't run unless GL_QUERY_RESULT_AVAILABLE has reported that the result is ready to be read.

And sure, you don't have to use a fullscreen quad for testing, you can also use triangle mode or whatever :)

I'm using it for bloom occlusion now and it seems to work rather well for that.
Barnes
Posts: 232
Joined: Thu Dec 24, 2009 2:26 pm
Location: Russia, Moscow
Contact:

Re: Hardware Occlusion queries.

Post by Barnes »


Re: Hardware Occlusion queries.

Post by revelator »

Already did ;) works ok now, not noticing any slowdowns.

But spike is correct it works better on complex stuff.

Also it was mostly an experiment to see how well or not it works.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Hardware Occlusion queries.

Post by mh »

With occlusion queries you're meant to issue the query, then come back a frame or two later and fetch the results, otherwise you've just broken CPU/GPU pipelining and you've done the equivalent of a great big glFinish call in the middle of your code.

If fetching the results immediately doesn't reduce your framerate to at least half what it was (and it should, even for just one query, even for a simple single quad) then your CPU/GPU pipelining is probably already broken elsewhere.
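
One possible way to arrange the "come back a frame or two later" part is a small ring of query slots indexed by frame number. The sketch below is only the index arithmetic, in C; OCC_LATENCY, occ_write_slot and occ_read_slot are made-up names, and the latency of 2 frames is an assumption to tune, not a recommendation from this thread.

```c
#include <assert.h>

#define OCC_LATENCY 2                  /* frames to wait before reading (assumed) */
#define OCC_SLOTS   (OCC_LATENCY + 1)  /* one slot per frame still in flight */

/* Slot that this frame's query is issued into. */
unsigned occ_write_slot(unsigned frame)
{
    return frame % OCC_SLOTS;
}

/* Slot whose query was issued OCC_LATENCY frames ago,
 * or -1 while the ring is still warming up. */
int occ_read_slot(unsigned frame)
{
    int slot;

    if (frame < OCC_LATENCY)
        return -1;

    slot = (int)((frame - OCC_LATENCY) % OCC_SLOTS);

    /* the slot being read is never the one being written this frame */
    assert(slot != (int)occ_write_slot(frame));
    return slot;
}
```

Each slot would hold one query handle per tested object. During warm-up (read slot -1) the object is simply treated as visible.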
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for

Re: Hardware Occlusion queries.

Post by revelator »

First version did indeed cause some slowdowns; it was probably also a bad idea to try and use it as a hardware version of R_CullBox.

New version uses a low poly version of the bbox to fill the queries for 3 frames before drawing the real deal.

The fullscreen quad part in version one was just copied from quake royale;
at the time i was not sure precisely how the queries worked, and it seems the original author was not either.

It's an interesting technique, but software occlusion still does the job better, so it's a bit pointless.
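
As a rough illustration of the bbox idea, here is a C sketch that builds the 8 corners of a slightly enlarged box to use as the query volume. The padding follows the "make it slightly larger" advice earlier in the thread; the function name and the pad parameter are hypothetical, and how the corners get turned into triangles is left to the renderer.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `out` with the 8 corners of the box [mins, maxs],
 * grown by `pad` units on every side. */
void bbox_corners_padded(const float mins[3], const float maxs[3],
                         float pad, float out[8][3])
{
    int i, axis;

    assert(mins != NULL && maxs != NULL && out != NULL);

    for (i = 0; i < 8; i++) {
        for (axis = 0; axis < 3; axis++) {
            /* bit `axis` of i picks the max or the min side on that axis */
            if (i & (1 << axis))
                out[i][axis] = maxs[axis] + pad;
            else
                out[i][axis] = mins[axis] - pad;
        }
    }
}
```

Corner 0 is the padded mins corner and corner 7 the padded maxs corner, so the 12 triangles of the box can be indexed from a fixed table.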

Re: Hardware Occlusion queries.

Post by Barnes »

Occlusion queries are very strongly tied to rasterization, so whatever you want to cull with them has to be very heavy for it to pay off. The overhead at high screen resolutions is huge. Using GL_ARB_occlusion_query2 (ANY_SAMPLES_PASSED) saves a bit, but there is still a synchronization problem. It can be solved in two ways:
1 - use the visibility result from the previous frame
2 - get the result asynchronously through conditional rendering

Re: Hardware Occlusion queries.

Post by mh »

It also breaks your ability to do batching/instancing, so you really need to evaluate performance both with and without rather than just assume that it will be faster.

Re: Hardware Occlusion queries.

Post by revelator »

Aye, it's not exactly easy to get this one done right, and it comes with some downsides too.

One reason i was exploring it was because of particles such as rocket explosions bleeding through solids.
I tried various methods to get rid of that, but even the best fixes still let some of the explosion bleed through.

Still looking for a reliable way to do this.

Re: Hardware Occlusion queries.

Post by mh »

Soft particles is the term you're looking for: http://blog.wolfire.com/2010/04/Soft-Particles

Re: Hardware Occlusion queries.

Post by revelator »

Dooh ... you're right, i should have thought of that one since i helped get this working in the darkmod engine :oops:
Hrrr, i guess getting to the depthbuffer will be just as fun in quake...

Re: Hardware Occlusion queries.

Post by Barnes »

soft particles shader

Code: Select all

// --- vertex shader ---
out vec2			v_texCoord0;
out float			v_depth;
out	vec4			v_color;
uniform mat4		u_modelViewProjectionMatrix, u_modelViewMatrix;

layout(location = 0) in vec3 att_position;
layout(location = 4) in vec4 att_color4f;
layout(location = 5) in vec2 att_texCoordDiffuse;

void main (void) {
	v_texCoord0 = att_texCoordDiffuse;
	v_color = att_color4f;
	v_depth = -(u_modelViewMatrix * vec4(att_position, 1.0)).z;
	gl_Position = u_modelViewProjectionMatrix * vec4(att_position, 1.0);
}

// --- fragment shader ---
in float		v_depth;
in vec4			v_color;
in vec2			v_texCoord0;

uniform vec2			u_depthParms;
uniform vec2			u_mask;
uniform float			u_thickness;
uniform float			u_colorScale;

float DecodeDepth (const in float x, const in vec2 parms) {
	return parms.x / (parms.y - x);
}

layout (binding = 0) uniform sampler2D		u_map0;
layout (binding = 1) uniform sampler2DRect	u_depthBufferMap;

void main (void) {
	vec4 color = texture(u_map0, v_texCoord0);
	
	if (u_thickness > 0.0) {
		// Z-feather: fade the particle out as it gets close to the scene depth
		float depth = DecodeDepth(texture2DRect(u_depthBufferMap, gl_FragCoord.xy).x, u_depthParms);
		float softness = clamp((depth - v_depth) / u_thickness, 0.0, 1.0);

		fragData = color * v_color * u_colorScale;
		fragData *= mix(vec4(1.0), vec4(softness), u_mask.xxxy);
	}
	else
		fragData = color * v_color;
}
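
To make the Z-feather step easier to follow, here is a CPU-side mirror of it in C. It is only a sketch for checking numbers by hand, not part of the shader or of any engine: soft_fade and apply_soft_fade are hypothetical names, and the mask handling just spells out what mix(vec4(1.0), vec4(softness), u_mask.xxxy) does (mask x scales RGB, mask y scales alpha).

```c
#include <assert.h>

static float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Same formula as the shader: 0 where the particle touches the scene,
 * 1 once it is at least `thickness` units in front of it. */
float soft_fade(float scene_depth, float particle_depth, float thickness)
{
    assert(thickness > 0.0f);
    return clamp01((scene_depth - particle_depth) / thickness);
}

/* mix(1, softness, m) == 1 + (softness - 1) * m, applied per channel:
 * mask_x gates the fade on RGB, mask_y gates it on alpha. */
void apply_soft_fade(float rgba[4], float softness, float mask_x, float mask_y)
{
    float rgb_scale   = 1.0f + (softness - 1.0f) * mask_x;
    float alpha_scale = 1.0f + (softness - 1.0f) * mask_y;

    rgba[0] *= rgb_scale;
    rgba[1] *= rgb_scale;
    rgba[2] *= rgb_scale;
    rgba[3] *= alpha_scale;
}
```

With a mask of (1, 0) the fade darkens the colour and leaves alpha alone, which suits additive blending; with (0, 1) only alpha is faded.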

Re: Hardware Occlusion queries.

Post by revelator »

That's a start :) thanks barnes.

Re: Hardware Occlusion queries.

Post by Barnes »

revelator wrote: That's a start :) thanks barnes.
Ah, yes... Some explanations:

1 - u_depthParms -

for an infinite far plane

depthParms[0] = r_zNear->value; // 3.0 by default
depthParms[1] = 0.9995f;

or for a standard projection matrix

scale = 1.f / (1.f - r_zNear->value / r_zFar->value);

depthParms[0] = r_zNear->value * scale;
depthParms[1] = scale;

2 - u_mask

blending mask

if (p->sFactor == GL_ONE && p->dFactor == GL_ONE)
qglUniform2f (particle_mask, 1.0, 0.0); //color
else
qglUniform2f (particle_mask, 0.0, 1.0); //alpha
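
A small C sketch of the standard-projection case of u_depthParms, together with a CPU copy of the shader's DecodeDepth, in case it helps to sanity-check the values. The function names are hypothetical and doubles are used only to keep the check exact. It assumes a conventional OpenGL perspective projection with the default 0..1 depth range; under that assumption a stored depth of 0 decodes to zNear and a stored depth of 1 decodes to zFar.

```c
#include <assert.h>

/* depthParms for a projection with a finite far plane. */
void depth_parms_standard(double z_near, double z_far, double parms[2])
{
    double scale;

    assert(z_near > 0.0 && z_far > z_near);

    scale = 1.0 / (1.0 - z_near / z_far);
    parms[0] = z_near * scale;
    parms[1] = scale;
}

/* CPU mirror of DecodeDepth: depth buffer value -> linear view depth. */
double decode_depth(double x, const double parms[2])
{
    return parms[0] / (parms[1] - x);
}
```

For example, with zNear 3 and zFar 4096 a depth buffer value of 0.5 decodes to roughly 6 units, which shows how much of the depth range is spent close to the near plane.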