Hardware Occlusion queries.

revelator · Post by **revelator** » Mon Apr 10, 2017 12:11 am

Started toying a bit with these, and at least had some success.

qboolean GL_Occlusion(GLfloat width, GLfloat height)
{
    float       w = 640.0f / (float)width;
    float       h = 480.0f / (float)height;
    float       cornerFactor = 2.0f;
    double      corner1 = realtime*2;
    double      corner2 = realtime*3;
    double      corner3 = realtime*4;
    double      corner4 = realtime*5;
    qboolean    Occluded = false;   // default to visible if no result gets read
    GLuint      occQuery;
    GLint       occSamples = 0;     // GLint, to match glGetQueryObjectiv
    GLint       occAvailable = 0;

    if(GL_ExtensionBits & HAS_OCCLUSION)
    {
        // start up queries for occlusion.
        glGenQueries(1, &occQuery);

        // take down color and depthmask, we do not want to draw anything.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);

        // Load queries for Occlusion.
        glBeginQuery(GL_SAMPLES_PASSED, occQuery);

        // draw a full screen quad to get the data it needs,
        // but do not render it, hence taking down color and depthmask above.
        glBegin(GL_QUADS);
        glVertex2f(-(w * 0.5f) + (sinf(corner1) * cornerFactor), -(h * 0.5f) + (cosf(corner1) * cornerFactor));
        glVertex2f(-(w * 0.5f) + (sinf(corner2) * cornerFactor), (h * 0.5f) + (cosf(corner2) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner3) * cornerFactor), (h * 0.5f) + (cosf(corner3) * cornerFactor));
        glVertex2f((w * 0.5f) + (sinf(corner4) * cornerFactor), -(h * 0.5f) + (cosf(corner4) * cornerFactor));
        glEnd();

        // Occlusion test done
        glEndQuery(GL_SAMPLES_PASSED);

        // flush queries
        glFlush();

        do
        {
            // Poll until the query result becomes available.
            glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
        } while(!occAvailable);

        // go back to normal rendering.
        glDepthMask(GL_TRUE);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

        // fresh in from above,
        // and tested in a way that does not stall the pipeline.
        if (occAvailable > 0)
        {
            // Test again and output samples that passed.
            glGetQueryObjectiv(occQuery, GL_QUERY_RESULT, &occSamples);

            // get occlusion state (false = visible - true = occluded).
            Occluded = (occSamples > 0) ? false : true;
        }
    }
    else
    {
        // if we have an ancient card let things pass.
        Occluded = false;
    }
    return Occluded;
}

The above code was ported from quake royale, where it was used for occluding lensflare and bloom.

It does work and is even reasonably fast, but it tends to be a little too effective on objects that are small in screen space.
The word is that this is just how it is, so to get some of the benefits you need to use it in huge, complex scenes.

Spike · Post by **Spike** » Mon Apr 10, 2017 4:15 am

yeah... that's not how you're meant to do them.

firstly, you're leaking occlusion query handles.
secondly, you're busylooping the cpu while the driver is busy feeding the gpu and the gpu is still idle. a general rule of thumb is that you should only check the result of an occlusion query on the _following_ frame (make the query geometry slightly larger/nearer so you won't get occasional flickering). at a minimum you should draw something else unrelated between the endquery and the getquery, to at least give the gpu/driver a chance to catch up with the cpu/app. yes, this something else will not be detected by the occlusion query, hence the whole next-frame thing.
drawing a fullscreen quad for your occlusion query is really quite pointless too, of course...

GPUs are frikkin fast nowadays, so really think of occlusion queries as just an optimisation to reduce the cpu overhead of sending lots of invisible drawcalls to the driver. if you're submitting one drawcall to avoid a single other (and probably needing to submit BOTH draw calls anyway), there had better be a GOOD reason for that...
They're useful for doorways so that you cull entire rooms, or for forward-rendered rtlights maybe, but totally pointless for your average quake mdl.
They may also be useful for cheat detection, but hey...

at least that's how I see them - as a cpu/gpu sync nightmare.
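
For what it's worth, the next-frame idea can be sketched without any GL at all. This is only a rough illustration: `fake_query_t`, `occluder_t` and `occluder_update` are made-up names, and the fake query stands in for a real query object that would be created once with glGenQueries and reused (so nothing leaks), with its result polled once per frame via GL_QUERY_RESULT_AVAILABLE instead of in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a GL query object so the sketch runs without a context.
   Hypothetical: a real renderer would wrap glBeginQuery/glEndQuery and
   glGetQueryObjectuiv here. */
typedef struct {
    unsigned samples;      /* sample count the "GPU" will report */
    int      ready_frame;  /* frame at which the result becomes available */
} fake_query_t;

typedef struct {
    fake_query_t query;      /* created once and reused every frame */
    bool         in_flight;  /* a query was issued and not read back yet */
    bool         visible;    /* last known answer, used while waiting */
} occluder_t;

static void fake_issue(fake_query_t *q, int frame, unsigned samples)
{
    q->samples = samples;
    q->ready_frame = frame + 1;  /* pretend the GPU finishes one frame later */
}

static bool fake_available(const fake_query_t *q, int frame)
{
    return frame >= q->ready_frame;
}

/* Call once per frame. Returns whether the object should be drawn. */
static bool occluder_update(occluder_t *o, int frame, unsigned samples_now)
{
    assert(o != NULL);

    /* Poll once, never spin. If the result is not ready, keep the old answer. */
    if (o->in_flight && fake_available(&o->query, frame)) {
        o->visible = o->query.samples > 0;
        o->in_flight = false;
    }

    /* Issue this frame's query only when the slot is free again. */
    if (!o->in_flight) {
        fake_issue(&o->query, frame, samples_now);
        o->in_flight = true;
    }

    return o->visible;
}
```

Starting from "visible" and feeding it sample counts of 100, 100, 0, 0, 50 over five frames gives visible, visible, visible, hidden, hidden. The answer always lags one frame behind, which is why the test volume wants to be a bit larger than the real object.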

revelator · Post by **revelator** » Mon Apr 10, 2017 8:27 am

Refined it a bit in the meantime, but the code itself seems to have originated from nvidia's code sample.

This part:
glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);
is actually there to avoid hogging the gpu, as occAvailable will only be true once the query is done. The way it was handled is another matter though; the correct way, according to all sources I can find, is to just do it like this:

glGetQueryObjectiv(occQuery, GL_QUERY_RESULT_AVAILABLE, &occAvailable);

if (occAvailable > 0)
    glGetQueryObjectiv(occQuery, GL_QUERY_RESULT, &occSamples);

The second call won't run unless GL_QUERY_RESULT_AVAILABLE reports that the query result is ready to be read.

And sure, you don't have to use a fullscreen quad for testing; you can also use triangles or whatever.

I'm using it for bloom occlusion now and it seems to work rather well for that.

Barnes · Post by **Barnes** » Mon Apr 10, 2017 3:44 pm

try to use conditional render
https://www.khronos.org/registry/OpenGL ... render.txt
https://www.khronos.org/registry/OpenGL ... nder.xhtml (supported in gl 3.0+)

revelator · Post by **revelator** » Tue Apr 11, 2017 4:47 am

Already did.

works ok now, not noticing any slowdowns.

But Spike is correct, it works better on complex stuff.

Also it was mostly an experiment to see how well or not it works.

mh · Post by **mh** » Tue Apr 11, 2017 3:43 pm

With occlusion queries you're meant to issue the query, then come back a frame or two later and fetch the results, otherwise you've just broken CPU/GPU pipelining and you've done the equivalent of a great big glFinish call in the middle of your code.

If fetching the results immediately doesn't reduce your framerate to at least half what it was (and it should, even for just one query, even for a simple single quad) then your CPU/GPU pipelining is probably already broken elsewhere.

revelator · Post by **revelator** » Wed Apr 12, 2017 12:05 am

The first version did indeed cause some slowdowns; it was probably also a bad idea to try to use it as a hardware version of R_CullBox.

The new version uses a low poly version of the bbox to fill the queries for 3 frames before drawing the real deal.

The fullscreen quad part in version one was just copied from quake royale;
at the time I was not sure precisely how the queries worked, and it seems the original author was not either.

It's an interesting technique, but software occlusion still does the job better, so it's a bit pointless.

Barnes · Post by **Barnes** » Wed Apr 12, 2017 5:15 pm

The occlusion query is very strongly tied to rasterization, so whatever you want to cull with it should be very heavy to draw. The overhead at high screen resolutions is huge. Using GL_ARB_occlusion_query2 (ANY_SAMPLES_PASSED) will save a bit, but there will still be a synchronization problem. We can solve it in two ways:
1 - use the visibility result from the previous frame
2 - get the result asynchronously through conditional rendering

mh · Post by **mh** » Wed Apr 12, 2017 7:08 pm

It also breaks your ability to do batching/instancing, so you really need to evaluate performance both with and without rather than just assume that it will be faster.

revelator · Post by **revelator** » Thu Apr 13, 2017 10:55 am

Aye, it's not exactly easy to get this one done right, and it comes with some downsides too.

One reason I was exploring it was particles such as rocket explosions bleeding through solids.
I tried various methods to get rid of that, but even the best fixes still let some of the explosion bleed through.

Still looking for a reliable way to do this.

mh · Post by **mh** » Thu Apr 13, 2017 4:04 pm

Soft particles is the term you're looking for: http://blog.wolfire.com/2010/04/Soft-Particles

revelator · Post by **revelator** » Thu Apr 13, 2017 6:56 pm

Doh ... you're right, I should have thought of that one since I helped get this working in the darkmod engine.

Hrrr, I guess getting to the depth buffer will be just as fun in quake...

Barnes · Post by **Barnes** » Fri Apr 14, 2017 9:45 am

soft particles shader

// ---- vertex shader ----
// (the #version line is assumed to be prepended by the engine)
out vec2			v_texCoord0;
out float			v_depth;
out vec4			v_color;
uniform mat4		u_modelViewProjectionMatrix, u_modelViewMatrix;

layout(location = 0) in vec3 att_position;
layout(location = 4) in vec4 att_color4f;
layout(location = 5) in vec2 att_texCoordDiffuse;

void main (void) {
	v_texCoord0 = att_texCoordDiffuse;
	v_color = att_color4f;
	// positive view-space distance from the camera (view space looks down -Z)
	v_depth = -(u_modelViewMatrix * vec4(att_position, 1.0)).z;
	gl_Position = u_modelViewProjectionMatrix * vec4(att_position, 1.0);
}

// ---- fragment shader ----
// (fragData is assumed to be declared as "out vec4 fragData;" in the engine's shared header)
in float		v_depth;
in vec4			v_color;
in vec2			v_texCoord0;

uniform vec2			u_depthParms;
uniform vec2			u_mask;
uniform float			u_thickness;
uniform float			u_colorScale;

// turn a depth buffer value back into a linear view-space distance
float DecodeDepth (const in float x, const in vec2 parms) {
	return parms.x / (parms.y - x);
}

layout (binding = 0) uniform sampler2D		u_map0;
layout (binding = 1) uniform sampler2DRect	u_depthBufferMap;

void main (void) {
	vec4 color = texture(u_map0, v_texCoord0);

	if (u_thickness > 0.0) {
		// Z-feather: fade the particle out as it gets close to the scene behind it
		// (texture() replaces the deprecated texture2DRect; rect samplers take pixel coords)
		float depth = DecodeDepth(texture(u_depthBufferMap, gl_FragCoord.xy).x, u_depthParms);
		float softness = clamp((depth - v_depth) / u_thickness, 0.0, 1.0);

		fragData = color * v_color * u_colorScale;
		// u_mask selects whether the fade is applied to rgb (x) or to alpha (y)
		fragData *= mix(vec4(1.0), vec4(softness), u_mask.xxxy);
	}
	else {
		fragData = color * v_color;
	}
}
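
To make the Z-feather line easier to poke at outside a shader, here is the same clamp written as plain C. It is only a sketch: `clampf` and `soft_factor` are made-up helper names, and both depths are assumed to already be linear view-space distances in the same units as the thickness.

```c
#include <assert.h>

static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* scene_depth:    linear depth decoded from the depth buffer at this pixel
   particle_depth: linear depth of the particle fragment (v_depth)
   thickness:      fade distance (u_thickness), must be positive
   Returns 0 where the particle touches or is behind the scene,
   1 once it is at least `thickness` in front of it. */
static float soft_factor(float scene_depth, float particle_depth, float thickness)
{
    assert(thickness > 0.0f);
    return clampf((scene_depth - particle_depth) / thickness, 0.0f, 1.0f);
}
```

With a thickness of 8, a fragment 4 units in front of a wall gets a factor of 0.5, and anything at or behind the wall gets 0, which is what kills the hard edge where an explosion sprite cuts into geometry.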

revelator · Post by **revelator** » Sun Apr 16, 2017 9:53 am

That's a start

thanks barnes.

Barnes · Post by **Barnes** » Tue Apr 18, 2017 2:59 pm

revelator wrote:That's a start thanks barnes.

Ah, yes... Some explanations:

1 - u_depthParms

for an infinite far plane:

depthParms[0] = r_zNear->value; // 3.0 by default
depthParms[1] = 0.9995f;

or for a standard projection matrix:

scale = 1.f / (1.f - r_zNear->value / r_zFar->value);

depthParms[0] = r_zNear->value * scale;
depthParms[1] = scale;

2 - u_mask

blending mask:

if (p->sFactor == GL_ONE && p->dFactor == GL_ONE)
    qglUniform2f (particle_mask, 1.0, 0.0); // additive blend: fade color
else
    qglUniform2f (particle_mask, 0.0, 1.0); // otherwise: fade alpha
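
A quick way to sanity check both parameters is to run the same math on the CPU. The sketch below is plain C with made-up names (`vec2_t`, `depth_parms_standard`, `decode_depth`, `apply_mask`); it assumes depth buffer values in the 0..1 range and simply mirrors DecodeDepth and the `u_mask.xxxy` mix from the shader.

```c
#include <assert.h>

typedef struct { float x, y; } vec2_t;  /* hypothetical helper type */

/* depthParms for a standard projection with finite near and far planes. */
static vec2_t depth_parms_standard(float z_near, float z_far)
{
    assert(z_near > 0.0f && z_far > z_near);
    float scale = 1.0f / (1.0f - z_near / z_far);
    vec2_t p = { z_near * scale, scale };
    return p;
}

/* depthParms for the infinite far plane case, constant taken from the post. */
static vec2_t depth_parms_infinite(float z_near)
{
    assert(z_near > 0.0f);
    vec2_t p = { z_near, 0.9995f };
    return p;
}

/* Same formula as DecodeDepth in the fragment shader. */
static float decode_depth(float d, vec2_t parms)
{
    return parms.x / (parms.y - d);
}

static float mixf(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

/* Mirrors: fragData *= mix(vec4(1.0), vec4(softness), u_mask.xxxy);
   mask.x drives the rgb channels, mask.y drives alpha. */
static void apply_mask(float rgba[4], float softness, vec2_t mask)
{
    float rgb_scale = mixf(1.0f, softness, mask.x);
    float a_scale   = mixf(1.0f, softness, mask.y);
    rgba[0] *= rgb_scale;
    rgba[1] *= rgb_scale;
    rgba[2] *= rgb_scale;
    rgba[3] *= a_scale;
}
```

With the standard parameters, a depth value of 0 decodes to the near plane distance and 1 decodes to the far plane distance, so the decoded value is directly comparable with v_depth. For the mask, (1, 0) scales only rgb by the softness, which is what additive blending needs since alpha is ignored there, while (0, 1) scales only alpha.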
