Batching Verts?

Postby Baker » Wed Nov 05, 2014 10:55 pm

I know the basics of vertex arrays ... but I've never done anything complex with them aside from throwing all the verts in a model frame into one and rendering it without multitexture.

1. The mechanics of the batching verts (and using texcoord array)
--- I need to allocate a buffer to store the triangles. Each entry is a float x 3 (verts) float x 4 (texcoords: texture s0, t0, s1, t1) and maybe a GLuint for the texture slot?
--- Do I allocate the memory in chunks I assume for best performance?
--- I fill the buffer with verts that use the same GL capabilities (i.e. same glBlend settings)
--- How does multi-texture fit into this? For instance, lightmaps or fullbright textures using a different TMU.
--- I should submit the verts when new settings kick in. If I need to change any rendering option, I need to submit the verts first as I understand it.

How do I handle vertex lighting, say for a player model? The traditional code changes the glColor for each vertex.

Is there even a way to do per-vertex glColor in OpenGL 1.x through an array?

2. Let's say I want to batch up the verts for a .bsp
--- Ideally, I would do this per vis leaf --- except that isn't practical because there can be incredible numbers of vis leafs.
--- I sure don't want to render the whole thing.
--- Is there any intelligent way to handle the .bsp, or does it have to be created on the fly any time the vis frame changes?

3. Models
--- The model verts are different per frame. Does each frame effectively need a vertex array?
--- For interpolating, is there a method to combine the middle of 2 frames or do I need to manually calculate the verts?

Many questions, I'm trying to develop an implementation plan ...
The night is young. How else can I annoy the world before sunsrise? 8) Inquisitive minds want to know ! And if they don't -- well like that ever has stopped me before ..
Baker
 
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Re: Batching Verts?

Postby Spike » Thu Nov 06, 2014 1:47 am

ditch gl 1.x
forget about it. it doesn't exist. it's not worth using.
if it doesn't exist in gles2+ then you do NOT want to use it.
and if you're using it anyway, at least understand it in terms of gles2.

glVertexPointer(...)
glColorPointer(...)
glClientActiveTextureARB(GL_TEXTURE0_ARB)
glTexCoordPointer(...)
glClientActiveTextureARB(GL_TEXTURE1_ARB)
glTexCoordPointer(...)
glDrawRangeElements(...)

you need to use glEnableClientState to tell the drivers which arrays you have active otherwise it won't read from them.
you'll need to use glActiveTexture combined with glEnable and glBindTexture to actually use multiple textures. the above just shows how to specify the vertex attributes.

For best performance, you should try to ensure that your attributes are interleaved. Supposedly not doing so is only a 5% performance hit, but really it depends on how many verts you have. It helps the GPU cache.
Vertices are normally recommended to have 16-byte alignment or so.
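
A minimal sketch of what an interleaved, padded vertex record could look like, assuming a position plus two texcoord sets (texture and lightmap). The names `batch_vert_t`, `batch_stride` and the `batch_offset_*` helpers are made up for illustration and do not come from any engine:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: every attribute of one vertex sits
   together, so one vertex is one contiguous record in memory. */
typedef struct {
    float xyz[3];   /* position                                 */
    float st0[2];   /* texture coords for TMU 0                 */
    float st1[2];   /* lightmap coords for TMU 1                */
    float pad;      /* pads 28 bytes up to 32, a multiple of 16 */
} batch_vert_t;

/* Byte stride between consecutive vertices; the same value is used for
   every attribute that lives in this record. */
size_t batch_stride(void)
{
    assert(sizeof(batch_vert_t) % 16 == 0);
    return sizeof(batch_vert_t);
}

/* Byte offsets of each attribute from the start of a record. With a VBO
   bound, these are the values passed as the "pointer" argument. */
size_t batch_offset_xyz(void) { return offsetof(batch_vert_t, xyz); }
size_t batch_offset_st0(void) { return offsetof(batch_vert_t, st0); }
size_t batch_offset_st1(void) { return offsetof(batch_vert_t, st1); }
```

Whether the padding float pays for itself depends on the hardware and the vertex count, so treat the 32-byte size as something to measure rather than a rule.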

Use VBOs so that you can avoid having to resubmit the exact same data to the hardware on every single glDrawRangeElements call. Doing so normally avoids any performance penalty from glDrawElements.
Use VAOs so that you can avoid having to call all your glVertexAttribPointer functions for every single draw call.

For vertex lighting on a player model, you build your static VBO with the vertex normals. You then set the lighting direction as a uniform in your vertex shader, and calculate the lighting there. This means that your C code can just specify the glsl to use and the raw attribute data, and your vertex shader can do the interpolation and lighting for you. Having the CPU loop through every single vertex for every single model for every single frame is a massive waste of CPU time when the GPU is much faster at doing it, and will be looping through them anyway. And there's no need to spam the cpu->gpu bus with attributes either because your VBOs are static.
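
To make the per-vertex maths concrete, here is a rough C version of what such a vertex shader might compute for each vertex. It assumes plain diffuse lighting with an ambient floor and unit-length vectors; this is an illustration only, not Quake's own shading tables, and `vertex_shade` is a hypothetical name:

```c
#include <assert.h>

/* Diffuse term for one vertex: dot(normal, light_dir) clamped at zero,
   then remapped so that unlit vertices still receive "ambient".
   Both vectors are assumed to be unit length. */
float vertex_shade(const float normal[3], const float light_dir[3], float ambient)
{
    float d = normal[0] * light_dir[0]
            + normal[1] * light_dir[1]
            + normal[2] * light_dir[2];

    assert(ambient >= 0.0f && ambient <= 1.0f);

    if (d < 0.0f)
        d = 0.0f;                      /* facing away from the light */
    return ambient + (1.0f - ambient) * d;
}
```

In GLSL the same thing is a `dot`, a `max` and a multiply-add per vertex, with the light direction and ambient level supplied as uniforms.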
As I said, ditch gl1.x - forget fixed function.
Spike
 
Posts: 2883
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Batching Verts?

Postby Baker » Thu Nov 06, 2014 2:53 am

Never mind this, I think I've found some information to guide me through interleaving. I was mostly on the right track below, but not quite.

Baker did post ... wrote:Interleaving: I have a list of verts:
v0, v1, v2,
v3, v4, v5
...

With interleaving, I put the UVs in the same array?
v0, v1, v2,
uv0, uv1

How does the call to glDrawElements look then?

glEnableClientState (GL_TEXTURE_COORD_ARRAY);
glEnableClientState (GL_VERTEX_ARRAY);

glTexCoordPointer (2, GL_FLOAT, STRIDE_ZERO_0, texcoords_array);
glVertexPointer (3, GL_FLOAT, STRIDE_ZERO_0, vertex_array);

Or would it look the same and I just use stride to allow use of the same array?
glTexCoordPointer (2, GL_FLOAT, /* STRIDE 12 = sizeof float * 3 */ 12 , &same_array[4]);
glVertexPointer (3, GL_FLOAT, /* STRIDE 8 = sizeof float * 2 */ 8 , &same_array[0]);


Spike wrote:ditch gl 1.x
I'm trying to get there ... :D Conceptually seeing the process in my head is what I am trying to picture at the moment. FTEQW, RMQ and SiPlus WebQuake are references I've been trying to mine. Quakeforge and DarkPlaces exist too, of course.

Re: Batching Verts?

Postby Spike » Thu Nov 06, 2014 5:30 am

the stride should be the same for each attribute assuming they're interleaved in the same array.
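
A small sketch of that rule for a flat float array holding 3 position floats followed by 2 texcoord floats per vertex. The helper names are hypothetical; the point is that both attributes share one stride and differ only in where they start:

```c
#include <assert.h>
#include <stddef.h>

enum { POS_FLOATS = 3, UV_FLOATS = 2, VERT_FLOATS = POS_FLOATS + UV_FLOATS };

/* One stride for the whole record, shared by every attribute in it. */
size_t interleaved_stride(void)
{
    return VERT_FLOATS * sizeof(float);
}

/* Start of vertex i's position inside the shared array. */
const float *vert_pos(const float *array, size_t i)
{
    assert(array != NULL);
    return array + i * VERT_FLOATS;
}

/* Start of vertex i's texcoords: same stride, different starting offset. */
const float *vert_uv(const float *array, size_t i)
{
    assert(array != NULL);
    return array + i * VERT_FLOATS + POS_FLOATS;
}

/* With GL 1.x client arrays and 4-byte floats the calls would then be
   (shown as a comment because no GL context is available here):
     glVertexPointer   (3, GL_FLOAT, 20, &same_array[0]);
     glTexCoordPointer (2, GL_FLOAT, 20, &same_array[3]);            */
```

So both calls take a stride of 20 bytes here, and the texcoord pointer starts at float index 3, right after the position.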

Re: Batching Verts?

Postby Baker » Thu Nov 06, 2014 5:39 am

I have interleaved working in a test. Thanks for the advice. Didn't even know it existed.

Re: Batching Verts?

Postby ericw » Thu Nov 06, 2014 8:19 pm

Hey Baker,

2. Let's say I want to batch up the verts for a .bsp
--- Ideally, I would do this per vis leaf --- except that isn't practical because there can be incredible numbers of vis leafs.
--- I sure don't want to render the whole thing.
--- is there any intelligent way to handle the .bsp or does that have to be created on the fly any time the vis frame changes.


I just recently did this in Quakespasm, for world+bmodels, in what I hope is a pretty easy to understand way. It's included in v0.90 btw. (not sure if you're working on Fitz Mark V or some other engine?)

This is the diff for doing all world and brush model rendering via a static vbo (I unified brush model and world model drawing earlier, but you could do something similar for just the world): https://github.com/ericwa/Quakespasm/co ... 0b9b035903

The GL_BuildVBOs is called on map load or video mode change, and it loops over the world plus all brush models. I append all of the surf->polys->verts data (3x GL_FLOAT for the vertex, 2x GL_FLOAT for the texcoord, and 2x GL_FLOAT for the lightmap coords) from every surface into one big array, then upload it into a VBO. (also the index of the surface's first vertex in the giant array is stored back in the msurface_t structure, so we know how to draw that surface later). I also set up glVertexPointer and glTexCoordPointer at the same time, which is kind of lazy, but I wasn't using vertex arrays for anything else so it seemed easier to just leave them turned on.
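
The build step described above can be sketched roughly as follows. This is a simplified stand-in, not the Quakespasm code: `surf_t`, `build_vertex_array` and the field names are hypothetical, and the GL upload itself is left out so the example runs without a context:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VERT_FLOATS 7   /* xyz + texture st + lightmap st */

/* Hypothetical stand-in for an engine surface. */
typedef struct {
    const float *verts;          /* numverts * VERT_FLOATS floats      */
    int          numverts;
    int          vbo_firstvert;  /* filled in by build_vertex_array()  */
} surf_t;

/* Appends every surface's verts to one big array and records where each
   surface starts. Returns a malloc'd array (caller frees) and writes the
   total vertex count to *total. The result is what would be handed to
   glBufferData once, at map load. */
float *build_vertex_array(surf_t *surfs, int numsurfs, int *total)
{
    int i, count = 0;
    float *out;

    assert(surfs != NULL && total != NULL);

    for (i = 0; i < numsurfs; i++)
        count += surfs[i].numverts;
    *total = count;

    out = malloc((size_t)count * VERT_FLOATS * sizeof(float));
    if (out == NULL)
        return NULL;

    count = 0;
    for (i = 0; i < numsurfs; i++) {
        surfs[i].vbo_firstvert = count;   /* remembered for drawing later */
        memcpy(out + (size_t)count * VERT_FLOATS, surfs[i].verts,
               (size_t)surfs[i].numverts * VERT_FLOATS * sizeof(float));
        count += surfs[i].numverts;
    }
    return out;
}
```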

Then, to draw from this vbo, I'm using:

glDrawElements (GL_TRIANGLES, num_vbo_indices, GL_UNSIGNED_INT, vbo_indices);

The actual vertex data is sourced from the VBO so it's completely static and hopefully in vram. The vbo_indices is a regular C array of 32-bit unsigned ints in my process's memory, containing the indices of the verts from the VBO for the triangles. The only slightly confusing bit is tessellating the surface from a polygon into a list of triangles; this is done by the R_TriangleIndicesForSurf function.
So every frame, just the triangle indices for all visible surfaces get uploaded. The surface texture and lightmap texture need to be bound between glDrawElements calls, so the number of surfaces that can fit in a lightmap limits your batch size.

There are probably a lot of other ways to draw the world, but reading the forum archives here, this seemed to be what MH and others recommended, it was fairly quick to implement, and gave a decent speedup on my hardware.

--

For alias models, I borrowed quite a bit from RMQe, but wrote the vertex shader myself in GLSL (the vertex shader does lerping, and calculates the vertex colors). Here is the code: https://github.com/ericwa/Quakespasm/co ... 5?expand=1
This part isn't merged in to the QS svn yet, I need to track down a performance problem on AMD cards, where I believe I'm causing software fallback for some reason.

--- The model verts are different per frame. Does each frame effectively need a vertex array?

What RMQe did is paste the verts for each frame one after the other, and then put the texture coordinates at the end, like this:

[frame 1: [vert 1] [vert 2] ... ] [frame 2: [vert 1] [vert 2] ... ] ... [texture coordinates: [vert 1] [vert 2] ... ]

I guess the reasoning was, the texcoords are constant across all frames, so save memory by just storing them once. All of this is in one VBO. (actually all alias models, so the above pattern repeats multiple times. You just store the byte offsets of where the relevant parts of each alias model starts.)
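
The byte-offset bookkeeping for that layout can be sketched like this. The struct and function names are hypothetical, and the per-vertex sizes are left as parameters because they depend on how positions and texcoords are stored:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one alias model inside the shared VBO. */
typedef struct {
    size_t base;        /* byte offset where this model starts       */
    size_t numframes;
    size_t numverts;
    size_t pos_bytes;   /* bytes per vertex position within a frame  */
    size_t st_bytes;    /* bytes per vertex texcoord pair            */
} alias_layout_t;

/* Byte offset of a given frame's positions. */
size_t alias_frame_offset(const alias_layout_t *m, size_t frame)
{
    assert(frame < m->numframes);
    return m->base + frame * m->numverts * m->pos_bytes;
}

/* Byte offset of the single shared texcoord block, after the last frame. */
size_t alias_texcoord_offset(const alias_layout_t *m)
{
    return m->base + m->numframes * m->numverts * m->pos_bytes;
}

/* Total bytes this model occupies; the next model's base follows it. */
size_t alias_total_bytes(const alias_layout_t *m)
{
    return m->numframes * m->numverts * m->pos_bytes
         + m->numverts * m->st_bytes;
}
```

To lerp between two frames you would point one position attribute at the offset of the current frame and a second one at the offset of the previous frame, both with the same stride.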

--- For interpolating, is there a method to combine the middle of 2 frames or do I need to manually calculate the verts?

Other than doing it on the CPU, pretty much only GLSL. IIRC, MH mentioned an obscure vertex blending extension from 2001ish that nothing actually supports. There's ARB assembly, which I didn't feel like learning in 2014 ;-)
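
For reference, the CPU version is just a linear blend per component. A minimal sketch, with `lerp_frame_verts` as a hypothetical name and both frames assumed to be already unpacked to floats:

```c
#include <assert.h>

/* Blends two frames of vertex data component by component.
   blend = 0 gives frame a, blend = 1 gives frame b. */
void lerp_frame_verts(const float *a, const float *b, float blend,
                      float *out, int numfloats)
{
    int i;

    assert(blend >= 0.0f && blend <= 1.0f);

    for (i = 0; i < numfloats; i++)
        out[i] = a[i] + (b[i] - a[i]) * blend;
}
```

A GLSL vertex shader does the same arithmetic with `mix(a, b, blend)`, except that the two frames arrive as two vertex attributes and the blend factor as a uniform, so nothing has to be rewritten on the CPU each frame.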
ericw
 
Posts: 92
Joined: Sat Jan 18, 2014 2:11 am

Re: Batching Verts?

Postby Baker » Thu Nov 06, 2014 10:58 pm

@ericw -- Thanks for explanations and pointing those changes out and the diff file.

I see your implementation has a fallback to fixed function pipeline for non-supporting hardware, very nice!

ericw wrote:not sure if you're working on Fitz Mark V or some other engine?

Trying to broaden my horizons so I can do more complex engine modifications.

Re: Batching Verts?

Postby mh » Fri Nov 07, 2014 10:55 am

I guess the reasoning was, the texcoords are constant across all frames, so save memory by just storing them once.


This is correct, yes.

I used glVertexAttribPointer calls rather than the old GL1.x style vertex arrays because I found glClientActiveTexture to be distasteful and overly-verbose, and I had (what seems with hindsight) a complex and messy state-filtering system on these calls (which was made simpler and cleaner with glVertexAttribPointer). But if you really wanted you could (ab)use extra texcoord arrays for the second set of positions and normals for frame interpolation.

At this stage I'd just second Spike's advice to dump GL 1.x and dump concerns about fixed-pipeline hardware. It's hardware that no longer exists, and even if a few holdouts are still using it, do you really want to compromise performance by continuing to support them?
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
mh
 
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Batching Verts?

Postby mh » Fri Nov 07, 2014 4:47 pm

Just adding that with glVertexAttribPointer you can set up MDL positions as unnormalized bytes and save even more memory. You'll need to use 4 bytes for position (otherwise you'll probably be punted to a software fallback, so saving 1 byte per vertex will drop you to half speed or less) so make sure that you set w to 1.
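
A small sketch of the 4-byte packing, under the assumption that the attribute is specified as 4 unsigned bytes with normalization turned off. `packed_pos_t`, `pack_mdl_pos` and `unpack_mdl_axis` are hypothetical names:

```c
#include <assert.h>

/* One MDL position padded to 4 bytes: x, y, z, w. */
typedef struct { unsigned char v[4]; } packed_pos_t;

packed_pos_t pack_mdl_pos(unsigned char x, unsigned char y, unsigned char z)
{
    packed_pos_t p;

    assert(sizeof(packed_pos_t) == 4);

    p.v[0] = x;
    p.v[1] = y;
    p.v[2] = z;
    p.v[3] = 1;   /* w: unnormalized, so the byte 1 arrives as 1.0 */
    return p;
}

/* What the vertex shader would then do per axis to get model space:
   position = byte * scale + scale_origin. */
float unpack_mdl_axis(unsigned char b, float scale, float scale_origin)
{
    return (float)b * scale + scale_origin;
}
```

With this layout the scale and scale_origin from the model header would be passed to the shader as uniforms.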

On the subject of MDLs, here's another way of getting a bounding-box:
Code:
for (int i = 0; i < 3; i++)
{
   mins[i] = hdr->scale_origin[i];
   maxs[i] = mins[i] + hdr->scale[i] * 255;
}

This is just the reverse of the position scaling (and scale/scale_origin calculation) from modelgen.c and is the same method used by Quake 2 (which had scale/scale_origin per-frame rather than for the entire model).

Re: Batching Verts?

Postby revelator » Sat Nov 08, 2014 9:40 pm

Sorry to drop in. At the darkmod forums we are pondering whether it would be feasible to do batch processing in Doom3,
and since this seems related (vanilla still uses the same opengl 1.1 vertex array calls) i thought
i could maybe learn a few things about this. atm darkmod is hammering hard on the limits and suffers from considerable fps loss because it's not optimized for today's gfx cards and uses the cpu a lot.
Productivity is a state of mind.
revelator
 
Posts: 2542
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Batching Verts?

Postby mh » Sat Nov 08, 2014 10:57 pm

AFAIK Doom 3 already batches as much as is possible. The problem with a realtime lit and shadowed forward renderer is that the number of draw calls multiplies: it scales with entities times lights. In Quake terms, if you had a scene with 30 entities and 60 lights, that's a minimum of 30*60 draws needed for each of the light and shadow passes.

Going to a deferred renderer can cut down the draw call cost of the light passes (instead of 30*60 it's now just 60) but at the cost of huge bandwidth and memory overheads (you may not come out on the right side of the tradeoff) and is a non-trivial change to make. It can do nothing about the shadow pass draw calls.
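
The arithmetic behind those numbers, as a back-of-envelope sketch. It assumes the worst case where every light touches every entity; the function names are hypothetical:

```c
#include <assert.h>

/* Forward renderer: one draw per (entity, light) pair, per pass. */
long forward_draws(long entities, long lights)
{
    assert(entities >= 0 && lights >= 0);
    return entities * lights;
}

/* Deferred light pass: one draw per light, independent of entity count. */
long deferred_light_draws(long lights)
{
    assert(lights >= 0);
    return lights;
}
```

For 30 entities and 60 lights that is 1800 draws per pass in the forward case against 60 for the deferred light pass, while the shadow pass stays at the forward figure.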

What might help without being too invasive is to move the tangent vectors calculation from the CPU to the GPU. I've no idea if Doom 3 does these every frame or one-time-only, but even in the latter case it will reduce VBO sizes and array specification at the cost of some extra ALU ops, which is a similar kind of change to what was done for BFG edition elsewhere. Here's some GLSL I found floating around that claims to do the job, but I haven't tested it so assume that it comes with the appropriate health warning (no, you can't do this with ASM shaders because ASM shaders don't support the derivative (dFdx/dFdy) instructions):
Code:
mat3 cotangent_frame( vec3 N, vec3 p, vec2 uv )
{
    // get edge vectors of the pixel triangle
    vec3 dp1 = dFdx( p );
    vec3 dp2 = dFdy( p );
    vec2 duv1 = dFdx( uv );
    vec2 duv2 = dFdy( uv );

    // solve the linear system
    vec3 dp2perp = cross( dp2, N );
    vec3 dp1perp = cross( N, dp1 );
    vec3 T = dp2perp * duv1.x + dp1perp * duv2.x;
    vec3 B = dp2perp * duv1.y + dp1perp * duv2.y;

    // construct a scale-invariant frame
    float invmax = inversesqrt( max( dot(T,T), dot(B,B) ) );
    return mat3( T * invmax, B * invmax, N );
}

Re: Batching Verts?

Postby revelator » Sun Nov 09, 2014 12:35 am

ack, such a nice fix and then you cannot do it with ASM :S that's really unfortunate, as we pretty much cracked the problem with accessing the depthbuffer and have a working implementation,
but atm it's for ARB ASM shaders, so it seems we have to go the GLSL way then, if i understand correctly? Damn, this means we are pretty much stuck with the option of porting darkmod to the BFG source or
rewriting vanilla completely to use GLSL. If you have other ideas for optimizing vanilla we are all ears.

Re: Batching Verts?

Postby revelator » Sun Nov 09, 2014 2:04 pm

Seems the darkmod devs disagree that vanilla would not benefit from batching,
at least not their source, which still uses code from before the source got free and could use some love :).
i'd say it could be worth it just for updating the source to use more modern opengl features such as vertex attribs; at one time i had pretty much replaced
the old opengl 1.1 vertex arrays with vertex attribs. Unfortunately this feature was from that infamous engine o' mine that broke on AMD cards, and i'm a bit scared to
move it to my new codebase in case this was what broke it, so this time i'll be sure to have a backup in case things go south.

Re: Batching Verts?

Postby nbohr1more » Thu Nov 20, 2014 9:55 pm

@ mh

Yeah, I think we're speaking on different terms here because the brush geometry in Doom 3 is poorly batched unless
done by the mapper (proper func_static setups, etc ) We've got game code that automates some of that process but
it could be made more efficient by tighter integration into the renderer. That said, I saw that you were thinking
about "Transform Feedback" for your VBO implementation and I guess that may be a good general way to solve
a few of these "instancing" type scenarios other than just giving up the ghost and using hardware instancing with GLSL.

One thing I was hoping you might shed light on as a general notion:

How close would you say Doom 3's "Interaction Table to VBO" setup is to a Forward+ renderer's "Light Index to G-Buffer"?
I get the impression that BFG performs on par to a Forward+ renderer with regard to light count so perhaps we've already
got an analogous system (Tr3b at least claims his Forward+ vanilla Doom 3 version had the same performance profile as BFG).

Is it worth considering or feasible to change this part of the renderer?
nbohr1more
 
Posts: 54
Joined: Fri Dec 09, 2011 7:04 am

Re: Batching Verts?

Postby rec » Sat Sep 19, 2015 7:18 pm

Hello to all,

I was wondering what the difference is between the q3 and q1 bsp formats, in terms of storing face vertices?
I made a very simple q3 bsp viewer in OpenGL 3.3; it currently supports faces and textures.
I store each face in a separate VBO, and I have one huge index array for all the faces.
My intention is to continue with a q1 bsp viewer. In the sources I saw that the q1 bsp format stores edges instead of vertices, so I'm a little bit confused.
Also, you can't have more than one texture per VBO. I was planning to separate the VBOs by texture, but then you can't use vis culling.
rec
 
Posts: 4
Joined: Mon Nov 10, 2014 2:56 pm
