Drawing brush models with the world

Discuss programming topics for the various GPL'd game engine sources.
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Drawing brush models with the world

Post by Baker »

In Engine X, I have the option r_brushmodels_with_world, which renders brush models together with the world (speeds things up maybe 1%, not significant). It merges a brush model only when:

1) It is a world submodel (lifts, platforms), not a health box or the like.
2) Its origin x, y, z = 0.
3) Its angles x, y, z = 0.
4) It doesn't have alpha (solid surfaces only).

What is a good plan to draw the world submodels where the angles and origin are modified?

This being OpenGL 1.2 +/- so fixed pipeline of course.

Am I better off using glRotatef and glTranslatef during the drawing of texture chains and then restoring the modelview matrix when it runs into surfaces belonging to a world submodel? (This could get complex. I don't know if anyone ever instances world submodels, or if that is even possible --- I bet the lighting would potentially look stupid. But really, a surface doesn't have a way to know which entity it represents.) Maybe throw the alpha surfaces into the liquids texture chain. :D

But really, I'm thinking I shouldn't do this at all. Why draw the world submodels in the same function as the world unless it is easy? There is no specific benefit.

Part 2: I think I've heard MH say he once threw the unchanging static part of the whole map into a vertex array. How does that work with a ton of potential textures? I'm no master of vertex arrays, but I thought those got paired up with arrays of texture coordinates (not multiple textures). The lightmaps can change, but I guess that really is just a texture update.

Why the hell am I asking this? The RMQ engine does this, and it uses ancient OpenGL with a ton of extensions.

I could scrap this post, but maybe someone might somehow benefit from reading it.

-----------------

I might add that the RMQ Engine occasionally stalls for a second every once in a great while for me on 2 different laptops (a Windows 7 one and a Mac one). Maybe when a lot of dynamic lighting changes happen??? <---- those question marks mean I do NOT really know why. Both are ATI.

I guess I think the rendering code really could be a lot simpler. And I'm thinking supporting non-multitexture in a desktop engine (or a mobile one --- even OpenGL ES 1.x is required to have multitexture) is something I'm not doing any more (why waste time on computers that literally don't even exist any more; that one Steam hardware survey said 2 TMUs is 100%).

Not a very coherent or well themed post. Drifting around a bit. I know. I'm thinking of what I hate about the rendering code and why it stands between me and what I am trying to get done.
The night is young. How else can I annoy the world before sunrise? 8) Inquisitive minds want to know! And if they don't -- well, like that ever has stopped me before ..
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Drawing brush models with the world

Post by taniwha »

I did exactly this (brush models with the world) in QF. While in theory sub-models can be instanced, I have yet to encounter an example. I also suspect it would break a vanilla engine. QF assumes sub-models are not instanced. However, even instanced models are drawn together with the world (this was not easy, but I feel it was worth the effort).

The original reason I made QF draw brush models with the world was so I could get fog working in single texture mode without a gob of duplicated code. One big benefit of doing so is better batching of lightmap updates (and we know how important that can be).

QF's glsl renderer throws all brush model vertices into one giant VBO (QF's gl (fixed-function) renderer doesn't use VBOs). This is not a problem thanks to glDrawElements, and bsp's MAX_MAP_VERTS being 65536. It could be a problem for very big maps, but as there are very few instanced bsp model types, and they're all boxes, the map would have to come within a few dozen verts of 65536: pretty rare yet, I believe. Worst comes to worst, update the vertex array offsets (I wonder how expensive that gets).
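A rough sketch of that "update the vertex array offsets" fallback (hypothetical names, not QF code): split the vertex pool into 65536-vertex blocks, give each block its own rebase of the attribute pointers (a byte offset into the same VBO), and keep the indices local to the block:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not QF code): if the pool outgrows what 16-bit
   indices can address, split it into 65536-vertex blocks. Each block gets
   its own rebase of the attribute pointers (a byte offset into the same
   VBO), and indices stay local to the block. */
enum { VERTS_PER_BLOCK = 65536 };

typedef struct {
    uint32_t block;        /* which attribute-pointer rebase to use */
    uint16_t local_index;  /* the index actually handed to glDrawElements */
} vert_ref_t;

static vert_ref_t rebase_vertex (uint32_t global_index)
{
    vert_ref_t ref;
    ref.block = global_index / VERTS_PER_BLOCK;
    ref.local_index = (uint16_t) (global_index % VERTS_PER_BLOCK);
    return ref;
}
```

The catch is that every surface in one glDrawElements call must then live in the same block, which is why the cost of updating the offsets mid-frame is the open question.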

OK, how to handle moving bsp models (instanced or non-instanced)...

The first thing I did when developing this code (for gl, I hadn't started glsl yet) was to not transform any entity in the renderer. Instead, I had the client calculate and store the transformation matrix in the entity, and then only when something (origin or angles) changes. This means that a static entity's transform is calculated only once for the duration of the level, rather than every frame. I then used glPushMatrix, glMultMatrixf (or glLoadMatrixf), and glPopMatrix whenever I needed to change transformations.

Now for the trick for getting this to work with drawing brush models with the world. I gave surfaces a transform pointer. Surfaces belonging to the world get a null pointer for the transform. Brush models provide a pointer to the entity's transform matrix for the surface's transform.

Now on to drawing... before running the surface chains, the modelview matrix is set up for rendering the world. Then, while running through the surface chains, if the surface's transform pointer is null, nothing special is done. However, if the surface's transform pointer is not null, then the code does a glPushMatrix/glLoadMatrix before drawing the surface polys, then a glPopMatrix after. Since most surfaces will not be split (only water and sky are), and sub-models in their "home" position are all untransformed, this is not the most efficient. However, instanced model surfaces (probably more common) are always transformed, so the minor bit of inefficiency seems a reasonable trade-off for the simpler code.
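The control flow for the per-surface transform pointer might look like the following sketch, with counters standing in for the GL calls so it stays self-contained (hypothetical names, not actual QF code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the per-surface transform pointer. The counters
   stand in for glPushMatrix/glLoadMatrixf/glPopMatrix and the actual poly
   drawing, so the control flow is visible without a GL context. */
typedef struct {
    const float *transform;  /* NULL for world-owned surfaces */
} msurface_t;

static int g_pushes, g_loads, g_pops, g_draws;

static void draw_surface_chain (msurface_t *surfs, int count)
{
    /* modelview is assumed already set up for the world at this point */
    for (int i = 0; i < count; i++) {
        if (surfs[i].transform) {
            g_pushes++;   /* glPushMatrix () */
            g_loads++;    /* glLoadMatrixf (surfs[i].transform) */
            g_draws++;    /* draw the surface polys */
            g_pops++;     /* glPopMatrix () -- back to the world matrix */
        } else {
            g_draws++;    /* world surface: draw with no matrix change */
        }
    }
}
```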

The messy part of all this (for gl) was getting sky chains to work properly. I also had some issues with instanced models, but that might have been before I figured out why they were a PITA :).

I currently have a problem in the glsl renderer with certain surfaces of sub-models not getting the right transform, but I believe that to be a problem in my recent optimization run. Probably just messed up the uniform load logic somewhere.

One other benefit of getting the gl renderer to draw brush models with the world is it made creating the VBO for glsl much easier as I could abuse the surface chaining code to help me build the lists.

Anyway, unless you have a specific reason to do so (single texture fog, building VBOs...), I have to agree that drawing brush models with the world is more effort than it's worth (about 1% isn't much of a gain). However, moving the transform calcs out of the renderer is easy and worth it.

Here's the commit message for the entity transform patch (tweaked to look good in the forum).
commit 3eb859a88f1c05eb10a8a9e7d6b4f7418d95979a
Author: Bill Currie <bill@taniwha.org>
Date: Thu Dec 15 12:06:03 2011 +0900

Move the entity transform setup into the clients.

This has several benefits:
  • The silly issue with alias model pitches being backwards is kept out
    of the renderer (it's a quakec thing: entities do their pitch
    backwards, but originally, only alias models were rotated. Hipnotic
    did brush entity rotations in the correct direction).
  • Angle to frame vector conversions are done only when the entity's
    angles vector changes, rather than every frame. This avoids a lot of
    unnecessary trig function calls.
  • Once transformed, an entity's frame vectors are always available.
    However, the vectors are left handed rather than right handed (ie,
    forward/left/up instead of forward/right/up): just a matter of
    watching the sign. This avoids even more trig calls (flag models in
    qw).
  • This paves the way for merging brush entity surface rendering with the
    world model surface rendering (the actual goal of this patch).
  • This also paves the way for using quaternions to represent entity
    orientation, as that would be a protocol change.
Leave others their otherness.
http://quakeforge.net/
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Drawing brush models with the world

Post by Spike »

glLoadMatrix should be faster than glPopMatrix. not by much though.
glLoadMatrix should be faster than glMultMatrix. much of that is because you can cache the modelview matrix with glLoadMatrix.

changing uniforms will hurt about as much as a texture change (actually less if you're running out of graphics ram... which you won't be).
so changing your matrices mid-batch will give no improvement, so make sure you're not randomly hopping between 500 different ents drawing only one surface at a time.

personally I hate the idea that a surface might be drawn differently purely because it's a submodel.
polygonoffset is annoying, and fte's defaults indeed bugged out on at least one android device...
special cases like disabling culling or whatever is just ugly. I want a world entity with alpha 0.5!
Realistically though, you can skip a load of bells and whistles if you do special-case it, but if you're using multitexture in one place, you should have it in the other two. it can make maintenance annoying, which is the real issue. take caustics for instance. love them or hate them, if they don't affect bsp models but do affect world then you have some real weirdness going on.
the real kicker is that this applies to entities too.

older hardware supports only one tmu, but such hardware has eg limited blending modes and doesn't fully support opengl anyway. which makes them misbehave in everything recent, even if they have the number crunching power for an enjoyable game. really the only reason I would personally care is because a) limitations make things more fun. b) leileilol loves running stuff at low framerates and with dodgy blend modes.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Drawing brush models with the world

Post by mh »

I gave up on single TMU cards a long long time ago - the limitations were just too much. Yeah it's sometimes fun to come up with creative solutions, but most of the time I found myself fighting against them and writing multiple versions of code that did the same thing. I'd much rather be productive.

For brush model entities I merge them into the world texture chains in the following conditions:
- Origin is 0|0|0.
- Angles is 0|0|0.
- The model is only being used by a single entity.
- The model name begins with '*'.
- The entity has no alpha.
- ent->frame is 0 (important otherwise you won't get alternate anims!!!).

In theory Quake allows '*' models to share surfaces with the world, and more than one entity to share the same '*' model - there's nothing in GLQuake to prevent it (don't know about software). In practice I don't think I've ever seen it happen, but that's not to say that there isn't a version of QBSP out there doing it.

The merge is just a standard surf->texturechain = surf->texinfo->texture->texturechain thing for each surf that will be drawn in the entity, and is run after R_RecursiveWorldNode but before the world and other brush model entities are drawn - if an entity was merged then it doesn't get drawn in the regular pass. In terms of performance it gives no measurable difference whatsoever in standard id1 maps (fluctuating conditions on your PC will have bigger impact) but may help with maps that use a lot of e.g. func_wall entities.
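The conditions above plus the chaining could be sketched roughly like this (struct layouts only approximate GLQuake, and fields like numusers are hypothetical stand-ins for "only one entity uses this model"):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: structure names approximate GLQuake; the real
   msurface_t reaches its texture via texinfo, and GLQuake has no numusers
   field -- it is a stand-in for "used by a single entity". */
typedef struct texture_s {
    struct msurface_s *texturechain;
} texture_t;

typedef struct msurface_s {
    texture_t *texture;
    struct msurface_s *texturechain;
} msurface_t;

typedef struct {
    char name[64];
    int numusers;                    /* entities currently using this model */
    msurface_t *surfaces;
    int firstmodelsurface, nummodelsurfaces;
} model_t;

typedef struct {
    model_t *model;
    float origin[3], angles[3];
    float alpha;                     /* 1.0 == no alpha */
    int frame;
} entity_t;

static int can_merge_with_world (const entity_t *e)
{
    const float z3[3] = {0, 0, 0};
    return memcmp (e->origin, z3, sizeof z3) == 0
        && memcmp (e->angles, z3, sizeof z3) == 0
        && e->model->numusers == 1
        && e->model->name[0] == '*'
        && e->alpha == 1.0f
        && e->frame == 0;    /* otherwise alternate anims break */
}

/* standard head-prepend chaining of each entity surf onto its texture */
static void merge_entity_surfs (entity_t *e)
{
    msurface_t *surf = e->model->surfaces + e->model->firstmodelsurface;
    for (int i = 0; i < e->model->nummodelsurfaces; i++, surf++) {
        surf->texturechain = surf->texture->texturechain;
        surf->texture->texturechain = surf;
    }
}
```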

Beware that standard texture chaining (like the example I gave above) will give you back-to-front ordering, which is going to result in high overdraw. You need to either reverse the chain before drawing, or add new surfaces to the end of the chain instead of the front of it during building.
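Adding at the end of the chain is cheap if you cache a tail pointer; a minimal sketch (hypothetical names) of tail-append that preserves the front-to-back traversal order:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: appending at the tail of the texture chain (instead
   of GLQuake's head-prepend) preserves traversal order, so a front-to-back
   walk is also drawn front-to-back. The cached tail pointer keeps it O(1). */
typedef struct msurface_s {
    int id;
    struct msurface_s *texturechain;
} msurface_t;

typedef struct {
    msurface_t *head, *tail;
} texchain_t;

static void chain_append (texchain_t *c, msurface_t *s)
{
    s->texturechain = NULL;
    if (c->tail)
        c->tail->texturechain = s;
    else
        c->head = s;
    c->tail = s;
}
```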

For standard brush model entities (that don't meet the above conditions) things are a bit more familiar. Load a matrix, loop through the surfaces, add them to texture chains (again based on ent->model), draw chains, clear chains when done, repeat for the next entity. If matrix loading is a bottleneck then you're doing something wrong...

There's potential value to be had in drawing brush models before the world rather than after it, as brush models are more likely to occlude world geometry than be occluded by it, and that will get you early-Z optimizations on most gfx cards (I got good results in the same manner by drawing the view model as the first item in the frame).

Any complexity in the RMQ engine is on account of its FitzQuake heritage and its reliance on ancient OpenGL. In the FitzQuake case the original code for handling brush models was (IMO, and to be brutally honest) a quite horrendous mess - a correct horrendous mess perhaps, but a horrendous mess all the same. In the case of ancient OpenGL, it imposes restrictions that mean you need to go through a couple of extra levels of sorting and other nastiness in order to get things batching up well. Overall there's an extremely high level of frustration involved in it. Code derived directly from GLQuake that does this, and that isn't afraid to jump the hardware requirements a little bit forward, can be a LOT simpler. DirectQ pushes all surfs through the same texture chain setup and drawing routines, and bases texture chains on ent->model rather than cl.worldmodel (taking ent as a param to its drawing functions), so there's a high level of consistency between all surface types. Its main surface drawing function is 24 lines long, including braces, comments and whitespace.

Ideally you'd just put all brush model surfaces into a single big static VBO, build indexes dynamically for the surfaces you're actually going to draw, use a single glDrawElements (or equivalent) to draw each texture chain, and use shaders for animating water and sky. This also needs texture arrays to avoid texture changes (and consequent batch breaking) for lightmaps, glMapBufferRange to get decent dynamic VBO performance (for indexes), 32-bit indexes in hardware for large maps, and at least 3 TMUs for single-pass diffuse/lightmap/fullbright (in practice if you have the other requirements you won't have less than 8). That's the current fastest path on any reasonably civilized hardware and is also much much cleaner and simpler in terms of code than any other approach.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Drawing brush models with the world

Post by Spike »

mh, I find it funny that you advocate using texture arrays for lightmaps, but not world surfaces.
using it for textures *and* lightmaps would mean that you can fully depth sort in advance. and bsps give nice depth sorting. yes, you'll still have issues if you somehow (bsp2?) have a map with >65k verts, but hey...

glMapBufferRange is a problematic one. mapping a buffer is quite a slow operation, with all sorts of system calls and synchronisation issues. index buffers are often software-based. If you're going to use it, you want to map a buffer for *all* batches rather than just one, to avoid making lots and lots of map/unmap calls.
For D3D, FTE maps buffers on a per-batch basis. DO NOT DO THIS, it really kills the D3D renderer's performance, and should have a similar result with gl (at least d3d has a proper 'I'm not gonna break anything' flag, paired with a 'I'm gonna break everything' flag for easy reuse, but even with those it still plummets). With gl, I generally get away with just calling glDrawRangeElements lots in a loop. I probably ought to try and investigate glMultiDrawElements, but from what I've seen it's only the function call overhead that's saved, and that's just userland overhead (unlike with d3d where it would be syscall overhead).

Even if you do have over 65k verts, you can still use a single vbo. Just use different vbo offsets.
Some of the rmq maps exceed 65k verts with a single texture.

Regarding gles - you'll spend more time rewriting the input code to get the thing usable than you'll spend tweaking the renderer to be gles compatible. I wouldn't really worry about gles1. gles2 is more worthy of concern.
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Re: Drawing brush models with the world

Post by Baker »

taniwha wrote:Here's the commit message for the entity transform patch (tweaked to look good in the forum).
An interesting read ...
Spike wrote:polygonoffset is annoying, and fte's defaults indeed bugged out on at least one android device...
Forgot about polygonoffset.
mh wrote:- ent->frame is 0 (important otherwise you won't get alternate anims!!!).
Didn't factor in ent->frame. I didn't have R_TextureAnimation in mind. Still, I bet that happens prior to draw anyway. I am re-organizing a bit and hadn't factored R_TextureAnimation into the equation ...
mh wrote:- The model name begins with '*'.
This effectively makes no difference (as far as I know), but I don't check the name. Doesn't get me anything special, but I dislike the idea of checking a model name to learn something about the model. Instead I use:

model->surfaces == cl.worldmodel->surfaces // If true, this entity is part of the world and not a healthbox, etc.
mh wrote:DirectQ pushes all surfs through the same texture chain setup and drawing routines, bases texture chains on ent->model rather than cl.worldmodel (and takes ent as a param to it's drawing functions) so there's a high level of consistency between all surface types.
Now that is something I know I can use :D :D :D
The night is young. How else can I annoy the world before sunrise? 8) Inquisitive minds want to know! And if they don't -- well, like that ever has stopped me before ..
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Drawing brush models with the world

Post by mh »

The trouble with world surfaces is that not all the textures are the same size, and every texture in an array must be the same size. It also complicates things with the texture cache - that's not that big a deal, but the first one is a killer. Sure it would be possible to get something working, but it doesn't really seem worth the hassle. The really big benefit from using an array for lightmaps was being able to completely avoid a lot of the intermediate arrays and sorting steps I used to have to do - that involved chaining by texture, then pushing that out to a new set of chains by lightmap for each texture (which did have the benefit of reversing the chain for me), and I had to be careful about the order I built lightmaps in too. It was horrible.

glMapBufferRange has a nice GL_MAP_UNSYNCHRONIZED_BIT that won't cause pipeline stalls. A GL version of the way I set things up would look something like this:

Code: Select all

#define FIRST_INDEX_OFFSET (r_firstdrawindex * sizeof (unsigned int))
#define INDEX_RANGE_OFFSET (r_numdrawindexes * sizeof (unsigned int))
#define BUFFER_MAP_BITS (GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_RANGE_BIT)

unsigned int *ndx = NULL;

if (r_firstdrawindex + r_numdrawindexes >= r_maxdrawindexes)
{
	glBufferData (GL_ELEMENT_ARRAY_BUFFER, r_maxdrawindexes * sizeof (unsigned int), NULL, GL_STREAM_DRAW);
	r_firstdrawindex = 0;
}

if ((ndx = glMapBufferRange (GL_ELEMENT_ARRAY_BUFFER, FIRST_INDEX_OFFSET, INDEX_RANGE_OFFSET, BUFFER_MAP_BITS)) != NULL)
{
	int i;
	int drefirstvert = 0x7fffffff;
	int drelastvert = 0;

	// reversing the draw order so we get front-to-back
	for (i = r_numdrawsurfaces - 1; i >= 0; i--)
	{
		int v;
		msurface_t *surf = r_drawsurfaces[i];

		for (v = 2; v < surf->numedges; v++, ndx += 3)
		{
			ndx[0] = surf->firstvertex;
			ndx[1] = surf->firstvertex + v - 1;
			ndx[2] = surf->firstvertex + v;
		}

		if (surf->firstvertex < drefirstvert) drefirstvert = surf->firstvertex;
		if (surf->lastvertex > drelastvert) drelastvert = surf->lastvertex;
	}

	glUnmapBuffer (GL_ELEMENT_ARRAY_BUFFER);

	glDrawRangeElements (GL_TRIANGLES,
					drefirstvert,
					drelastvert,
					r_numdrawindexes,
					GL_UNSIGNED_INT,
					(void *) FIRST_INDEX_OFFSET);

	r_firstdrawindex += r_numdrawindexes;
	r_numdrawsurfaces = 0;
}
That actually runs faster on id1 content than DirectQ currently does - I'm putting it down to D3D11 requiring 32-bit calcs throughout the pipeline, whereas GL can still optimize down to lower precision if the driver thinks it's worthwhile to do so.

Interestingly, D3D11 has no versions of the *Range* functions - it's just Map and DrawIndexed - with the explanation given appearing to be something like "most drivers used to ignore it anyway so we just didn't bother". I'm reasonably certain that glDrawRangeElements is ignored for hardware T&L and you may as well just use glDrawElements - it definitely applies to D3D9's equivalent anyway (you can put any old rubbish into the equivalent params and it still works with no speed hit).

I've also (accidentally) overshot the range when Locking a vertex buffer in D3D9 before and with no ill effects, but it's not something I think I'd like to rely on. I'd also feel a tad more comfortable if D3D11 had range specifiers in the params, but then again I can Map in advance, check for overflow as each primitive is added, flush and re-Map if needed, without having to worry about calculating too much in advance - very handy for 2D where before I used to have to go through some awful intermediate arrays.

For Mapping I use a sneaky trick recommended by NVIDIA - if a buffer will overflow and more than 6 frames have passed since it was last at 0, I just go back to 0 without bothering to discard. That avoids new allocations and still doesn't stall as the buffer has already been fully through the pipeline. I picked 6 because while the driver will give 3 by default, the extra 3 adds some headroom.
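Reduced to just the cursor logic, that rewind-without-discard trick might look like this sketch (hypothetical names; the real code would orphan via glBufferData with a NULL pointer on the unsafe path):

```c
#include <assert.h>

/* Hypothetical sketch of the cursor logic only. On overflow, rewind to 0
   without orphaning if at least SAFE_FRAME_LAG frames have passed since the
   cursor was last at 0 (the GPU should have drained that region by then);
   otherwise a real implementation would orphan with
   glBufferData (..., NULL, GL_STREAM_DRAW). frames_since_rewind would be
   incremented once per frame elsewhere. */
typedef struct {
    unsigned cursor, size;
    unsigned frames_since_rewind;
    int discards;   /* how many times we had to orphan anyway */
} stream_buffer_t;

enum { SAFE_FRAME_LAG = 6 };

/* returns the byte offset at which 'bytes' may be written */
static unsigned stream_alloc (stream_buffer_t *b, unsigned bytes)
{
    if (b->cursor + bytes > b->size) {
        if (b->frames_since_rewind >= SAFE_FRAME_LAG)
            b->frames_since_rewind = 0;  /* safe rewind, no reallocation */
        else
            b->discards++;               /* orphan the buffer here */
        b->cursor = 0;
    }
    unsigned offset = b->cursor;
    b->cursor += bytes;
    return offset;
}
```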

In D3D11 I Map on a per-batch basis and have no ill-effects (each texture chain is a batch); with 9 I used to tot up everything needed for the current entity and Lock the total. Batches tend to be small for BSP models, bigger for inline, biggest for the world, but D3D11 has significantly less draw call overhead than 9 did (it's roughly the same as GL now) so it's not that big a deal.

I understand that MultiDrawElements is just implemented in software by the driver and is equivalent to looping through the params. I've tried it before and definitely never seen any benefit from it, although I guess there's no harm in a second opinion.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Drawing brush models with the world

Post by taniwha »

Uh, I thought brush models don't support frames.
Leave others their otherness.
http://quakeforge.net/
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Drawing brush models with the world

Post by mh »

Baker wrote:
mh wrote:- ent->frame is 0 (important otherwise you won't get alternate anims!!!).
Didn't factor in ent->frame. I didn't have R_TextureAnimation in mind. Still, I bet that happens prior to draw anyway. I am re-organizing a bit and hadn't factored R_TextureAnimation into the equation ...
ent->frame is actually very important because if you merge the surfaces into the worldmodel chains then they're going to also pick up frame 0 from the world model. Unless, like you say, you call R_TextureAnimation before chaining the surfaces, and chain based on the animated texture (GLQuake doesn't, other engines might). I'd double-check if I was you. ;)
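The alternate-anims selection in question boils down to this (a reduced sketch of GLQuake's R_TextureAnimation; the time-based stepping through anim_next is omitted):

```c
#include <assert.h>
#include <stddef.h>

/* Reduced sketch of GLQuake's R_TextureAnimation: a nonzero ent->frame
   selects the alternate sequence when the texture has one. The time-based
   stepping through anim_next is omitted here. */
typedef struct texture_s {
    struct texture_s *alternate_anims;
} texture_t;

static texture_t *texture_animation (texture_t *base, int entframe)
{
    if (entframe && base->alternate_anims)
        base = base->alternate_anims;
    return base;
}
```

If you chain on the result of this rather than on the base texture, merged submodel surfaces keep their alternate anims even inside the world chains.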
Baker wrote:
mh wrote:- The model name begins with '*'.
This effectively makes no difference (as far as I know), but I don't check the name. Doesn't get me anything special, but I dislike the idea of checking a model name to learn something about the model. Instead I use:

model->surfaces == cl.worldmodel->surfaces // If true, this entity is part of the world and not a healthbox, etc.
I'm not a big fan of using the model name method either, but in this case it seems safe enough as use of '*' is all over the Quake code - in the setup for localmodels, for instance. I actually added a "subtype" enum to my model struct (mod_world, mod_inline, mod_bsp) recently, so that would be better to use.

It actually does make a difference, as only '*' models will have the same textures as the world - bsp models most likely won't, but a bsp model can be set up to satisfy the other conditions (unlikely, but still possible). Unless you check this you may merge a bsp model into the world, then miss its surfs while looping through the world textures for drawing. That's if you're lucky. What's also likely to happen is that on some subsequent frame a second entity using the same model comes into view and now you've got an infinite texture chain. Put it this way - it doesn't hurt and you get to avoid a rare crash bug.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: Drawing brush models with the world

Post by mh »

taniwha wrote:Uh, I thought brush models don't support frames.
Yeah they do - kind of. ent->frame doubles up as specifying alternate texture anims; if it's 0 you get regular, if it's 1 you get alternate.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Re: Drawing brush models with the world

Post by Baker »

taniwha wrote:Uh, I thought brush models don't support frames.
Push a button, the texture changes. The entity frame changed from frame 0 to frame 1.
buttons.qc wrote: void() button_wait =
{
...
SUB_UseTargets();
self.frame = 1; // use alternate textures
};

void() button_return =
{
self.state = STATE_DOWN;
SUB_CalcMove (self.pos1, self.speed, button_done);
self.frame = 0; // use normal textures
if (self.health)
self.takedamage = DAMAGE_YES; // can be shot again
};
The night is young. How else can I annoy the world before sunrise? 8) Inquisitive minds want to know! And if they don't -- well, like that ever has stopped me before ..
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Drawing brush models with the world

Post by taniwha »

mh wrote:Ideally you'd just put all brush model surfaces into a single big static VBO, build indexes dynamically for the surfaces you're actually going to draw, use a single glDrawElements (or equivalent) to draw each texture chain, and use shaders for animating water and sky. This also needs texture arrays to avoid texture changes (and consequent batch breaking) for lightmaps, glMapBufferRange to get decent dynamic VBO performance (for indexes), 32-bit indexes in hardware for large maps, and at least 3 TMUs for single-pass diffuse/lightmap/fullbright (in practice if you have the other requirements you won't have less than 8). That's the current fastest path on any reasonably civilized hardware and is also much much cleaner and simpler in terms of code than any other approach.
I haven't used texture arrays (hadn't heard of them until reading this thread), but that's pretty much what QF does in the GLSL renderer, and yes, it gives good results: GLSL wants to be faster than GL, but those occasional stalls when updating the lightmaps kill it. I haven't had time to look into why.
Leave others their otherness.
http://quakeforge.net/
taniwha
Posts: 401
Joined: Thu Jan 14, 2010 7:11 am

Re: Drawing brush models with the world

Post by taniwha »

Baker: hmm, thanks, I'll have to do some more digging through the code.

...

ok, yeah, found it. Looks like it's a boolean value:

Code: Select all

    if (currententity->frame) {
        if (base->alternate_anims)
            base = base->alternate_anims;
    }
That will be why I wasn't aware of frame support in brush models: that's the only mention of frame for brush models.

I just checked, and whew, brush model frames do work properly with my scheme.
Leave others their otherness.
http://quakeforge.net/