Walking Through A Whole Rendering Frame


Walking Through A Whole Rendering Frame

Post by Baker »

Quake perspective (loosely, but rather accurately):

Before any of this happens:

Client Work Setup Before Rendering

1. Client Receives Entities From Server: Server sends the entities it thinks the client can see
2. Client Builds Entity List: Client takes that list of entities, adds them to a "draw list", and updates their positions and angles (rotating items update their spin here).
3. (Hopefully update the particles for entities here).

Before Rendering Frame

Do a whole ton of calculations before we can even begin to draw ...

1. Determine Screen Viewport for 3D: From viewsize and fov cvars.
2. Make Modifications To The View: For bobbing, damage flashes, kicks, rolling, punchangles, chase_active 1, death, stair stepping smoothing.
3. Determine Client Side Visibility: Will be used later to decide which map surfaces to draw, which static entities are visible, and dynamic lighting.

(* A static entity is a non-interactive part of the world; the server sends it to a client only once. It cannot be changed afterwards, even from QuakeC.)

4. From Visibility Info Determine Content Blend: Are we in water, slime or lava? Determine the blend color for the screen.
5. Update Overall Blend: Do we have a damage blend or pickup blend from a previous frame still in effect? If so, add that in. Add powerup blend.

(* FitzQuake does its waterwarp effect here, messing with the effective fov a little in each direction.)

6. Calculate Frustum Planes: From field of view, origin and angles; will be used to ignore surfaces and entities that are behind us (i.e. not in our "cone of vision").
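
To make that concrete, here's roughly what the culling those planes enable looks like. A minimal sketch only, not the actual Quake source; the struct and function names are made up:

Code:
/* Frustum = four planes (left/right/top/bottom) built from the view
   origin, angles and fov.  A box entirely behind any one plane can be
   skipped. */
typedef struct {
    float normal[3];
    float dist;
} cullplane_t;

static cullplane_t frustum[4];

/* Returns 1 if the box mins..maxs is completely outside the frustum. */
int CullBox (const float mins[3], const float maxs[3])
{
    int i, j;

    for (i = 0; i < 4; i++) {
        float corner[3], d;

        /* the box corner farthest along this plane's normal */
        for (j = 0; j < 3; j++)
            corner[j] = (frustum[i].normal[j] > 0) ? maxs[j] : mins[j];

        d = corner[0] * frustum[i].normal[0]
          + corner[1] * frustum[i].normal[1]
          + corner[2] * frustum[i].normal[2];

        if (d < frustum[i].dist)
            return 1;   /* even the most "inside" corner is behind the plane */
    }
    return 0;
}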

The "Action" Part Prior To Rendering (Uses above calcs)

7. Mark Every Map Surface We Want To Draw: Based on visibility and using frustum culling.
8. Add Static Entities.
9. Build Draw Lists of Surfaces: "Texture chains", grouped by texture name (see the sketch after this list).

10. Update Dynamic Lighting For Visible Surfaces.
11. (Hopefully, upload any changed lightmaps now instead of doing it as we are drawing :D )
12. (Hopefully, run through all the visible entities now, do the frame lerping calculations for any entity that will show up in the frame, and grab lighting too.)
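
For reference, the "texture chains" of step 9 are nothing more than a per-texture linked list of the surfaces we just marked. A bare-bones sketch (illustrative field names, not the real Quake structs):

Code:
/* Each marked surface is linked onto its texture's chain so the world
   can be drawn with one texture bind per texture instead of one per
   surface. */
typedef struct msurf_s {
    struct mtexture_s *texture;
    struct msurf_s    *texturechain;  /* next visible surf using this texture */
    /* ... polys, lightmap info, etc ... */
} msurf_t;

typedef struct mtexture_s {
    msurf_t *texturechain;            /* head of this texture's list */
    /* ... GL texture handle, name, etc ... */
} mtexture_t;

void ChainSurface (msurf_t *surf)
{
    surf->texturechain = surf->texture->texturechain;
    surf->texture->texturechain = surf;
}

/* Drawing then becomes: for each texture, bind once, walk its chain and
   draw every surface on it, then reset the chain for the next frame. */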

Rendering Initialization For Frame

13. Clear Color/Depth Buffer, Set Viewport, Set Projection Matrix (Frustum/Fov/Farclip), Set ModelView Matrix (viewer origin and angles).
14. Set Quake default 3D capabilities (fixed pipeline stuffs).
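
In old fixed-pipeline GL, steps 13 and 14 boil down to something like this. A rough sketch along the lines of GLQuake's R_SetupGL; vp_*, fov_y, vieworg and viewangles are placeholder variables, and the near/far values are just typical ones:

Code:
glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glViewport (vp_x, vp_y, vp_width, vp_height);

glMatrixMode (GL_PROJECTION);
glLoadIdentity ();
gluPerspective (fov_y, (double)vp_width / vp_height, 4.0, 4096.0);

glMatrixMode (GL_MODELVIEW);
glLoadIdentity ();
glRotatef (-90, 1, 0, 0);             /* put Z going up, Quake-style */
glRotatef ( 90, 0, 0, 1);
glRotatef (-viewangles[2], 1, 0, 0);  /* roll  */
glRotatef (-viewangles[0], 0, 1, 0);  /* pitch */
glRotatef (-viewangles[1], 0, 0, 1);  /* yaw   */
glTranslatef (-vieworg[0], -vieworg[1], -vieworg[2]);

glEnable (GL_DEPTH_TEST);
glEnable (GL_CULL_FACE);
glDisable (GL_BLEND);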

Rendering

15. Draw the Sky.
16. Draw the World (Grouped By Texture): World = the static, unchanging part of the map (not lifts, doors, or entities of any kind).

(* Bind the first texture used, draw every visible surface that uses it, then do the next texture. Repeat.)

17. Draw the Entities.

(* GLQuake drew sprites last.)

18. Draw the Water. Water is saved until now because it could be translucent (r_wateralpha), and alpha blending doesn't mix well with non-blended drawing.
19. Draw Flash Blends. Bubble spheres of color, if gl_flashblend is 1.
20. Draw particles.

(* GLQuake updated the particles here while drawing them.)

21. Draw the View Model. (Your shotgun or axe).
22. Add Content Blend From #5: Paint the whole screen with a blended quad of the blend color.
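
Step 22 is essentially GLQuake's R_PolyBlend. A simplified sketch, drawn here as a 2D ortho quad rather than the rotated 3D quad GLQuake actually uses (v_blend is the accumulated RGBA from step 5):

Code:
if (v_blend[3] > 0)                    /* any blend to apply at all? */
{
    glDisable (GL_TEXTURE_2D);
    glDisable (GL_DEPTH_TEST);
    glEnable (GL_BLEND);

    glMatrixMode (GL_PROJECTION);
    glLoadIdentity ();
    glOrtho (0, 1, 1, 0, -1, 1);       /* simple 0..1 screen space */
    glMatrixMode (GL_MODELVIEW);
    glLoadIdentity ();

    glColor4f (v_blend[0], v_blend[1], v_blend[2], v_blend[3]);
    glBegin (GL_QUADS);
    glVertex2f (0, 0);
    glVertex2f (1, 0);
    glVertex2f (1, 1);
    glVertex2f (0, 1);
    glEnd ();

    glEnable (GL_TEXTURE_2D);
    glEnable (GL_DEPTH_TEST);
    glDisable (GL_BLEND);
}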

Post Frame

23. Decay Dynamic View Blend: Reduce damage blend, bonus flash.
24. Decay Dynamic Lights. Reduce the amount of flash from a rocket being fired, explosion, etc.
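
Step 24 can be as simple as the following. An illustrative sketch only (the real thing in the id code is CL_DecayLights, and the real dlight struct has more fields):

Code:
typedef struct {
    float radius;
    float decay;        /* radius units lost per second */
    float die;          /* time at which the light expires */
} dlight_t;

void DecayLights (dlight_t *lights, int count, float time, float frametime)
{
    int i;

    for (i = 0; i < count; i++) {
        if (lights[i].radius <= 0)
            continue;
        lights[i].radius -= frametime * lights[i].decay;
        if (lights[i].radius < 0 || lights[i].die < time)
            lights[i].radius = 0;   /* light is gone */
    }
}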

Re: Walking Through A Whole Rendering Frame

Post by taniwha »

Yeah, seems about right, though QF does things a little differently (but only in the details). I once made a call tree for that, but I've long since lost it :(.

The client entity setup code (CL_RelinkEntities) actually inserts the entities into the BSP tree, so entities benefit from BSP culling.

Frame lerping is done per entity at render time because before then, it's unknown whether the entity will even be drawn (though I suppose it could be done after building the world display lists but before any rendering), but also because the GLSL renderer does the blending in the shader (yay, hw lerping :).
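
For reference, the CPU-side version of that lerp is just a per-vertex blend between two cached poses (an illustrative sketch, not QF's code):

Code:
/* pose1/pose2 are numverts*3 floats each; blend is 0..1 between them. */
void LerpPose (const float *pose1, const float *pose2, float *out,
               int numverts, float blend)
{
    int i;
    for (i = 0; i < numverts * 3; i++)
        out[i] = pose1[i] + blend * (pose2[i] - pose1[i]);
}

/* The GLSL path instead feeds both poses to the vertex shader as
   attributes and does mix(pose1, pose2, blend) on the GPU. */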

All viewport stuff (fov, size, etc) is cached as much as possible, but if things have changed, then it's done at about the same time.

QF's GL renderer is all over the place for the sky (depends on sky settings), but it's either sky, then world+brush models, or world+brush models, then sky; after that come alias models, then sprites, then water. GLSL is world+brush, sky, alias, sprite, water. (Actually, brush model order depends on texture, but both renderers put world and brush model surfaces into the display lists before drawing anything, for fog and to avoid texture thrash.)

Probably the biggest difference between QF and other engines is the insertion of all entities (except flags for qw ctf, but that has a FIXME on it) into the BSP tree using efrags. This even includes temporary entities (eg, beam fragments).

I want to insert dlights into the bsp tree, but I have yet to put much thought into how I want to make use of such information.

Re: Walking Through A Whole Rendering Frame

Post by mh »

Brief summary for DirectQ (current as-yet unreleased codebase).

CL_UpdateClient reads from the server, lerps normal (not MOVETYPE_STEP) entities and brings the view-related stuff up to date.

CL_PrepEntitiesForRendering adds entities to the visedicts list, adds temp entities for lightning bolts, and entity effects are brought up to date (this can also spawn dynamic lights and/or particles).

A whole bunch of "BeginFrame" functions get called for various drawing subsystems - these alloc temp memory (which is semi-garbage-collected at the end of the frame) for lists and set other initial states and counts. Here we also calculate the MVP matrixes, the frustum (which is extracted directly from the matrixes rather than being calculated separately), colour shifts, whether or not we need to do a render-to-texture pass.
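
The extraction is the usual Gribb/Hartmann trick - the planes fall straight out of the combined matrix. In rough C it looks something like this (illustrative, assuming a row-major projection*modelview matrix, not a paste from the codebase):

Code:
/* Each output plane is (a,b,c,d) with a*x + b*y + c*z + d >= 0 for
   points inside the frustum. */
void ExtractFrustum (const float m[4][4], float planes[6][4])
{
    int i;

    for (i = 0; i < 4; i++) {
        planes[0][i] = m[3][i] + m[0][i];   /* left   */
        planes[1][i] = m[3][i] - m[0][i];   /* right  */
        planes[2][i] = m[3][i] + m[1][i];   /* bottom */
        planes[3][i] = m[3][i] - m[1][i];   /* top    */
        planes[4][i] = m[3][i] + m[2][i];   /* near   */
        planes[5][i] = m[3][i] - m[2][i];   /* far    */
    }
    /* normalise each plane by the length of (a,b,c) if you need real
       distances rather than just sidedness tests */
}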

The world is built - this chains up textures for the world model, adds static entities, merges any non-instanced bmodels that don't move into the world, and re-updates the MVP matrix based on a new far-clipping plane which is dynamically calculated from the farthest visible surface. This also checks visibility (pretty standard R_MarkLeafs stuff).

Then a whole bunch of shader constants that aren't gonna change for the frame are written to a constant buffer and bound to cbuffer slot 0 - the MVP matrix, fog settings, client time, etc.

The world gets drawn, iterating through the texture chains. 3 iterations are made; one for sky, one for solid and one for water. Any alpha water surfaces are skipped over and added to a separate alpha list.

Next a pass through the visedicts list is made and entities are added to their appropriate lists (MOVETYPE_STEP and frame lerping are done here, bboxes are calculated from frame and orientation) - MDLs go in an MDL list, sprites get added to the same "alpha objects" list as above, any left over brush models are drawn as they pass. Brush models use the exact same rendering functions (including texture chaining) as the world so they're fully capable of supporting sky, alpha water, everything. If translucency is needed individual surfaces are added to the alpha list rather than the entire model.

Because the world and brush models mostly share the same code, the full description is left to here. First, texture chains for the model are cleared and surfaces with dynamic lights are marked (R_PushDlights modified to accept any model); lights are transformed into local space for the model so that they work on moving brush models (and BSP models - yayy!). Texture chains are built and this building is the only thing that differs (R_RecursiveWorldNode versus a linear walk through the surfaces). As each surface passes through the chain builder its lightmap data is updated if needed. When all chains have been built we get an early-out option if no surfaces were added. Otherwise lightmap textures get updated and everything is drawn using the 3 passes mentioned above.
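
Pushing a light into model space is just the usual origin-subtract plus a rotation into the entity's axes, roughly like this (illustrative names, not the actual code):

Code:
/* fwd/right/up come from the entity's angles (AngleVectors); the minus
   on right is because the model's local Y axis points left while the
   "right" vector points right. */
void LightToModelSpace (const float lightorg[3], const float modelorg[3],
                        const float fwd[3], const float right[3],
                        const float up[3], float out[3])
{
    float t[3];

    t[0] = lightorg[0] - modelorg[0];
    t[1] = lightorg[1] - modelorg[1];
    t[2] = lightorg[2] - modelorg[2];

    out[0] =   t[0]*fwd[0]   + t[1]*fwd[1]   + t[2]*fwd[2];
    out[1] = -(t[0]*right[0] + t[1]*right[1] + t[2]*right[2]);
    out[2] =   t[0]*up[0]    + t[1]*up[1]    + t[2]*up[2];
}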

MDLs are drawn. These get an initial sort by model and posenums and come from vertex buffers with hardware frame lerping and instancing (the sort helps reduce buffer switches and makes instancing actually usefully work). The instancing path is always used, even if there is only a single model with a given set of poses, to save on state changes and buffer switches. This could also sort on texture to further reduce state changes and increase instanced batch sizes, but previous testing showed that the overhead of the extra sort conditions eliminated any gains (in practice if a texture needs to change the model most likely needs to as well). Shadows are also drawn during this step.

Honourable exception for the player model, which gets different shaders and textures in order to do colormapping on the GPU (I don't bother colormapping other model types - GLQuake didn't and I don't reckon it's something that's used that often - if mappers/modders are even aware of it). Player skin re-ups were the biggest bottleneck in bigass1 for me, and the GPU-side colormapping eliminates it - it's the secret behind how I can get 850fps in a timedemo.

Particles and coronas are brought up to date and added to the alpha list. Particles are added per-emitter rather than per individual particle.

The alpha list is sorted from back to front and drawn in order. MDLs and brush surfaces again use the exact same rendering functions as before - including texture chaining for surfs - but retaining the back to front order. Sprites are drawn with instancing always on, using static vertex buffers generated at load-time and a dynamic per-instance buffer (one vert per sprite). Particles and coronas mostly share the same code - the pixel shader is the only thing that's substantially different (and even then not much so) - and are drawn batched up and instanced (one vert per particle) with billboarding, velocity and gravity all happening on the GPU.
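
The back-to-front sort itself is nothing special - conceptually it's just this (illustrative, not the actual code):

Code:
#include <stdlib.h>

typedef struct {
    float dist;      /* (squared) distance from the view origin */
    void *object;    /* surface, sprite, particle batch, ... */
} alphaitem_t;

/* sort descending so the farthest item is drawn first */
static int AlphaCompare (const void *a, const void *b)
{
    float da = ((const alphaitem_t *) a)->dist;
    float db = ((const alphaitem_t *) b)->dist;
    return (da < db) ? 1 : (da > db) ? -1 : 0;
}

/* qsort (alphalist, numalphaitems, sizeof (alphaitem_t), AlphaCompare); */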

The view model is drawn; this uses some slightly different rendering to other MDLs but efforts have been made to share as much code as possible.

Finally, if any render-to-texture effects were needed (and this now includes the polyblend flash) then they are applied. If the polyblend flash can be merged into another effect (such as underwater warping) it's done so I get to save on fillrate.

Architecturally it's quite clean - there's a lot of code-sharing and reuse going on (something I really dislike about the Fitz codebase is the amount of code duplication throughout its brush and MDL renderers - brush models in particular should use the same texture chain functions as the world, although that's made difficult by the way it does texture chains), and there's a high degree of flexibility and consistency. Much of it has been set up around two passes - an initial setup pass (which only happens once per-frame) and a second drawing pass (which can happen as many times as needed per-frame). On the negative side, there's a little more cross-talk between modules than I'm comfortable with (it continues to get better), and a few too many walks through the visedicts list happen - it doesn't actually need such a list at all - so there's still some room for improvement.

Re: Walking Through A Whole Rendering Frame

Post by Spike »

fte is a lot more convoluted...
It's written such that frames can recurse, for fbo/portal support.
RT lights are 'merely' an extra chunk of code between the non-decal surfaces and the decal surfaces - just a few additive blends over the scene lighting, one light at a time.
The visedicts list is iterated once for each subscene. 'subscene' in this case is anything that needs individual culling (ie: for each light, culling to the light's radius). Each iteration of that list generates a batch list, which is what the backend uses directly.
basically the backend is fed batches/meshlists, and has to draw those using the shader specified in the batch.
if the shader is a mirror, it just calls back into the outer renderer to walk the bsp tree again from a new perspective, before continuing with the mirror's blend effect over the top.

fte's d3d renderer is just 4 extra files. it's fed the same batches as the gl renderer, and the shaders are (almost) the same too. I say almost because they might contain some hlsl code instead of glsl.

if enabled, 'pretty water' is thus 'just' an extra recursive render for below the water with corrected pvs (yay, no watervis), one recursive fbo render which is a mirror, one 'ripplemap' fbo to which all 'ripple' shaders are drawn, and one glsl surface with those 3 images to draw it on the screen with a scrolling normalmap or 3 with ripples that distort the reflection/refractions.

r_shadows gives clipped decals instead of glquake flattened models. these are passed through using the same mechanism as csqc's drawpolygon builtins, which is also what particles use. They're drawn with some specific sort key, which results in them being drawn after any rtlights.
(clipped decals are actually fairly simple really, the hard part is in tracking them to enable them to fade out. shadows can easily be regenerated each frame, so fairly simple. decals for shadows don't have most of the issues that flattened models have)

timedemo bigass1 has the same sort of fps as demo3. with dlights on, bigass1 actually has a higher framerate.
for me, a large part of the slowness of bigass1 is the particles - pvs culling them would likely help a lot.

with quakeworld, ents are copied into the visedicts list, rather than pointers. the backend calls in to the frontend of the renderer to build batch lists from a certain viewpoint. each visedicts iteration correlates to a different culling situation (ie, each dlight requires ents to be culled separately). in this approach, the visedicts list serves to both provide storage for temp entities, and an easy way to find everything that needs culling.

fte has an abstraction between geometry and the renderer, as well as the ability to sort/filter sets of geometry, and support for recursion. this makes it fairly simple to render that geometry in various different ways (allowing god rays+decals+portals+mirrors+ripples+etc). The core of the renderer has no idea what an mdl/md2/md3/iqm/zym/dpm/hlmdl/bsp/sprite/particle/decal/csqcpoly is, only that it has a mesh with a shader.
fte does have more overhead than directq, but imho it pays off in versatility.

Re: Walking Through A Whole Rendering Frame

Post by mh »

Particles are a bitch. :( It took me a long long time to be able to handle 1,000,000 at semi-playable framerates, and I've still got some nastiness in my vertex shader that needs cleaning out (branching based on whether the particle needs a colour ramp or sets its colour directly). It's not that bad as particles come in batches and all particles in each batch will take the same side of the branch, so it's more a case of getting the final 5% out of it.

It's really interesting getting a window on the design goals and thought-processes here. If I've got this right, FTE is somewhat like a scene graph where each node is capable of spawning another brand new scene graph, so your renderer just needs the headnode and everything kicks off from there?

I'd been moving in the "everything is a mesh" direction a good while back, but decided to do a 180 and say "let's just have functions specialized to handle their own content types really well" instead. I still think that offloading the complexity from runtime to loadtime/setup is a really good design choice though, and where possible try to get as much of that as I can.

Re: Walking Through A Whole Rendering Frame

Post by frag.machine »

Baker wrote:Rendering

15. Draw the Sky.
Even when it's not visible? Wouldn't it be better to move this to the end of the rendering? In Fitzquake at least I observed a considerable performance hit in open areas due to the default scrolling sky on really cheap hardware like my ancient Toshiba notebook, so ensuring it's never actually drawn where not required sounds like a good thing.

Re: Walking Through A Whole Rendering Frame

Post by mh »

There are different ways of drawing sky. Fitz projects the scrolling sky warp onto a big box surrounding the map so it's more appropriate for drawing at the end of the scene. GLQuake projects (or at least makes a fairly hamfisted attempt at doing so, but goddam square roots not being able to be linearly interpolated ultimately defeats it - damn you laws of mathematics, damn you to hell!) it onto the original surface polys so it doesn't matter much - there's next to no overdraw anyway. A shader-based solution can do the same but with per-pixel accuracy so it gets next to no overdraw together with a rock-solid warp that doesn't suffer from trying to draw straight lines between curves.

The main difficulty is that both software Quake and GLQuake allow sky surfaces in brush models - including BSP models and moving brush models. In practice that's rarely if ever used, but it still needs to be supported. Fitting that in with any kind of non-shader-based sky in any kind of clean and efficient way is not easy.

Re: Walking Through A Whole Rendering Frame

Post by Spike »

there's lots and lots of ways to draw sky.

1: (glquake) chop the sky into a grid. recalc the texture coords at each vertex each frame. ignore any non-perspective-correct errors, subdivide more if it's too painful. depth is correct. no extra overdraw. (see the sketch after this list)

2: (darkplaces) conceptualize the sky as a sphere. ignore the original mesh, and draw only the sky sphere.
depth is b0rked completely, but at least you can run q3 shaders on it properly. entire screen is redrawn.
distortion exists but is tied to view space and will not swim about while moving, so you don't notice it.

3: (zquake) butcher the skybox code and generate a sky in a grid based upon which parts of the screen had a part of the sky in them. make sure you don't subdivide in advance.
depth is b0rked completely, and q3 shaders are also b0rked.
slight overdraw, distortion exists but is tied to screen positions and will not swim about while moving, so you don't notice it.

4: (fte+directq I assume) use glsl to project the sky in a fragment program.
depth is correct, though q3 shaders need some sort of conversion
no extra overdraw.
no distortion at all.

5: draw the sky completely black.
r_fastsky. yay.
no overdraw. no distortion - nothing to be distorted.
for bonus points, make it black with glClear and skip drawing any and all sky surfaces completely. which is what q3 does in that case (q3 sky spheres are not ideal...).
depth is correct.
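
for method 1, the per-vertex projection is roughly what glquake's EmitSkyPolys does - an illustrative reconstruction, not a paste:

Code:
#include <math.h>

/* speedscale comes from cl.time and offsets the two scrolling cloud
   layers; each subdivided sky vertex is projected onto a flattened
   dome around the viewer. */
void SkyTexCoords (const float vert[3], const float vieworg[3],
                   float speedscale, float *s, float *t)
{
    float dir[3], len;

    dir[0] = vert[0] - vieworg[0];
    dir[1] = vert[1] - vieworg[1];
    dir[2] = (vert[2] - vieworg[2]) * 3;   /* flatten the dome */

    len = sqrtf (dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2]);
    len = 6 * 63 / len;

    *s = (speedscale + dir[0] * len) * (1.0f / 128);
    *t = (speedscale + dir[1] * len) * (1.0f / 128);
}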


depth is important on dm3. if it's not written properly, there are areas where you can see players through the sky.
you can always perform a depth-write-only pass after drawing the sky, so this is not fatal, just slower.

GLSL skies win every time though.

Here's the kicker though. All logic says 'draw sky first', but really, if you draw the sky last, with some weird projection that puts everything at some big distance from the viewport, you can benefit from early-z and avoid drawing most of that sky in the first place.
When two opaque objects overlap, you always want the nearest drawn first so you can skip even drawing each pixel in the furthest object. So draw the solid world+models first, then the probably-further-away sky, then everything else.
Especially if it's a skydome.



mh, with fte when walking the world, each surface stores its mesh pointer into the batch's list. the list has space only for each mesh twice, so there's a limit on the amount it can recurse (hall of mirrors would be deadly anyway). Otherwise, recursion is provided through the C stack.
you can push a batch with 1+ mesh and a callback function set, and the backend will call that callback in order to populate the mesh data. you might want to use that in case of code that builds into static meshes, but beware as rtlights can potentially call it a lot per frame. Sadly, I use it for models. :P
I ought to use it for particles, just because oriented particles are built from one viewpoint and don't look right when they're then built from a viewpoint on the other side of the waterline or whatever... This is a fairly annoying bug that's made worse by the fact that the particle system doesn't really interact with the backend at all - it's more part of the client than the renderer.
But yeah, GLR_DrawPortal can be used to recurse into the backend, drawing a scene within a separate scene, just with one level of recursion. One level is plenty, and is enough for cheap effects like mirrors, portals, and water reflections (try r_waterstyle 4 on those cheap teleporter blocks on the rmq test maps - BLOCKS OF JELLY! it's awesome).
I'm tempted to call in to csqc for things like ui menus+duke3d-security-cams, but I suspect the right thing to do there is to do it explicitly, so you can define your own framerate with it (and draw mirrors!).
Client just feeds visedicts, worldmap, dlights, and 'polys' (particles+shadows+csqc polys using some indexed array thing). The frontend generates batches/meshes from those, and the backend tries to draw it, with a few calls back into the frontend(recursion, mesh rebuilding) as needed, but doesn't (currently) need to involve the client.
If it runs out of temp memory, it'll just not draw that batch, and grow its temp memory at the start of the next frame (keeping track of how much was needed).
A large part of my motivations for this was in d3d+gl compat. You talk with the backend, not gl or d3d (other than textures anyway). As well as getting q3's portals to work which was actually pretty simple to implement in the end.
The backend manages all gl state, so there's no (well, less) calls to enable/disable stuff that's already enabled/disabled. All state is tracked in one place instead of 50...
It's QF that uses just the bsp headnode by linking new things into it. :P

Re: Walking Through A Whole Rendering Frame

Post by mh »

One of the weaknesses of drawing sky as a big sphere/box/whatever is that you normally need to adjust your far clipping plane for it.

You can avoid that by setting glDepthRange (1, 1) and drawing it as a tiny sphere/box/whatever. Try it, it works.
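
In GL terms it's just this (a sketch only - DrawTinySkyBox is a hypothetical helper that draws a small box/sphere centred on the view origin):

Code:
glDepthRange (1, 1);        /* every sky fragment lands on the far plane */
glDepthFunc (GL_LEQUAL);    /* so it still passes against a cleared buffer */
DrawTinySkyBox ();
glDepthRange (0, 1);        /* back to normal for the rest of the frame */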

There are other places in id1 maps where geometry that wouldn't otherwise be seen is visible with sky drawn this way. Most of the time it's a semi-acceptable glitch, but yeah, it's FAIL in multiplayer. It's obvious that the maps were designed to have sky as occluding geometry.

A workaround is to draw the sky surfaces normally but with glColorMask (0, 0, 0, 0), then invert the depth test and draw your sphere/box. That gets you depth working right at the cost of some extra fillrate. Sky needs to come first in the frame under this setup.

Shader based is definitely best. It's easy, it's fast and a whole lot of complexities and special cases go away. You can even do rotating skyboxes by just using a cubemap and rotating the texture matrix.

Q2's skybox is slightly different - you're supposed to see other geometry through that.