MH's Direct3D 8.1 Wrapper

Discuss programming topics for the various GPL'd game engine sources.
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am


Post by Baker »

Back in December 2009, MH released a Direct3D 8.1 wrapper that is easy to implement in most GLQuake engines (example engines are on his site [6]). Counting an experimental EZQuake build and my own engine, there are 8 working prototype engines using the wrapper.

[FitzQuake, Enhanced GLQuake, FuhQuake, ZQuake, Tomaz are the others.]

MH recently gave me an updated version of the wrapper, from April, and it is quite a bit faster.

I've been spending a bit of time lately sorting through my own engine, trying to separate the necessary changes from the unnecessary ones so I can optimize and debug it.

Anyway, here is what is probably my final list of #ifdefs for the wrapper. I've tried to give each one a comment that explains the exception:

Code:

#ifdef DX8QUAKE
# define DX8QUAKE_NO_DIALOGS               // No "Starting Quake" type dialogs for DX8QUAKE
# define DX8QUAKE_NO_8BIT                  // D3D8 wrapper didn't keep the 8bit support
# define DX8QUAKE_GET_GL_MAX_SIZE          // D3D8 wrapper obtains the maxsize from the video card
# define DX8QUAKE_CANNOT_DETECT_FULLSCREEN_BY_MODESTATE	// Detecting modestate == MS_FULLDIB isn't useful
# define DX8QUAKE_NO_BINDTEXFUNC           // SGIS/ancient GL pathway removal
# define DX8QUAKE_NO_GL_ZTRICK             // DX8QUAKE hates gl_ztrick; clear the buffers every time
# define DX8QUAKE_GL_READPIXELS_NO_RGBA    // Wrapper only supports GL_RGB, not the GL_RGBA that the envmap command uses
#endif
Most of these, like the lack of gl_ztrick support, the inability to do glReadPixels as GL_RGBA, the lack of compatibility with the ancient, obsolete BindTexFunc, the lack of 8-bit support or the missing "Starting Quake" dialog, wouldn't interest anyone anyway.

I have 2 or 3 bugs I'm battling not related to the Direct3D 8.1 wrapper but I am getting very high frames per second.

Once I've squashed the 2 or 3 bugs that are annoying me, the DX8 version of my engine is close enough to GL in performance to replace it, and the reliability is top notch.

[Part of me is wondering what it'd be like to use the wrapper on a non-Quake engine that uses OpenGL 1.2 or thereabouts.]
The night is young. How else can I annoy the world before sunrise? 8) Inquisitive minds want to know! And if they don't, well, like that has ever stopped me before...

Post by Baker »

Btw... how complicated is it to implement the wrapper in an OpenGL 1.2 type of engine (i.e. not DarkPlaces)?

Here are the changes in JoeQuake:

1. Wrap this code in sys_win.c with #ifndef USEFAKEGL/#endif, as indicated, to get rid of the "Starting Quake" type of dialog (or in this case, the popup JoeQuake logo):

Code:

	// keep the -dedicated check outside the #ifndef so isDedicated is still set
	isDedicated = (COM_CheckParm("-dedicated") != 0);

#ifndef USEFAKEGL
	if (!isDedicated)
	{
		hwnd_dialog = CreateDialog (hInstance, MAKEINTRESOURCE(IDD_DIALOG1), NULL, NULL);

		if (hwnd_dialog)
		{
			if (GetWindowRect(hwnd_dialog, &rect))
			{
				if (rect.left > (rect.top * 2))
				{
					SetWindowPos (hwnd_dialog, 0,
						(rect.left / 2) - ((rect.right - rect.left) / 2),
						rect.top, 0, 0,
						SWP_NOZORDER | SWP_NOSIZE);
				}
			}

			ShowWindow (hwnd_dialog, SW_SHOWDEFAULT);
			UpdateWindow (hwnd_dialog);
			SetForegroundWindow (hwnd_dialog);
		}
	}
#endif
2. Very important: in vid_wgl.c (gl_vidnt.c in some engines), replace the SwapBuffers (maindc) call with this:

Code:

#ifdef USEFAKEGL
		FakeSwapBuffers ();
#else
		SwapBuffers (maindc);
#endif
3. In quakedef.h, near the very top (e.g. right after #include <windows.h>), add this:

Code:

// we're using the fake gl wrapper
#define USEFAKEGL

// FAKEGL - switch include files
#ifndef USEFAKEGL
#include <gl/gl.h>
#include <GL/glu.h>
// switch your OpenGL libs to this so that you can more easily switch between configurations
#pragma comment (lib, "opengl32.lib")
#pragma comment (lib, "glu32.lib")
#else
#include "fakegl.h"
#endif
Then add the wrapper (gl_fakegl.c) to the project. You can probably also remove opengl32.lib and glu32.lib from the project's linker inputs, since the #pragma comment lines immediately above will add them when they are needed.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Baker wrote: Btw ... how complicated is it to implement the wrapper in an OpenGL 1.2 type of engine (like not DarkPlaces).
It depends on how the engine handles its OpenGL initialization, really. Something like Quake II won't work at all without heavy rewriting of the engine, as it does a LoadLibrary on opengl32.dll and then a GetProcAddress on every entry point, including the 1.0 and 1.1 ones.

In theory the wrapper could be compiled into an opengl32.dll which could then be dropped into your game directory. It should work in just about all cases then, although it would need to be viewed as being more like the old 3DFX mini GLs where only the needed subset of full OpenGL was implemented.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Post by Spike »

SwapBuffers is not actually part of opengl32.dll; it is exported by gdi32.dll.
You'd need to replace more DLLs than just the OpenGL-specific ones.

You can rewrite the program's import/export address tables at run time, of course...
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

SwapBuffers actually calls into wglSwapBuffers if you're running an OpenGL context so it's cool. Likewise all of the other GDI functions end up calling wgl versions.

It's very interesting if you put a dummy DLL called opengl32.dll into your game directory, monitor GetProcAddress calls, and watch what happens behind the scenes. You can dump the exports table from MS's opengl32.dll, implement all of the required functions as stubs in your own DLL, then run it in a debugger with breakpoints set so that you can get a good idea of how everything links up together.

Incidentally, the only reason that the "Starting Quake..." dialog was removed was so that it would compile clean with VS 2008. It can still be used otherwise if you want.

This is a valid replacement winquake.rc that will also work with VS 2008:

Code:

#include "resource.h"
#include <windows.h>

#ifndef IDC_STATIC
#define IDC_STATIC (-1)
#endif

IDI_ICON2               ICON    DISCARDABLE     "quake.ico"

IDD_DIALOG1 DIALOGEX 0, 0, 62, 21
STYLE DS_MODALFRAME | DS_SETFOREGROUND | DS_3DLOOK | DS_CENTER | WS_POPUP
EXSTYLE WS_EX_TOOLWINDOW | WS_EX_CLIENTEDGE
FONT 16, "Times New Roman", 0, 0, 0x1
BEGIN
    CTEXT           "Starting Quake...",IDC_STATIC,4,6,54,8
END

Post by mh »

A note on performance and other characteristics of the wrapper

Direct3D is very sensitive to primitive batching. Previous versions of this wrapper (pre April 2010) didn't attempt to batch at all, and performance suffered as a result. In April 2010 I rewrote it to use batching everywhere, detecting state changes as they happen and beginning a new batch on every state change.

All batches use indexed primitives with 16-bit indexes and the triangle list primitive type, except when the requested OpenGL primitive type is GL_TRIANGLES, in which case it uses an unindexed triangle list. Strips and fans don't exist in it at all.
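As a rough illustration of what converting everything to indexed triangle lists involves, here is a minimal C sketch (not the wrapper's actual code) that appends a GL_POLYGON or GL_TRIANGLE_FAN style run of vertices to a batch as (n - 2) triangles. The batch_t type, Batch_AddFan and the size limits are hypothetical; the one hard constraint it models is that 16-bit indexes can only address 65536 vertices, so a batch has to be flushed before it goes past that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits: 16-bit indexes can address at most 65536 vertices. */
#define MAX_BATCH_VERTS   65536
#define MAX_BATCH_INDEXES 16384

/* Hypothetical batch; the vertex data would live alongside, only indexes shown. */
typedef struct
{
	size_t   numverts;                    /* vertices already in the batch */
	size_t   numindexes;                  /* indexes already written */
	uint16_t indexes[MAX_BATCH_INDEXES];  /* triangle list indexes */
} batch_t;

/* Append an n-vertex polygon/fan as (n - 2) triangles (0, i, i + 1),
   offset by the vertices already in the batch.
   Returns 0 (and appends nothing) if the batch must be flushed first. */
static int Batch_AddFan (batch_t *b, size_t n)
{
	size_t i;

	assert (n >= 3);

	if (b->numverts + n > MAX_BATCH_VERTS ||
		b->numindexes + (n - 2) * 3 > MAX_BATCH_INDEXES)
		return 0;

	for (i = 1; i + 1 < n; i++)
	{
		b->indexes[b->numindexes++] = (uint16_t) b->numverts;
		b->indexes[b->numindexes++] = (uint16_t) (b->numverts + i);
		b->indexes[b->numindexes++] = (uint16_t) (b->numverts + i + 1);
	}

	b->numverts += n;
	return 1;
}
```

Under this scheme a GL_QUADS quad is just a 4-vertex fan and comes out as indexes 0 1 2 0 2 3. Strips would need their own index pattern with alternating winding, which this sketch does not attempt.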

In many cases this will get performance that comes close to (or occasionally exceeds) OpenGL (which seems to batch up polygons internally within the driver), but it's dependent on the data that it's fed. If it's given polygons that are not grouped by state (the worst offender is R_DrawSequentialPoly or anything similar to or derived from it), it won't be able to batch them, and performance drops off.
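The effect of feeding the wrapper polygons that aren't grouped by state can be shown with a small C model. The names are hypothetical, and it assumes the bound texture is the only state that changes: each state change closes the open batch, so interleaved surfaces cost one draw call per surface, while the same surfaces sorted by texture cost one per texture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state-change batcher: a new batch starts whenever the
   bound texture changes, like a wrapper that breaks batches on state changes. */
typedef struct
{
	int    current_texture;  /* -1 = nothing bound yet */
	size_t pending_polys;    /* polys accumulated in the open batch */
	size_t draw_calls;       /* batches actually submitted */
} batcher_t;

static void Batcher_Flush (batcher_t *b)
{
	if (b->pending_polys > 0)
	{
		b->draw_calls++;     /* a real wrapper would issue something like DrawIndexedPrimitiveUP here */
		b->pending_polys = 0;
	}
}

static void Batcher_Draw (batcher_t *b, int texture)
{
	assert (texture >= 0);   /* -1 is reserved for "nothing bound" */

	if (texture != b->current_texture)
	{
		Batcher_Flush (b);   /* state change ends the current batch */
		b->current_texture = texture;
	}
	b->pending_polys++;
}

/* Draw a list of surfaces (identified by texture) and return the draw calls used. */
static size_t CountDrawCalls (const int *textures, size_t count)
{
	batcher_t b = { -1, 0, 0 };
	size_t i;

	for (i = 0; i < count; i++)
		Batcher_Draw (&b, textures[i]);

	Batcher_Flush (&b);
	return b.draw_calls;
}
```

With textures {1, 2, 1, 2, 1, 2}, CountDrawCalls returns 6; sorting the same list to {1, 1, 1, 2, 2, 2} brings it down to 2, which is why grouping surfaces by texture matters so much for the wrapper.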

A further optimization would be to detect if hardware T&L is available and use a dynamic vertex buffer if so. This is the approach used by DirectQ, which batches very aggressively and is able to draw the entire first scene of start.bsp, including the status bar, in 24 draw calls. It may not be viable for use with the wrapper, as feeding a dynamic vertex buffer with polygons that aren't grouped by state will slow things down further. Some reworking of the surface refresh (especially getting rid of R_DrawSequentialPoly, a thing of true evil, and grouping bmodel surfs by texture) would definitely be required before it would be worthwhile doing this. Alias models and everything else are probably OK to leave as they are, but GLQuake's default surface refresh (especially the multitexture path) is embarrassingly suboptimal, even for OpenGL.

The wrapper supports a subset of baseline OpenGL 1.1 (similar to the old 3DFX mini GL) with some extensions available. Multitexture is exported, but combine modes are not. This isn't because D3D can't do combine modes (it can, and its API is considerably more sensible than OpenGL's here), but rather because the OpenGL API introduced a heavy layer of complexity in implementing them, which caused other things to break.

It doesn't attempt to do much in the way of parameter validation, assuming that your OpenGL code is at least reasonably correct to begin with. I'm not sure if this was a wise decision but it certainly helped to speed up the development process. It does mean that you should test and confirm correct operation using native OpenGL rather than relying on this, however.

Post by Baker »

With any luck, in the next 48 hours I should have OpenGL and the Direct3D 8.1 wrapper in a single build. Sure, code size suffers a bit.

At first it'll be command line controlled just because I haven't finished all my video/opengl shutdown + restart stuff (but that's damn close too).

I will say the code comments in the wrapper are extraordinary.

Oddly enough, when all is said and done, OpenGL probably isn't going to be the default renderer. Not because of performance, but because it is more common to have OpenGL driver issues than Direct3D issues as far as I can tell based on forum posts.

The Direct3D wrapper maybe gets 15%-20% lower FPS, but when you are getting 200 FPS on an older video card who cares whether you get 200 or 250 ... it makes little difference.
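In frame-time terms the gap is small, since frame time is just the reciprocal of the frame rate:

```latex
t_{\mathrm{frame}} = \frac{1000\ \mathrm{ms}}{\mathrm{FPS}}, \qquad
\frac{1000}{250} = 4.0\ \mathrm{ms}, \qquad
\frac{1000}{200} = 5.0\ \mathrm{ms}, \qquad
\frac{250 - 200}{250} = 20\%
```

So a 20% drop at those rates costs about 1 ms per frame.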

Post by mh »

Some more notes for you.

glReadBuffer and glDrawBuffer to anything other than GL_BACK just don't exist in D3D. envmap, timerefresh and Draw_BeginDisc will all need to be either removed or replaced.

Fog.

Fog is a doozy. Because D3D is closer to the hardware than OpenGL, and because the only thing it can software emulate is the per-vertex pipeline, and because even then you have to explicitly request software emulation of it, standard glFog calls in D3D on Shader Model 3 or above hardware will not work. This hardware is no longer required to support the old fixed functionality fog, and D3D will shove this in your face. The only solution is to get writing some shaders.
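For reference, this is the math a shader would have to reproduce for GL_LINEAR fog, written in C purely for illustration (a real replacement would be an HLSL or assembly shader, and FogFactorLinear, ApplyFog and rgb_t are made-up names). It follows the fixed-function OpenGL definition: f = (end - z) / (end - start), clamped to [0, 1], followed by a blend between the surface color and the fog color. GL_EXP and GL_EXP2 would substitute exp(-density * z) and exp(-(density * z)^2) for f.

```c
#include <assert.h>

/* Hypothetical RGB color type for the sketch. */
typedef struct { float r, g, b; } rgb_t;

/* GL_LINEAR fog factor: f = (end - z) / (end - start), clamped to [0, 1].
   z is the eye-space distance of the vertex or fragment. */
static float FogFactorLinear (float z, float start, float end)
{
	float f;

	assert (end != start);
	f = (end - z) / (end - start);

	if (f < 0.0f) f = 0.0f;
	if (f > 1.0f) f = 1.0f;
	return f;
}

/* Final color = f * surface + (1 - f) * fog color. */
static rgb_t ApplyFog (rgb_t surface, rgb_t fog, float f)
{
	rgb_t out;

	out.r = f * surface.r + (1.0f - f) * fog.r;
	out.g = f * surface.g + (1.0f - f) * fog.g;
	out.b = f * surface.b + (1.0f - f) * fog.b;
	return out;
}
```

Under these definitions a fragment halfway between fog start and fog end (z = 550 with start = 100, end = 1000) gets f = 0.5 and ends up as an even mix of its own color and the fog color.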

D3D8 vs D3D9.

It takes about half an hour of work to upgrade that wrapper to D3D9. I'd personally recommend it for a number of reasons, with the primary one being that D3D9 is most likely going to be flat-out faster and more reliable on most hardware. D3D8 was a short-lived version, but 9 has been around for almost a decade. Hardware vendors and driver writers know their D3D9, it's had several (more than 20?) newer improved iterations released by MS, and it's been optimized until it screamed for mercy. It's as near to a "standard implementation" as you're going to find in D3D land. (You'll also find it easier to write those shaders if you want to address the fog problem.)

The limited subset of D3D I used will still work perfectly fine on older downlevel hardware.

Post by Baker »

Just a silly note, I noticed the wrapper doesn't support GL_MAX_TEXTURE_UNITS_ARB for glGetIntegerv and returns 0.

In DirectJoe the value is effectively returned as 666 which is why DirectJoe's multitexture is on.

Code:

void CheckMultiTextureExtensions (void)
{
.
.
	glGetIntegerv (GL_MAX_TEXTURE_UNITS_ARB, &gl_textureunits);
	gl_textureunits = min(gl_textureunits, 4);

	if (COM_CheckParm("-maxtmu2") || !strcmp(gl_vendor, "ATI Technologies Inc."))
		gl_textureunits = min(gl_textureunits, 2);

	if (gl_textureunits < 2)
		gl_mtexable = false;

	if (!gl_mtexable)
		gl_textureunits = 1;
	else
		Con_Printf ("Enabled %i texture units on hardware\n", gl_textureunits);
}

Code:

void glGetIntegerv (GLenum pname, GLint *params)
{
	// here we only bother getting the values that glquake uses
	switch (pname)
	{
	case GL_MAX_TEXTURE_SIZE:
		// D3D allows both to be different so return the lowest
		params[0] = (d3d_Caps.MaxTextureWidth > d3d_Caps.MaxTextureHeight ? d3d_Caps.MaxTextureHeight : d3d_Caps.MaxTextureWidth);
		break;

	case GL_VIEWPORT:
		params[0] = d3d_Viewport.X;
		params[1] = d3d_Viewport.Y;
		params[2] = d3d_Viewport.Width;
		params[3] = d3d_Viewport.Height;
		break;

	default:
		params[0] = 666;
		return;
	}
}
Sneaky.

I was trying to figure out why the wrapped version of my engine was so slow. Added some more checks and discovered multitexture was off and then dug through to figure out why.
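To see why a 0 from glGetIntegerv turns multitexture off while 666 leaves it on, here is the clamping logic from CheckMultiTextureExtensions pulled out into a standalone C function. EffectiveTextureUnits is a hypothetical helper; it assumes the multitexture extension was already found, takes the reported unit count, the -maxtmu2 flag and the vendor string as parameters, and returns 1 when multitexture would be disabled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring CheckMultiTextureExtensions' clamping.
   reported = what glGetIntegerv (GL_MAX_TEXTURE_UNITS_ARB) returned.
   Returns the texture units the engine will use; 1 = multitexture off. */
static int EffectiveTextureUnits (int reported, int maxtmu2, const char *vendor)
{
	int units = reported;

	assert (vendor != NULL);

	if (units > 4)                       /* min (gl_textureunits, 4) */
		units = 4;

	if ((maxtmu2 || !strcmp (vendor, "ATI Technologies Inc.")) && units > 2)
		units = 2;                       /* -maxtmu2 or ATI: at most 2 */

	if (units < 2)
		return 1;                        /* gl_mtexable = false */

	return units;
}
```

A reported 0 falls below 2 and disables multitexture; 666 is clamped to 4 (or to 2 on ATI or with -maxtmu2), which is why DirectJoe ends up with multitexture on.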

Code:

// opengl specified up to 32 TMUs, D3D only allows up to 8 stages, in the case of Quake we only use 2
#define D3D_MAX_TMUS	2
Ok ... so I should set this to 2 because that is what the wrapper supports. Hmmmm ... yeah, GLQuake only uses 2 texture units, but JoeQuake uses up to 3.

Add: Aw shit ...

Code:

// opengl specified up to 32 TMUs, D3D only allows up to 8 stages
#define D3D_MAX_TMUS	8
That is modified in the JoeQuake version of the wrapper too.

Time to WinMerge the Enhanced FitzQuake wrapper from April 2010 against the December 2009 wrapper version that came with DirectJoe.

I guess I wasn't aware that some of the wrapped engines had modified wrappers customized to the engine.

Well ... now I do :D Still, solving your own problems = experience ++. :D

Post by mh »

Most likely explanation for the differences is that I was still evolving it at the time I was doing those ports. Not sure why I missed that particular item; perhaps because most Quake engines just used two TMUs and checked for either the extension or the entry points rather than the number of TMUs available (I don't think any engine I looked at checks that).

Anyway, taking the lowest of d3d_Caps.MaxTextureBlendStages and d3d_Caps.MaxSimultaneousTextures should be good enough to get you there.
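A minimal sketch of that suggestion in C, assuming a stand-in struct for the two D3DCAPS8 members (MaxTextureBlendStages and MaxSimultaneousTextures are real D3DCAPS8 fields, DWORDs in the real header; caps_t and MaxTextureUnits are hypothetical). The result is also capped at the wrapper's D3D_MAX_TMUS so the engine never asks for more stages than the wrapper tracks.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the two D3DCAPS8 fields, so the sketch compiles without d3d8.h. */
typedef struct
{
	unsigned long MaxTextureBlendStages;
	unsigned long MaxSimultaneousTextures;
} caps_t;

/* Matches the JoeQuake version of the wrapper. */
#define D3D_MAX_TMUS 8

/* Hypothetical value to report for GL_MAX_TEXTURE_UNITS_ARB: the lower of
   the two caps, further capped at D3D_MAX_TMUS. */
static int MaxTextureUnits (const caps_t *caps)
{
	unsigned long units;

	assert (caps != NULL);

	units = caps->MaxTextureBlendStages;
	if (caps->MaxSimultaneousTextures < units)
		units = caps->MaxSimultaneousTextures;

	if (units > D3D_MAX_TMUS)
		units = D3D_MAX_TMUS;

	return (int) units;
}
```

Inside the wrapper's glGetIntegerv, a `case GL_MAX_TEXTURE_UNITS_ARB:` branch could then store a value computed this way from d3d_Caps in params[0] instead of falling through to the default.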

Post by Baker »

You know, this is just me being forgetful. You were continually modifying the wrapper as each engine got wrapped. But I wasn't thinking of that.

Either way, I just want equivalent engine performance or better against any of the other wrapped engines :D

My version of the wrapper is already mildly modified for some of the things you've mentioned from time to time, like vsync and fullscreen windowed mode, and a bit of the mode change stuff has been removed from the wrapper and pulled into the main engine code.

Post by Spike »

Baker wrote: The Direct3D wrapper maybe gets 15%-20% lower FPS, but when you are getting 200 FPS on an older video card who cares whether you get 200 or 250 ... it makes little difference.
A large part of that is probably the overhead of glVertex calls, and its friends.

Regarding D3D8: any system that supports D3D8 can support the D3D9 APIs too, and 99% of the time already does. D3D9 is backwards compatible with older hardware; you'll just get error return values with newer features. There's no reason not to use D3D9 instead, except for the time required to implement it and to find a later version of the SDK.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Yeah, the use of immediate mode was really not friendly for performance here. It's unfortunate, as a brush model surface in particular is a perfect candidate for going directly through DrawPrimitiveUP (or DrawPrimitive from a vertex buffer), which would completely remove the intermediate step and most likely make up a huge chunk of that missing performance.

Doing it properly needs batching to reduce draw calls which translates to a huge overhaul of the renderer that's probably beyond the scope of that wrapper.
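The cost of immediate mode through a wrapper comes from the fact that every glVertex/glTexCoord call has to be caught and copied into a staging array before anything can be drawn. This hypothetical C sketch (Fake_Begin, Fake_TexCoord2f, Fake_Vertex3f and Fake_End are made-up names, not the wrapper's functions) shows the shape of it: a single textured quad takes 10 calls to produce one draw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical immediate-mode capture: each Vertex call copies one vertex
   into a staging array; only End hands the array over in one submit. */
#define MAX_IMM_VERTS 1024

typedef struct { float xyz[3]; float st[2]; } vert_t;

static vert_t imm_verts[MAX_IMM_VERTS];
static size_t imm_count;     /* vertices captured for this primitive */
static float  imm_st[2];     /* current texcoord, like glTexCoord2f state */
static size_t imm_calls;     /* API calls made for this primitive */
static size_t imm_submits;   /* draws issued at End time */

static void Fake_Begin (void)
{
	imm_count = 0;
	imm_calls = 1;
}

static void Fake_TexCoord2f (float s, float t)
{
	imm_st[0] = s;
	imm_st[1] = t;
	imm_calls++;
}

static void Fake_Vertex3f (float x, float y, float z)
{
	vert_t *v;

	assert (imm_count < MAX_IMM_VERTS);
	v = &imm_verts[imm_count++];
	v->xyz[0] = x; v->xyz[1] = y; v->xyz[2] = z;
	v->st[0] = imm_st[0]; v->st[1] = imm_st[1];   /* latch current texcoord */
	imm_calls++;
}

static void Fake_End (void)
{
	imm_calls++;
	if (imm_count > 0)
		imm_submits++;   /* e.g. one DrawPrimitiveUP over imm_verts */
}
```

A brush surface's vertices already sit in an array (GLQuake's glpoly_t), so handing that array straight to DrawPrimitiveUP would skip all of the per-vertex calls; the batching needed to make that pay off is the larger job described above.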

Post by Baker »

I've got something funny and unexpected coming :D I'll start another thread with it.

If I don't have it done in 4 hours or less, I'm gonna be a bit disappointed in myself.

Post by Baker »

Aw shucks! Probably about 3 more hours to complete it (in part because I expanded the project definition slightly).

And that's gonna mean Wednesday night or Thurs morning. Still, it is very, very funny. And potentially useful.