Doom3 shadow optimization

revelator
Posts: 2621
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Post by revelator »

Just for fun I tried an old build of Raynor's engine with GLSL interactions, and the bug is gone in GLSL mode, so it's definitely some fuckup with the ARB2 shaders on AMD.
Productivity is a state of mind.
revelator

Re: Doom3 shadow optimization

Post by revelator »

Nailed it.

It's a mix of ARB_map_buffer_range acting up with my driver, and me trying to fix the shadows instead, urgh.

But here's a little goodie: I fixed the broken function for getting video RAM in Doom3.

Turned out it works just fine, apart from returning negative values :lol:

Code:

/*
================
Sys_GetVideoRam
returns in megabytes

This function worked but returned negative sizes on cards with 2 GB or more.
Fixed now.
================
*/
int Sys_GetVideoRam( void ) {
#ifdef	ID_DEDICATED
	return 0;
#else
	// fallback size in megabytes if the WMI query fails
	int retSize = 64;

	CComPtr<IWbemLocator> spLoc = NULL;
	HRESULT hr = CoCreateInstance( CLSID_WbemLocator, 0, CLSCTX_SERVER, IID_IWbemLocator, ( LPVOID * ) &spLoc );
	if ( hr != S_OK || spLoc == NULL ) {
		return retSize;
	}

	CComBSTR bstrNamespace( _T( "\\\\.\\root\\CIMV2" ) );
	CComPtr<IWbemServices> spServices;

	// Connect to CIM
	hr = spLoc->ConnectServer( bstrNamespace, NULL, NULL, 0, NULL, 0, 0, &spServices );
	if ( hr != WBEM_S_NO_ERROR ) {
		return retSize;
	}

	// Switch the security level to IMPERSONATE so that provider will grant access to system-level objects.
	hr = CoSetProxyBlanket( spServices, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE, NULL, RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE );
	if ( hr != S_OK ) {
		return retSize;
	}

	// Get the vid controller
	CComPtr<IEnumWbemClassObject> spEnumInst = NULL;
	hr = spServices->CreateInstanceEnum( CComBSTR( "Win32_VideoController" ), WBEM_FLAG_SHALLOW, NULL, &spEnumInst );
	if ( hr != WBEM_S_NO_ERROR || spEnumInst == NULL ) {
		return retSize;
	}

	ULONG uNumOfInstances = 0;
	CComPtr<IWbemClassObject> spInstance = NULL;
	hr = spEnumInst->Next( 10000, 1, &spInstance, &uNumOfInstances );

	if ( hr == S_OK && spInstance ) {
		// Get properties from the object
		CComVariant varSize;
		hr = spInstance->Get( CComBSTR( _T( "AdapterRAM" ) ), 0, &varSize, 0, 0 );
		if ( hr == S_OK && ( varSize.vt == VT_I4 || varSize.vt == VT_UI4 ) ) {
			// AdapterRAM is an unsigned 32-bit byte count, but WMI hands it back in a
			// signed 32-bit VARIANT, so 2 GB and up comes back negative. Reinterpret
			// the bits as unsigned: abs() would turn 3 GB into 1 GB and is undefined
			// for exactly 2 GB. The field itself still tops out just under 4 GB.
			const unsigned int adapterRam = static_cast<unsigned int>( varSize.lVal );
			retSize = static_cast<int>( adapterRam / ( 1024 * 1024 ) );
			if ( retSize == 0 ) {
				retSize = 64;
			}
		}
	}
	return retSize;
#endif
}
This will return the right amount :)
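A quick standalone check of why reinterpreting the bits beats abs(): a 3 GB card reports 0xC0000000 bytes, which lands in a signed 32-bit field as -1073741824, and abs() of that is exactly 1 GB. The helper names below are hypothetical, and the sketch assumes AdapterRAM arrives as a signed 32-bit value (WMI returns uint32 properties as VT_I4).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not Doom3 code): turn the AdapterRAM value, as it arrives
// in a signed 32-bit VARIANT field, into megabytes by treating the bits as the
// unsigned 32-bit byte count they really are.
int AdapterRamToMegabytes( int32_t reported ) {
	const uint32_t bytes = static_cast<uint32_t>( reported );
	return static_cast<int>( bytes / ( 1024u * 1024u ) );
}

// The abs() approach, kept only for comparison. abs(INT32_MIN) (exactly 2 GB)
// is undefined behaviour, so this one must never be fed that value.
int AbsToMegabytes( int32_t reported ) {
	return std::abs( reported ) / ( 1024 * 1024 );
}
```

For a 3 GB card the first helper gives 3072 MB while the abs() version gives 1024 MB, so the abs() fix only looks right on cards that happen to stay under 2 GB.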
revelator

Re: Doom3 shadow optimization

Post by revelator »

Huh, has anyone tried the old R200 renderer with Doom3 recently? Asking because I just did, and it runs amazingly well on my R9 270X :shock:
nbohr1more
Posts: 54
Joined: Fri Dec 09, 2011 7:04 am

Re: Doom3 shadow optimization

Post by nbohr1more »

Interesting result. Though I suspect it's simply the lack of shaders there?

You'd think the lack of r_useShadowVertexProgram in that path would make it slower.
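For context on that point: a shadow volume needs each vertex duplicated, with one copy pushed away from the light. With a shadow vertex program the GPU does the pushing; without one, the CPU has to produce the projected vertices itself. A generic sketch of that CPU work, using the homogeneous w = 0 trick for points at infinity, might look like this. Vec4 and ExtrudeToInfinity are hypothetical names, not Doom3's code, and a real implementation also has to build the caps and silhouette quads.

```cpp
#include <cassert>
#include <vector>

struct Vec4 {
	float x, y, z, w;
};

// Hypothetical CPU-side extrusion for an infinite shadow volume from a point
// light. Each input vertex (w = 1) yields a near copy plus a copy pushed away
// from the light to infinity, written as the direction (p - light) with w = 0.
// A shadow vertex program does the equivalent per vertex on the GPU instead.
std::vector<Vec4> ExtrudeToInfinity( const std::vector<Vec4> &verts, const Vec4 &light ) {
	std::vector<Vec4> out;
	out.reserve( verts.size() * 2 );
	for ( const Vec4 &p : verts ) {
		out.push_back( p );  // near cap vertex, unchanged
		out.push_back( Vec4{ p.x - light.x, p.y - light.y, p.z - light.z, 0.0f } );  // point at infinity
	}
	return out;
}
```

The per-vertex cost is tiny, but it has to be redone whenever the light or the entity moves, which is why losing the vertex program path would normally be expected to cost CPU time.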
revelator

Re: Doom3 shadow optimization

Post by revelator »

One would have thought so, aye. AMD even supports some Nvidia-specific render calls; I found that out when I noticed Barnes' VBO memory code used the Nvidia API for getting video RAM :lol:
revelator

Re: Doom3 shadow optimization

Post by revelator »

Final version of MH's VBO code.

This is for the GLEW version. If you still use the old qgl calls, you need to put a q in front of the gl calls, e.g. glEnable becomes qglEnable.

Code:

/*
===========================================================================

Doom 3 GPL Source Code
Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.

This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").

Doom 3 Source Code is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Doom 3 Source Code is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Doom 3 Source Code.  If not, see <http://www.gnu.org/licenses/>.

In addition, the Doom 3 Source Code is also subject to certain additional terms. You should have received a copy of these additional terms immediately following the terms and conditions of the GNU General Public License which accompanied the Doom 3 Source Code.  If not, please request a copy in writing from id Software at the address below.

If you have questions concerning this license or the applicable additional terms, you may contact in writing id Software LLC, c/o ZeniMax Media Inc., Suite 120, Rockville, Maryland 20850 USA.

===========================================================================
*/

#include "precompiled.h"
#include "tr_local.h"

static const int  FRAME_MEMORY_BYTES = 0x400000;
static const int  EXPAND_HEADERS = 32;

// turned r_useArbBufferRange off by default, does nasty things to AMD cards.
idCVar idVertexCache::r_showVertexCache( "r_showVertexCache", "0", CVAR_INTEGER | CVAR_RENDERER, "show vertex cache" );
idCVar idVertexCache::r_useArbBufferRange( "r_useArbBufferRange", "0", CVAR_BOOL | CVAR_RENDERER, "use ARB_map_buffer_range for optimization" );
idCVar idVertexCache::r_reuseVertexCacheSooner( "r_reuseVertexCacheSooner", "1", CVAR_BOOL | CVAR_RENDERER, "reuse vertex buffers as soon as possible after freeing" );

idVertexCache     vertexCache;

/*
==============
R_ShowVBOMem_f
==============
*/
void R_ShowVBOMem_f( const idCmdArgs &args ) {
	vertexCache.Show();
}

/*
==============
R_ListVBOMem_f
==============
*/
void R_ListVBOMem_f( const idCmdArgs &args ) {
	vertexCache.List();
}

/*
==============
idVertexCache::ActuallyFree
==============
*/
void idVertexCache::ActuallyFree( vertCache_t *block ) {
	if( !block ) {
		common->Error( "idVertexCache Free: NULL pointer" );
	}
	
	if( block->user ) {
		// let the owner know we have purged it
		*block->user = NULL;
		block->user = NULL;
	}
	
	// temp blocks are in a shared space that won't be freed
	if( block->tag != TAG_TEMP ) {
		staticAllocTotal -= block->size;
		staticCountTotal--;
		if( virtualMemory ) {
			delete [] block->virtMem;
			block->virtMem = NULL;
		}
	}
	block->tag = TAG_FREE;     // mark as free
	
	// unlink stick it back on the free list
	block->next->prev = block->prev;
	block->prev->next = block->next;
	
	if( r_reuseVertexCacheSooner.GetBool() ) {
		// stick it on the front of the free list so it will be reused immediately
		block->next = freeStaticHeaders.next;
		block->prev = &freeStaticHeaders;
	} else {
		// stick it on the back of the free list so it won't be reused soon (just for debugging)
		block->next = &freeStaticHeaders;
		block->prev = freeStaticHeaders.prev;
	}
	block->next->prev = block;
	block->prev->next = block;
}

/*
==============
idVertexCache::Position

this will be a real pointer with virtual memory,
but it will be an int offset cast to a pointer with
ARB_vertex_buffer_object

The ARB_vertex_buffer_object will be bound
==============
*/
void *idVertexCache::Position( vertCache_t *buffer ) {
	if( !buffer || buffer->tag == TAG_FREE ) {
		common->FatalError( "idVertexCache::Position: bad vertCache_t" );
	}
	
	// the ARB vertex object just uses an offset
	if( buffer->vbo ) {
		if( r_showVertexCache.GetInteger() == 2 ) {
			if( buffer->tag == TAG_TEMP ) {
				common->Printf( "GL_ARRAY_BUFFER_ARB = %i + %i (%i bytes)\n", buffer->vbo, buffer->offset, buffer->size );
			} else {
				common->Printf( "GL_ARRAY_BUFFER_ARB = %i (%i bytes)\n", buffer->vbo, buffer->size );
			}
		}
		BindIndex( ( buffer->indexBuffer ? GL_ELEMENT_ARRAY_BUFFER : GL_ARRAY_BUFFER ), buffer->vbo );
		return ( void * )buffer->offset;
	}
	
	// virtual memory is a real pointer
	return ( void * )( ( byte * )buffer->virtMem + buffer->offset );
}

//================================================================================

// don't make these static or the engine will crash.
GLuint vertexBuffer = 0;
GLuint indexBuffer = 0;

/*
===========
idVertexCache::BindIndex

Binds vbo to target, skipping the GL call if it is already bound.
===========
*/
void idVertexCache::BindIndex( GLenum target, GLuint vbo ) {
	switch( target ) {
	case GL_ARRAY_BUFFER:
		if( vertexBuffer != vbo ) {
			// this happens more often than you might think :(
			glBindBufferARB( target, vbo );
			vertexBuffer = vbo;
			return;
		}
		break;
		
	case GL_ELEMENT_ARRAY_BUFFER:
		if( indexBuffer != vbo ) {
			// this happens more often than you might think :(
			glBindBufferARB( target, vbo );
			indexBuffer = vbo;
			return;
		}
		break;
		
	default:
		common->FatalError( "BindIndex : unknown buffer target : %i\n", static_cast<int>( target ) );
		break;
	}
}

/*
===========
idVertexCache::UnbindIndex

Unbinds target, skipping the GL call if nothing is bound to it.
===========
*/
void idVertexCache::UnbindIndex( GLenum target ) {
	switch( target ) {
	case GL_ARRAY_BUFFER:
		if( vertexBuffer != 0 )	{
			// this happens more often than you might think :(
			glBindBufferARB( target, 0 );
			vertexBuffer = 0;
			return;
		}
		break;
		
	case GL_ELEMENT_ARRAY_BUFFER:
		if( indexBuffer != 0 ) {
			// this happens more often than you might think :(
			glBindBufferARB( target, 0 );
			indexBuffer = 0;
			return;
		}
		break;
		
	default:
		common->FatalError( "UnbindIndex : unknown buffer target : %i\n", static_cast<int>( target ) );
		break;
	}
}

//================================================================================

/*
===========
idVertexCache::Init
===========
*/
void idVertexCache::Init() {
	cmdSystem->AddCommand( "showVBOMem", R_ShowVBOMem_f, CMD_FL_RENDERER, "Shows Allocated Vertex Buffer Memory" );
	cmdSystem->AddCommand( "ListVBOMem", R_ListVBOMem_f, CMD_FL_RENDERER, "lists Objects Allocated in Vertex Cache" );
	
	// use ARB_vertex_buffer_object when the driver supports it
	if( glConfig.ARBVertexBufferObjectAvailable ) {
		virtualMemory = false;
		r_useIndexBuffers.SetBool( true );
		common->Printf( "using ARB_vertex_buffer_object memory\n" );
	} else {
		virtualMemory = true;
		r_useIndexBuffers.SetBool( false );
		common->Printf( "WARNING: vertex array range in virtual memory (SLOW)\n" );
	}
	
	// initialize the cache memory blocks
	freeStaticHeaders.next = freeStaticHeaders.prev = &freeStaticHeaders;
	staticHeaders.next = staticHeaders.prev = &staticHeaders;
	freeDynamicHeaders.next = freeDynamicHeaders.prev = &freeDynamicHeaders;
	dynamicHeaders.next = dynamicHeaders.prev = &dynamicHeaders;
	deferredFreeList.next = deferredFreeList.prev = &deferredFreeList;
	
	// set up the dynamic frame memory
	frameBytes = FRAME_MEMORY_BYTES;
	staticAllocTotal = 0;
	
	// allocate a dummy buffer
	byte *frameBuffer = new byte[frameBytes];	
	for( int i = 0 ; i < NUM_VERTEX_FRAMES ; i++ ) {
		// force the alloc to use GL_STREAM_DRAW_ARB
		allocatingTempBuffer = true;
		Alloc( frameBuffer, frameBytes, &tempBuffers[i] );
		allocatingTempBuffer = false;
		tempBuffers[i]->tag = TAG_FIXED;
		
		// unlink these from the static list, so they won't ever get purged
		tempBuffers[i]->next->prev = tempBuffers[i]->prev;
		tempBuffers[i]->prev->next = tempBuffers[i]->next;
	}
	
	// Alloc() made its own copy, so the dummy buffer can go
	delete [] frameBuffer;
	frameBuffer = NULL;
	
	EndFrame();
}

/*
===========
idVertexCache::PurgeAll

Used when toggling vertex programs on or off, because
the cached data isn't valid
===========
*/
void idVertexCache::PurgeAll() {
	while( staticHeaders.next != &staticHeaders ) {
		ActuallyFree( staticHeaders.next );
	}
}

/*
===========
idVertexCache::Shutdown
===========
*/
void idVertexCache::Shutdown() {
	headerAllocator.Shutdown();
}

/*
===========
idVertexCache::Alloc
===========
*/
void idVertexCache::Alloc( void *data, int size, vertCache_t **buffer, bool doIndex ) {
	vertCache_t *block = NULL;
	
	if( size <= 0 )	{
		common->Error( "idVertexCache::Alloc: size = %i\n", size );
	}
	
	// if we can't find anything, it will be NULL
	*buffer = NULL;
	
	// if we don't have any remaining unused headers, allocate some more
	if( freeStaticHeaders.next == &freeStaticHeaders ) {
		for( int i = 0; i < EXPAND_HEADERS; i++ ) {
			block = headerAllocator.Alloc();
			
			if( !virtualMemory ) {
				glGenBuffers( 1, &block->vbo );
				block->size = 0;
			}
			block->next = freeStaticHeaders.next;
			block->prev = &freeStaticHeaders;
			block->next->prev = block;
			block->prev->next = block;
		}
	}
	GLenum target = ( doIndex ? GL_ELEMENT_ARRAY_BUFFER : GL_ARRAY_BUFFER );
	GLenum usage = ( allocatingTempBuffer ? GL_STREAM_DRAW : GL_STATIC_DRAW );
	
	// try to find a matching block to replace so that we're not continually respecifying vbo data each frame
	for( vertCache_t *findblock = freeStaticHeaders.next; /**/; findblock = findblock->next ) {
		if( findblock == &freeStaticHeaders ) {
			block = freeStaticHeaders.next;
			break;
		}
		
		if( findblock->target != target ) {
			continue;
		}
		
		if( findblock->usage != usage )	{
			continue;
		}
		
		if( findblock->size != size ) {
			continue;
		}
		block = findblock;
		break;
	}
	
	// move it from the freeStaticHeaders list to the staticHeaders list
	block->target = target;
	block->usage = usage;
	
	if( block->vbo ) {
		// orphan the buffer in case it needs respecifying (it usually will)
		BindIndex( target, block->vbo );
		glBufferDataARB( target, static_cast<GLsizeiptr>( size ), NULL, usage );
		glBufferDataARB( target, static_cast<GLsizeiptr>( size ), data, usage );
	} else {
		// use C++ allocation
		block->virtMem = new byte[size];
		SIMDProcessor->Memcpy( block->virtMem, data, size );
	}
	block->next->prev = block->prev;
	block->prev->next = block->next;
	block->next = staticHeaders.next;
	block->prev = &staticHeaders;
	block->next->prev = block;
	block->prev->next = block;
	block->size = size;
	block->offset = 0;
	block->tag = TAG_USED;
	
	// save data for debugging
	staticAllocThisFrame += block->size;
	staticCountThisFrame++;
	staticCountTotal++;
	staticAllocTotal += block->size;
	
	// this will be set to zero when it is purged
	block->user = buffer;
	*buffer = block;
	
	// allocation doesn't imply used-for-drawing, because at level
	// load time lots of things may be created, but they aren't
	// referenced by the GPU yet, and can be purged if needed.
	block->frameUsed = currentFrame - NUM_VERTEX_FRAMES;
	block->indexBuffer = doIndex;
}

/*
===========
idVertexCache::Touch
===========
*/
void idVertexCache::Touch( vertCache_t *block ) {
	if( !block ) {
		common->Error( "idVertexCache Touch: NULL pointer" );
	}
	
	if( block->tag == TAG_FREE ) {
		common->FatalError( "idVertexCache Touch: freed pointer" );
	}
	
	if( block->tag == TAG_TEMP ) {
		common->FatalError( "idVertexCache Touch: temporary pointer" );
	}
	block->frameUsed = currentFrame;
	
	// move to the head of the LRU list
	block->next->prev = block->prev;
	block->prev->next = block->next;
	block->next = staticHeaders.next;
	block->prev = &staticHeaders;
	staticHeaders.next->prev = block;
	staticHeaders.next = block;
}

/*
===========
idVertexCache::Free
===========
*/
void idVertexCache::Free( vertCache_t *block ) {
	if( !block ) {
		return;
	}
	
	if( block->tag == TAG_FREE ) {
		common->FatalError( "idVertexCache Free: freed pointer" );
	}
	
	if( block->tag == TAG_TEMP ) {
		common->FatalError( "idVertexCache Free: temporary pointer" );
	}
	
	// this block still can't be purged until the frame count has expired,
	// but it won't need to clear a user pointer when it is
	block->user = NULL;
	block->next->prev = block->prev;
	block->prev->next = block->next;
	block->next = deferredFreeList.next;
	block->prev = &deferredFreeList;
	deferredFreeList.next->prev = block;
	deferredFreeList.next = block;
}

/*
===========
idVertexCache::MapBufferRange

MH's version. Fast on Nvidia, but fails on AMD.
===========
*/
vertCache_t *idVertexCache::MapBufferRange( vertCache_t *buffer, void *data, int size )
{
	// offset 0 means the temp buffer wrapped, so orphan the whole thing;
	// otherwise write an unsynchronized sub-range and flush it explicitly
	const bool	explicitFlush = ( buffer->offset != 0 );
	GLbitfield	access = ( GL_MAP_WRITE_BIT | ( explicitFlush ? GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT : GL_MAP_INVALIDATE_BUFFER_BIT ) );
	GLvoid      *ptr = glMapBufferRange( GL_ARRAY_BUFFER, static_cast<GLintptr>( buffer->offset ), static_cast<GLsizeiptr>( size ), access );

	if ( ptr ) {
		SIMDProcessor->Memcpy( static_cast<byte *>( ptr ), data, size );
		// glFlushMappedBufferRange is only legal on a mapping made with GL_MAP_FLUSH_EXPLICIT_BIT,
		// and its offset is relative to the start of the mapped range, not the start of the buffer.
		if ( explicitFlush ) {
			glFlushMappedBufferRange( GL_ARRAY_BUFFER, 0, static_cast<GLsizeiptr>( size ) );
		}
		glUnmapBufferARB( GL_ARRAY_BUFFER );
	} else {
		// mapping failed, fall back to a plain upload
		glBufferSubDataARB( GL_ARRAY_BUFFER, static_cast<GLintptrARB>( buffer->offset ), static_cast<GLsizeiptr>( size ), data );
	}
	return buffer;
}

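/*
===========
idVertexCache::MapBuffer

Used when ARB_map_buffer_range is unavailable or r_useArbBufferRange is off:
maps the whole buffer the old way.
===========
*/
vertCache_t *idVertexCache::MapBuffer( vertCache_t *buffer, void *data, int size )
{
	if ( buffer->offset == 0 ) {
		// first write into this temp buffer this frame: orphan it so the map doesn't wait on the GPU
		glBufferDataARB( GL_ARRAY_BUFFER, static_cast<GLsizeiptr>( frameBytes ), NULL, GL_STREAM_DRAW_ARB );
	}

	// glMapBufferARB takes an access enum, not the GL_MAP_*_BIT flags of glMapBufferRange,
	// and always maps the whole buffer, so the copy has to land at the block's offset.
	// Later maps in the same frame are synchronized and may stall until the GPU catches up.
	GLvoid  *ptr = glMapBufferARB( GL_ARRAY_BUFFER, GL_WRITE_ONLY_ARB );

	if ( ptr ) {
		SIMDProcessor->Memcpy( static_cast<byte *>( ptr ) + buffer->offset, data, size );
		glUnmapBufferARB( GL_ARRAY_BUFFER );
	} else {
		// mapping failed, fall back to a plain upload
		glBufferSubDataARB( GL_ARRAY_BUFFER, static_cast<GLintptrARB>( buffer->offset ), static_cast<GLsizeiptr>( size ), data );
	}
	return buffer;
}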

/*
===========
idVertexCache::AllocFrameTemp

A frame temp allocation must never be allowed to fail due to overflow.
We can't simply sync with the GPU and overwrite what we have, because
there may still be future references to dynamically created surfaces.
===========
*/
vertCache_t *idVertexCache::AllocFrameTemp( void *data, int size ) {
	vertCache_t *block;
	
	if( size <= 0 ) {
		common->Error( "idVertexCache::AllocFrameTemp: size = %i\n", size );
	}
	
	if( dynamicAllocThisFrame + size > frameBytes )	{
		// if we don't have enough room in the temp block, allocate a static block,
		// but immediately free it so it will get freed at the next frame
		tempOverflow = true;
		Alloc( data, size, &block );
		Free( block );
		return block;
	}
	
	// this data is just going on the shared dynamic list
	// if we don't have any remaining unused headers, allocate some more
	if( freeDynamicHeaders.next == &freeDynamicHeaders ) {
		for( int i = 0; i < EXPAND_HEADERS; i++ ) {
			block = headerAllocator.Alloc();
			block->next = freeDynamicHeaders.next;
			block->prev = &freeDynamicHeaders;
			block->next->prev = block;
			block->prev->next = block;
		}
	}
	
	// move it from the freeDynamicHeaders list to the dynamicHeaders list
	block = freeDynamicHeaders.next;
	block->next->prev = block->prev;
	block->prev->next = block->next;
	block->next = dynamicHeaders.next;
	block->prev = &dynamicHeaders;
	block->next->prev = block;
	block->prev->next = block;
	block->size = size;
	block->tag = TAG_TEMP;
	block->indexBuffer = false;
	block->offset = dynamicAllocThisFrame;
	dynamicAllocThisFrame += block->size;
	dynamicCountThisFrame++;
	block->user = NULL;
	block->frameUsed = 0;
	
	// copy the data
	block->virtMem = tempBuffers[listNum]->virtMem;
	block->vbo = tempBuffers[listNum]->vbo;
	
	// mh code start
	if( block->vbo ) {
		BindIndex( GL_ARRAY_BUFFER, block->vbo );		
		// try to get an unsynchronized map if at all possible
		if( glConfig.ARBMapBufferRangeAvailable && r_useArbBufferRange.GetBool() ) {
			// if the buffer has wrapped then we orphan it
			return MapBufferRange( block, data, size );
		} else {
			// if the buffer has wrapped then we orphan it
			return MapBuffer( block, data, size );
		}
	} else if( block->virtMem ) {
		SIMDProcessor->Memcpy( static_cast<byte *>( block->virtMem ) + block->offset, data, size );
	}
	return block;
}

/*
===========
idVertexCache::EndFrame
===========
*/
void idVertexCache::EndFrame() {
	// display debug information
	if( r_showVertexCache.GetBool() ) {
		int staticUseCount = 0;
		int staticUseSize = 0;
		
		for( vertCache_t *block = staticHeaders.next ; block != &staticHeaders ; block = block->next ) {
			if( block->frameUsed == currentFrame ) {
				staticUseCount++;
				staticUseSize += block->size;
			}
		}
		const char *frameOverflow = tempOverflow ? "(OVERFLOW)" : "";
		common->Printf( "vertex dynamic:%i=%ik%s, static alloc:%i=%ik used:%i=%ik total:%i=%ik\n",
						dynamicCountThisFrame, dynamicAllocThisFrame / 1024, frameOverflow,
						staticCountThisFrame, staticAllocThisFrame / 1024,
						staticUseCount, staticUseSize / 1024,
						staticCountTotal, staticAllocTotal / 1024 );
	}
	
	// unbind vertex buffers so normal virtual memory will be used
	if( !virtualMemory ) {
		UnbindIndex( GL_ARRAY_BUFFER_ARB );
		UnbindIndex( GL_ELEMENT_ARRAY_BUFFER_ARB );
	}
	currentFrame = tr.frameCount;
	listNum = currentFrame % NUM_VERTEX_FRAMES;
	staticAllocThisFrame = 0;
	staticCountThisFrame = 0;
	dynamicAllocThisFrame = 0;
	dynamicCountThisFrame = 0;
	tempOverflow = false;
	
	// free all the deferred free headers
	while( deferredFreeList.next != &deferredFreeList ) {
		ActuallyFree( deferredFreeList.next );
	}
	
	// free all the frame temp headers
	vertCache_t *block = dynamicHeaders.next;
	
	if( block != &dynamicHeaders ) {
		block->prev = &freeDynamicHeaders;
		dynamicHeaders.prev->next = freeDynamicHeaders.next;
		freeDynamicHeaders.next->prev = dynamicHeaders.prev;
		freeDynamicHeaders.next = block;
		dynamicHeaders.next = dynamicHeaders.prev = &dynamicHeaders;
	}
}

/*
=============
idVertexCache::List
=============
*/
void idVertexCache::List( void ) {
	int			numActive = 0;
	int			frameStatic = 0;
	int			totalStatic = 0;
	vertCache_t *block;
	
	for( block = staticHeaders.next; block != &staticHeaders; block = block->next )	{
		numActive++;
		totalStatic += block->size;
		
		if( block->frameUsed == currentFrame ) {
			frameStatic += block->size;
		}
	}
	int   numFreeStaticHeaders = 0;
	
	for( block = freeStaticHeaders.next; block != &freeStaticHeaders; block = block->next )	{
		numFreeStaticHeaders++;
	}
	int   numFreeDynamicHeaders = 0;
	
	for( block = freeDynamicHeaders.next; block != &freeDynamicHeaders; block = block->next ) {
		numFreeDynamicHeaders++;
	}
	common->Printf( "%i dynamic temp buffers of %ik\n", NUM_VERTEX_FRAMES, frameBytes / 1024 );
	common->Printf( "%5i active static headers\n", numActive );
	common->Printf( "%5i free static headers\n", numFreeStaticHeaders );
	common->Printf( "%5i free dynamic headers\n", numFreeDynamicHeaders );
	
	if( !virtualMemory ) {
		common->Printf( "Vertex cache is in ARB_vertex_buffer_object memory (FAST).\n" );
	} else {
		common->Printf( "Vertex cache is in virtual memory (SLOW)\n" );
	}
	common->Printf( "Index buffers are accelerated.\n" );
}

/*
=============
idVertexCache::Show

Barnes,
replaces the broken glconfig string version.
Revelator cannot use glew's function pointers.
=============
*/
void idVertexCache::Show( void ) {
	GLint  mem[4];
	
	if( glewIsSupported( "GL_NVX_gpu_memory_info" ) ) {
		common->Printf( "\nNvidia specific memory info:\n" );
		common->Printf( "\n" );
		glGetIntegerv( GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX , mem );
		common->Printf( "dedicated video memory %i MB\n", mem[0] >> 10 );
		glGetIntegerv( GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX , mem );
		common->Printf( "total available memory %i MB\n", mem[0] >> 10 );
		glGetIntegerv( GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX , mem );
		common->Printf( "currently unused GPU memory %i MB\n", mem[0] >> 10 );
		glGetIntegerv( GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX , mem );
		common->Printf( "count of total evictions seen by system %i\n", mem[0] );
		glGetIntegerv( GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX , mem );
		common->Printf( "total video memory evicted %i MB\n", mem[0] >> 10 );
	} else if( glewIsSupported( "GL_ATI_meminfo" ) ) {
		common->Printf( "\nATI/AMD specific memory info:\n" );
		common->Printf( "\n" );
		glGetIntegerv( GL_VBO_FREE_MEMORY_ATI, mem );
		common->Printf( "VBO: total memory free in the pool %i MB\n", mem[0] >> 10 );
		common->Printf( "VBO: largest available free block in the pool %i MB\n", mem[1] >> 10 );
		common->Printf( "VBO: total auxiliary memory free %i MB\n", mem[2] >> 10 );
		common->Printf( "VBO: largest auxiliary free block %i MB\n", mem[3] >> 10 );
		glGetIntegerv( GL_TEXTURE_FREE_MEMORY_ATI, mem );
		common->Printf( "Texture: total memory free in the pool %i MB\n", mem[0] >> 10 );
		common->Printf( "Texture: largest available free block in the pool %i MB\n", mem[1] >> 10 );
		common->Printf( "Texture: total auxiliary memory free %i MB\n", mem[2] >> 10 );
		common->Printf( "Texture: largest auxiliary free block %i MB\n", mem[3] >> 10 );
		glGetIntegerv( GL_RENDERBUFFER_FREE_MEMORY_ATI, mem );
		common->Printf( "RenderBuffer: total memory free in the pool %i MB\n", mem[0] >> 10 );
		common->Printf( "RenderBuffer: largest available free block in the pool %i MB\n", mem[1] >> 10 );
		common->Printf( "RenderBuffer: total auxiliary memory free %i MB\n", mem[2] >> 10 );
		common->Printf( "RenderBuffer: largest auxiliary free block %i MB\n", mem[3] >> 10 );
	} else {
		common->Printf( "MemInfo not available for your video card or driver!\n" );
	}
}

/*
=============
idVertexCache::IsFast

just for gfxinfo printing
=============
*/
bool idVertexCache::IsFast() {
	if( virtualMemory )	{
		return false;
	}
	return true;
}
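Nearly every function in that file does the same pointer surgery on intrusive, sentinel-headed doubly linked lists (freeStaticHeaders, staticHeaders, deferredFreeList and friends). Pulled out into named helpers, the moves look roughly like this. Node and the List* functions are hypothetical stand-ins for vertCache_t's next/prev links, not part of the engine.

```cpp
#include <cassert>

// Hypothetical node standing in for vertCache_t's next/prev links.
struct Node {
	Node *next;
	Node *prev;
	int   id;
};

// An empty list is a sentinel pointing at itself, as set up in idVertexCache::Init.
void ListInit( Node &head ) {
	head.next = head.prev = &head;
}

// Unlink a node from whatever list it is currently on.
void ListUnlink( Node *n ) {
	n->next->prev = n->prev;
	n->prev->next = n->next;
}

// Insert right after the sentinel: reused first / most recently used.
void ListPushFront( Node &head, Node *n ) {
	n->next = head.next;
	n->prev = &head;
	n->next->prev = n;
	n->prev->next = n;
}

// Insert right before the sentinel: reused last (the r_reuseVertexCacheSooner 0 case).
void ListPushBack( Node &head, Node *n ) {
	n->next = &head;
	n->prev = head.prev;
	n->next->prev = n;
	n->prev->next = n;
}

// Move a node between lists, e.g. from the free list to the used list in Alloc().
void ListMoveToFront( Node &head, Node *n ) {
	ListUnlink( n );
	ListPushFront( head, n );
}
```

Because the sentinel is always present, none of these operations need a NULL check, which is why the engine code can unlink and relink blocks with four plain pointer assignments.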

Code:

/*
===========================================================================

Doom 3 GPL Source Code
Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.

This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").

Doom 3 Source Code is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Doom 3 Source Code is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Doom 3 Source Code.  If not, see <http://www.gnu.org/licenses/>.

In addition, the Doom 3 Source Code is also subject to certain additional terms. You should have received a copy of these additional terms immediately following the terms and conditions of the GNU General Public License which accompanied the Doom 3 Source Code.  If not, please request a copy in writing from id Software at the address below.

If you have questions concerning this license or the applicable additional terms, you may contact in writing id Software LLC, c/o ZeniMax Media Inc., Suite 120, Rockville, Maryland 20850 USA.

===========================================================================
*/

// vertex cache calls should only be made by the front end
const int NUM_VERTEX_FRAMES = 2;

typedef enum
{
	TAG_FREE,
	TAG_USED,
	TAG_FIXED,    // for the temp buffers
	TAG_TEMP      // in frame temp area, not static area
} vertBlockTag_t;

typedef struct vertCache_s
{
	GLuint				vbo;
	GLenum				target;
	GLenum				usage;
	void				*virtMem;         // only one of vbo / virtMem will be set
	bool				indexBuffer;      // holds indexes instead of vertexes
	
	int					offset;
	int					size;          // may be larger than the amount asked for, due to round up and minimum fragment sizes
	int					tag;           // a tag of 0 is a free block
	struct vertCache_s  **user;           // will be set to zero when purged
	struct vertCache_s  *next, *prev;     // may be on the static list or one of the frame lists
	int					frameUsed;        // it can't be purged if near the current frame
} vertCache_t;

class idVertexCache
{
public:
	void			Init();
	void			Shutdown();
	
	// just for gfxinfo printing
	bool			IsFast();
	
	// called when vertex programs are enabled or disabled, because
	// the cached data is no longer valid
	void			PurgeAll();
	
	// Tries to allocate space for the given data in fast vertex
	// memory, and copies it over.
	// Alloc does NOT do a touch, which allows purging of things
	// created at level load time even if a frame hasn't passed yet.
	// These allocations can be purged, which will zero the pointer.
	void			Alloc( void *data, int bytes, vertCache_t **buffer, bool indexBuffer = false );
	
	// This will be a real pointer with virtual memory,
	// but it will be an int offset cast to a pointer with ARB_vertex_buffer_object
	void			*Position( vertCache_t *buffer );
	
	// binds a vertex or index buffer, skipping redundant binds
	void			BindIndex( GLenum target, GLuint vbo );
	
	// if you need to draw something without an indexCache,
	// this must be called to reset GL_ELEMENT_ARRAY_BUFFER_ARB
	void			UnbindIndex( GLenum target );
	
	// MH's MapBufferRange.
	vertCache_t *MapBufferRange( vertCache_t *buffer, void *data, int size );

	// Revelator's MapBuffer for cards that don't cope too well with the above.
	vertCache_t *MapBuffer( vertCache_t *buffer, void *data, int size );

	// automatically freed at the end of the next frame
	// used for specular texture coordinates and gui drawing, which
	// will change every frame.
	// never returns NULL: on overflow it falls back to a static block that is freed next frame
	// As with Position(), this may not actually be a pointer you can access.
	vertCache_t    *AllocFrameTemp( void *data, int bytes );
	
	// notes that a buffer is used this frame, so it can't be purged
	// out from under the GPU
	void			Touch( vertCache_t *buffer );
	
	// this block won't have to zero a buffer pointer when it is purged,
	// but it must still wait for the frames to pass, in case the GPU
	// is still referencing it
	void			Free( vertCache_t *buffer );
	
	// updates the counter for determining which temp space to use
	// and which blocks can be purged
	// Also prints debugging info when enabled
	void			EndFrame();
	
	// listVBOMem calls this
	void			List();
	
	// showVBOMem calls this
	void			Show();
	
private:
	void			InitMemoryBlocks( int size );
	void			ActuallyFree( vertCache_t *block );
	
	static idCVar   r_showVertexCache;
	static idCVar   r_useArbBufferRange;
	static idCVar   r_reuseVertexCacheSooner;
	
	int				staticCountTotal;
	int				staticAllocTotal;		// for end of frame purging
	
	int				staticAllocThisFrame;   // debug counter
	int				staticCountThisFrame;
	int				dynamicAllocThisFrame;
	int				dynamicCountThisFrame;
	
	int				currentFrame;			// for purgable block tracking
	int				listNum;				// currentFrame % NUM_VERTEX_FRAMES, determines which tempBuffers to use
	
	bool			virtualMemory;			// not fast stuff
	
	bool			allocatingTempBuffer;   // force GL_STREAM_DRAW_ARB
	
	vertCache_t     *tempBuffers[NUM_VERTEX_FRAMES];    // allocated at startup
	bool			tempOverflow;						// had to alloc a temp in static memory
	
	idBlockAlloc<vertCache_t, 1024> headerAllocator;
	
	vertCache_t     freeStaticHeaders;         // head of doubly linked list
	vertCache_t     freeDynamicHeaders;     // head of doubly linked list
	vertCache_t     dynamicHeaders;           // head of doubly linked list
	vertCache_t	deferredFreeList;	   // head of doubly linked list
	vertCache_t     staticHeaders;		   // head of doubly linked list in MRU order, staticHeaders.next is most recently used
	int			frameBytes;		  // for each of NUM_VERTEX_FRAMES frames
};

extern   idVertexCache  vertexCache;
As you will probably notice, we always map the VBO memory now, even if MapBufferRange is turned off (it just falls back to the old glMapBuffer path).
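Both map paths serve the same per-frame temp allocator. Stripped of the GL calls, its bookkeeping is a bump pointer into a fixed-size buffer, plus a choice of map flags depending on whether the write starts the buffer or lands further in. A minimal model, assuming the flag values from the ARB_map_buffer_range spec (redefined locally so it builds without GL headers); FrameTempModel is a hypothetical name, and the real engine alternates between NUM_VERTEX_FRAMES buffers rather than reusing one.

```cpp
#include <cassert>
#include <cstdint>

// GL_MAP_* bit values from the ARB_map_buffer_range spec, redefined here.
const uint32_t MAP_WRITE_BIT             = 0x0002;
const uint32_t MAP_INVALIDATE_RANGE_BIT  = 0x0004;
const uint32_t MAP_INVALIDATE_BUFFER_BIT = 0x0008;
const uint32_t MAP_FLUSH_EXPLICIT_BIT    = 0x0010;
const uint32_t MAP_UNSYNCHRONIZED_BIT    = 0x0020;

// Hypothetical model of AllocFrameTemp's bookkeeping (no GL calls).
class FrameTempModel {
public:
	explicit FrameTempModel( int bytes ) : frameBytes( bytes ), used( 0 ) {}

	// Returns false on overflow; the engine then falls back to a static
	// Alloc() that is immediately Free()d.
	bool Alloc( int size, int &offset, uint32_t &access ) {
		if ( used + size > frameBytes ) {
			return false;
		}
		offset = used;
		used += size;
		// The first write of the frame orphans the whole buffer. Later writes land
		// past everything handed out earlier this frame, so they can skip the sync
		// and flush explicitly; that flush range is relative to the mapping,
		// i.e. (0, size), not (offset, size).
		access = MAP_WRITE_BIT | ( offset == 0
			? MAP_INVALIDATE_BUFFER_BIT
			: MAP_UNSYNCHRONIZED_BIT | MAP_INVALIDATE_RANGE_BIT | MAP_FLUSH_EXPLICIT_BIT );
		return true;
	}

	// EndFrame(): the next frame starts writing at offset 0 again.
	void Reset() { used = 0; }

private:
	int frameBytes;
	int used;
};
```

The unsynchronized flag is only safe because offsets never move backwards within a frame; the offset 0 case is the one point where old data may still be in flight, which is why it invalidates the whole buffer instead.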
revelator

Re: Doom3 shadow optimization

Post by revelator »

[Screenshot: GLSL renderer]

GLSL renderer. Looks mighty purty even though it's only used for interactions/shadows :)
motorsep
Posts: 231
Joined: Wed Aug 02, 2006 11:46 pm
Location: Texas, USA
Contact:

Re: Doom3 shadow optimization

Post by motorsep »

Might as well post a comparison screenshot from Doom 3. I don't really see much difference off the top of my head :/
revelator

Re: Doom3 shadow optimization

Post by revelator »

Of course it doesn't look different :) In fact I've gone to great lengths to make it look like the ARB2 version.
The original version was even darker than vanilla and had some weird shading in places.

Keeping it off for now though, as it does nasty things to certain gfx mods; at the moment it only works on unmodified Doom3.
motorsep

Re: Doom3 shadow optimization

Post by motorsep »

Lol, why go through all that trouble when it doesn't even work as expected to begin with, and the ARB2 backend looks better? :)
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Re: Doom3 shadow optimization

Post by Spike »

Probably because there's only one graphics company that supports any asm extensions. Maybe he wants to make a GLES2 port?
motorsep

Re: Doom3 shadow optimization

Post by motorsep »

AMD dropped it?
Spike

Re: Doom3 shadow optimization

Post by Spike »

By asm extensions, I mean extensions to the asm stuff, rather than the asm itself. The point is that while asm works, you can't use any of the extra stuff that has since been added to GLSL (like geometry shaders etc.) as asm on either AMD or Intel GPUs (AFAIK; I don't have either).
motorsep

Re: Doom3 shadow optimization

Post by motorsep »

Doom 3 BFG already does everything people have tried doing with old Doom 3, and more. Seems counterproductive, unless it's for a personal learning experience.
revelator

Re: Doom3 shadow optimization

Post by revelator »

Bingo :idea: