Forum

Doom3 shadow optimization

Discuss programming topics for any language, any source base. If it is programming related but doesn't fit in one of the below categories, it goes here.

Moderator: InsideQC Admins

Re: Doom3 shadow optimization

Postby revelator » Tue Jun 17, 2014 6:39 pm

Just for fun i tried an old build of raynors engine with glsl interactions and the bug is gone when in glsl mode so its definatly some fuckup with ARB2 shaders on AMD.
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby revelator » Wed Jun 18, 2014 8:19 am

Nailed it.

Its a mix of arb mapbufferrange acting up with my driver and me trying to fix shadows instead urgh.

but heres a lil goodie, i fixed the bad function for getting videoram from Doom3.

Turned out it works just fine besides returning negative values :lol:

Code: Select all
/*
================
Sys_GetVideoRam
returns in megabytes

This function works but returned negative sizes.
Fixed now.
================
*/
int Sys_GetVideoRam( void ) {
#ifdef   ID_DEDICATED
   return 0;
#else
   int retSize = 64;

   CComPtr<IWbemLocator> spLoc = NULL;
   HRESULT hr = CoCreateInstance( CLSID_WbemLocator, 0, CLSCTX_SERVER, IID_IWbemLocator, ( LPVOID * ) &spLoc );
   if ( hr != S_OK || spLoc == NULL ) {
      return retSize;
   }

   CComBSTR bstrNamespace( _T( "\\\\.\\root\\CIMV2" ) );
   CComPtr<IWbemServices> spServices;

   // Connect to CIM
   hr = spLoc->ConnectServer( bstrNamespace, NULL, NULL, 0, NULL, 0, 0, &spServices );
   if ( hr != WBEM_S_NO_ERROR ) {
      if(retSize < 0)   {
         return retSize=-retSize;
      } else {
         return retSize;
      }
   }

   // Switch the security level to IMPERSONATE so that provider will grant access to system-level objects. 
   hr = CoSetProxyBlanket( spServices, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE, NULL, RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE );
   if ( hr != S_OK ) {
      if(retSize < 0)   {
         return retSize=-retSize;
      } else {
         return retSize;
      }
   }

   // Get the vid controller
   CComPtr<IEnumWbemClassObject> spEnumInst = NULL;
   hr = spServices->CreateInstanceEnum( CComBSTR( "Win32_VideoController" ), WBEM_FLAG_SHALLOW, NULL, &spEnumInst );
   if ( hr != WBEM_S_NO_ERROR || spEnumInst == NULL ) {
      if(retSize < 0)   {
         return retSize=-retSize;
      } else {
         return retSize;
      }
   }

   ULONG uNumOfInstances = 0;
   CComPtr<IWbemClassObject> spInstance = NULL;
   hr = spEnumInst->Next( 10000, 1, &spInstance, &uNumOfInstances );

   if ( hr == S_OK && spInstance ) {
      // Get properties from the object
      CComVariant varSize;
      hr = spInstance->Get( CComBSTR( _T( "AdapterRAM" ) ), 0, &varSize, 0, 0 );
      if ( hr == S_OK ) {
         retSize = abs(varSize.intVal) / ( 1024 * 1024 );
         if ( retSize == 0 ) {
            retSize = 64;
         }
      }
   }
   return abs(retSize);
#endif
}


this will return the right ammont :)
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby revelator » Thu Jun 19, 2014 7:44 am

Huh has anyone tried out the old R200 renderer with Doom3 recently ? asking because i just did and it runs amazlingly well with my R9 270x :shock:
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby nbohr1more » Fri Jun 20, 2014 1:13 am

Interesting result. Though I suspect it's simply the lack of shaders there?

You'd think the lack of r_useShadowVertexProgram in that path would make it slower.
nbohr1more
 
Posts: 54
Joined: Fri Dec 09, 2011 7:04 am

Re: Doom3 shadow optimization

Postby revelator » Fri Jun 20, 2014 7:03 am

One should have thought aye. AMD even supports some Nvidia specific render calls, found that one out when i noticed Barnes VBO mem code used the Nvidia api for getting videoram :lol:
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby revelator » Fri Jun 20, 2014 8:13 am

FInal version of MH's VBO code.

This is for the GLEW version, if you still use the old qgl calls you need to put a q before the gl calls eg. glEnable should then be qglEnable.

Code: Select all
/*
===========================================================================

Doom 3 GPL Source Code
Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.

This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").

Doom 3 Source Code is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Doom 3 Source Code is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Doom 3 Source Code.  If not, see <http://www.gnu.org/licenses/>.

In addition, the Doom 3 Source Code is also subject to certain additional terms. You should have received a copy of these additional terms immediately following the terms and conditions of the GNU General Public License which accompanied the Doom 3 Source Code.  If not, please request a copy in writing from id Software at the address below.

If you have questions concerning this license or the applicable additional terms, you may contact in writing id Software LLC, c/o ZeniMax Media Inc., Suite 120, Rockville, Maryland 20850 USA.

===========================================================================
*/

#include "precompiled.h"
#include "tr_local.h"

static const int  FRAME_MEMORY_BYTES = 0x400000;
static const int  EXPAND_HEADERS = 32;

// turned r_useArbBufferRange off by default, does nasty things to AMD cards.
idCVar idVertexCache::r_showVertexCache( "r_showVertexCache", "0", CVAR_INTEGER | CVAR_RENDERER, "show vertex cache" );
idCVar idVertexCache::r_useArbBufferRange( "r_useArbBufferRange", "0", CVAR_BOOL | CVAR_RENDERER, "use ARB_map_buffer_range for optimization" );
idCVar idVertexCache::r_reuseVertexCacheSooner( "r_reuseVertexCacheSooner", "1", CVAR_BOOL | CVAR_RENDERER, "reuse vertex buffers as soon as possible after freeing" );

idVertexCache     vertexCache;

/*
==============
R_ShowVBOMem_f
==============
*/
void R_ShowVBOMem_f( const idCmdArgs &args ) {
   vertexCache.Show();
}

/*
==============
R_ListVBOMem_f
==============
*/
void R_ListVBOMem_f( const idCmdArgs &args ) {
   vertexCache.List();
}

/*
==============
idVertexCache::ActuallyFree
==============
*/
void idVertexCache::ActuallyFree( vertCache_t *block ) {
   if( !block ) {
      common->Error( "idVertexCache Free: NULL pointer" );
   }
   
   if( block->user ) {
      // let the owner know we have purged it
      *block->user = NULL;
      block->user = NULL;
   }
   
   // temp blocks are in a shared space that won't be freed
   if( block->tag != TAG_TEMP ) {
      staticAllocTotal -= block->size;
      staticCountTotal--;
      if( virtualMemory ) {
         delete [] block->virtMem;
         block->virtMem = NULL;
      }
   }
   block->tag = TAG_FREE;     // mark as free
   
   // unlink stick it back on the free list
   block->next->prev = block->prev;
   block->prev->next = block->next;
   
   if( r_reuseVertexCacheSooner.GetBool() ) {
      // stick it on the front of the free list so it will be reused immediately
      block->next = freeStaticHeaders.next;
      block->prev = &freeStaticHeaders;
   } else {
      // stick it on the back of the free list so it won't be reused soon (just for debugging)
      block->next = &freeStaticHeaders;
      block->prev = freeStaticHeaders.prev;
   }
   block->next->prev = block;
   block->prev->next = block;
}

/*
==============
idVertexCache::Position

this will be a real pointer with virtual memory,
but it will be an int offset cast to a pointer with
ARB_vertex_buffer_object

The ARB_vertex_buffer_object will be bound
==============
*/
void *idVertexCache::Position( vertCache_t *buffer ) {
   if( !buffer || buffer->tag == TAG_FREE ) {
      common->FatalError( "idVertexCache::Position: bad vertCache_t" );
   }
   
   // the ARB vertex object just uses an offset
   if( buffer->vbo ) {
      if( r_showVertexCache.GetInteger() == 2 ) {
         if( buffer->tag == TAG_TEMP ) {
            common->Printf( "GL_ARRAY_BUFFER_ARB = %i + %i (%i bytes)\n", buffer->vbo, buffer->offset, buffer->size );
         } else {
            common->Printf( "GL_ARRAY_BUFFER_ARB = %i (%i bytes)\n", buffer->vbo, buffer->size );
         }
      }
      BindIndex( ( buffer->indexBuffer ? GL_ELEMENT_ARRAY_BUFFER : GL_ARRAY_BUFFER ), buffer->vbo );
      return ( void * )buffer->offset;
   }
   
   // virtual memory is a real pointer
   return ( void * )( ( byte * )buffer->virtMem + buffer->offset );
}

//================================================================================

// dont make these static or the engine will crash.
GLuint vertexBuffer = 0;
GLuint indexBuffer = 0;

/*
===========
idVertexCache::BindIndex

Makes sure it only allocates the right buffers once.
===========
*/
void idVertexCache::BindIndex( GLenum target, GLuint vbo ) {
   switch( target ) {
   case GL_ARRAY_BUFFER:
      if( vertexBuffer != vbo ) {
         // this happens more often than you might think :(
         glBindBufferARB( target, vbo );
         vertexBuffer = vbo;
         return;
      }
      break;
      
   case GL_ELEMENT_ARRAY_BUFFER:
      if( indexBuffer != vbo ) {
         // this happens more often than you might think :(
         glBindBufferARB( target, vbo );
         indexBuffer = vbo;
         return;
      }
      break;
      
   default:
      common->FatalError( "BindIndex : unknown buffer target : %i\n", static_cast<int>( target ) );
      break;
   }
}

/*
===========
idVertexCache::UnbindIndex

Makes sure it only deallocates the right buffers once.
===========
*/
void idVertexCache::UnbindIndex( GLenum target ) {
   switch( target ) {
   case GL_ARRAY_BUFFER:
      if( vertexBuffer != 0 )   {
         // this happens more often than you might think :(
         glBindBufferARB( target, 0 );
         vertexBuffer = 0;
         return;
      }
      break;
      
   case GL_ELEMENT_ARRAY_BUFFER:
      if( indexBuffer != 0 ) {
         // this happens more often than you might think :(
         glBindBufferARB( target, 0 );
         indexBuffer = 0;
         return;
      }
      break;
      
   default:
      common->FatalError( "UnbindIndex : unknown buffer target : %i\n", static_cast<int>( target ) );
      break;
   }
}

//================================================================================

/*
===========
idVertexCache::Init
===========
*/
void idVertexCache::Init() {
   cmdSystem->AddCommand( "showVBOMem", R_ShowVBOMem_f, CMD_FL_RENDERER, "Shows Allocated Vertex Buffer Memory" );
   cmdSystem->AddCommand( "ListVBOMem", R_ListVBOMem_f, CMD_FL_RENDERER, "lists Objects Allocated in Vertex Cache" );
   
   // use ARB_vertex_buffer_object unless explicitly disabled
   if( glConfig.ARBVertexBufferObjectAvailable ) {
      virtualMemory = false;
      r_useIndexBuffers.SetBool( true );
      common->Printf( "using ARB_vertex_buffer_object memory\n" );
   } else {
      virtualMemory = true;
      r_useIndexBuffers.SetBool( false );
      common->Printf( "WARNING: vertex array range in virtual memory (SLOW)\n" );
   }
   
   // initialize the cache memory blocks
   freeStaticHeaders.next = freeStaticHeaders.prev = &freeStaticHeaders;
   staticHeaders.next = staticHeaders.prev = &staticHeaders;
   freeDynamicHeaders.next = freeDynamicHeaders.prev = &freeDynamicHeaders;
   dynamicHeaders.next = dynamicHeaders.prev = &dynamicHeaders;
   deferredFreeList.next = deferredFreeList.prev = &deferredFreeList;
   
   // set up the dynamic frame memory
   frameBytes = FRAME_MEMORY_BYTES;
   staticAllocTotal = 0;
   
   // allocate a dummy buffer
   byte *frameBuffer = new byte[frameBytes];   
   for( int i = 0 ; i < NUM_VERTEX_FRAMES ; i++ ) {
      // force the alloc to use GL_STREAM_DRAW_ARB
      allocatingTempBuffer = true;
      Alloc( frameBuffer, frameBytes, &tempBuffers[i] );
      allocatingTempBuffer = false;
      tempBuffers[i]->tag = TAG_FIXED;
      
      // unlink these from the static list, so they won't ever get purged
      tempBuffers[i]->next->prev = tempBuffers[i]->prev;
      tempBuffers[i]->prev->next = tempBuffers[i]->next;
   }
   
   // use C++ allocation
   delete [] frameBuffer;
   frameBuffer = NULL;
   
   EndFrame();
}

/*
===========
idVertexCache::PurgeAll

Used when toggling vertex programs on or off, because
the cached data isn't valid
===========
*/
void idVertexCache::PurgeAll() {
   while( staticHeaders.next != &staticHeaders ) {
      ActuallyFree( staticHeaders.next );
   }
}

/*
===========
idVertexCache::Shutdown
===========
*/
void idVertexCache::Shutdown() {
   headerAllocator.Shutdown();
}

/*
===========
idVertexCache::Alloc
===========
*/
void idVertexCache::Alloc( void *data, int size, vertCache_t **buffer, bool doIndex ) {
   vertCache_t *block = NULL;
   
   if( size <= 0 )   {
      common->Error( "idVertexCache::Alloc: size = %i\n", size );
   }
   
   // if we can't find anything, it will be NULL
   *buffer = NULL;
   
   // if we don't have any remaining unused headers, allocate some more
   if( freeStaticHeaders.next == &freeStaticHeaders ) {
      for( int i = 0; i < EXPAND_HEADERS; i++ ) {
         block = headerAllocator.Alloc();
         
         if( !virtualMemory ) {
            glGenBuffers( 1, &block->vbo );
            block->size = 0;
         }
         block->next = freeStaticHeaders.next;
         block->prev = &freeStaticHeaders;
         block->next->prev = block;
         block->prev->next = block;
      }
   }
   GLenum target = ( doIndex ? GL_ELEMENT_ARRAY_BUFFER : GL_ARRAY_BUFFER );
   GLenum usage = ( allocatingTempBuffer ? GL_STREAM_DRAW : GL_STATIC_DRAW );
   
   // try to find a matching block to replace so that we're not continually respecifying vbo data each frame
   for( vertCache_t *findblock = freeStaticHeaders.next; /**/; findblock = findblock->next ) {
      if( findblock == &freeStaticHeaders ) {
         block = freeStaticHeaders.next;
         break;
      }
      
      if( findblock->target != target ) {
         continue;
      }
      
      if( findblock->usage != usage )   {
         continue;
      }
      
      if( findblock->size != size ) {
         continue;
      }
      block = findblock;
      break;
   }
   
   // move it from the freeStaticHeaders list to the staticHeaders list
   block->target = target;
   block->usage = usage;
   
   if( block->vbo ) {
      // orphan the buffer in case it needs respecifying (it usually will)
      BindIndex( target, block->vbo );
      glBufferDataARB( target, static_cast<GLsizeiptr>( size ), NULL, usage );
      glBufferDataARB( target, static_cast<GLsizeiptr>( size ), data, usage );
   } else {
      // use C++ allocation
      block->virtMem = new byte[size];
      SIMDProcessor->Memcpy( block->virtMem, data, size );
   }
   block->next->prev = block->prev;
   block->prev->next = block->next;
   block->next = staticHeaders.next;
   block->prev = &staticHeaders;
   block->next->prev = block;
   block->prev->next = block;
   block->size = size;
   block->offset = 0;
   block->tag = TAG_USED;
   
   // save data for debugging
   staticAllocThisFrame += block->size;
   staticCountThisFrame++;
   staticCountTotal++;
   staticAllocTotal += block->size;
   
   // this will be set to zero when it is purged
   block->user = buffer;
   *buffer = block;
   
   // allocation doesn't imply used-for-drawing, because at level
   // load time lots of things may be created, but they aren't
   // referenced by the GPU yet, and can be purged if needed.
   block->frameUsed = currentFrame - NUM_VERTEX_FRAMES;
   block->indexBuffer = doIndex;
}

/*
===========
idVertexCache::Touch
===========
*/
void idVertexCache::Touch( vertCache_t *block ) {
   if( !block ) {
      common->Error( "idVertexCache Touch: NULL pointer" );
   }
   
   if( block->tag == TAG_FREE ) {
      common->FatalError( "idVertexCache Touch: freed pointer" );
   }
   
   if( block->tag == TAG_TEMP ) {
      common->FatalError( "idVertexCache Touch: temporary pointer" );
   }
   block->frameUsed = currentFrame;
   
   // move to the head of the LRU list
   block->next->prev = block->prev;
   block->prev->next = block->next;
   block->next = staticHeaders.next;
   block->prev = &staticHeaders;
   staticHeaders.next->prev = block;
   staticHeaders.next = block;
}

/*
===========
idVertexCache::Free
===========
*/
void idVertexCache::Free( vertCache_t *block ) {
   if( !block ) {
      return;
   }
   
   if( block->tag == TAG_FREE ) {
      common->FatalError( "idVertexCache Free: freed pointer" );
   }
   
   if( block->tag == TAG_TEMP ) {
      common->FatalError( "idVertexCache Free: temporary pointer" );
   }
   
   // this block still can't be purged until the frame count has expired,
   // but it won't need to clear a user pointer when it is
   block->user = NULL;
   block->next->prev = block->prev;
   block->prev->next = block->next;
   block->next = deferredFreeList.next;
   block->prev = &deferredFreeList;
   deferredFreeList.next->prev = block;
   deferredFreeList.next = block;
}

/*
===========
idVertexCache::MapBufferRange

MH's Version fast on Nvidia But fails on AMD.
===========
*/
vertCache_t *idVertexCache::MapBufferRange( vertCache_t *buffer, void *data, int size )
{
   GLbitfield   access = ( GL_MAP_WRITE_BIT | ( ( buffer->offset == 0 ) ? GL_MAP_INVALIDATE_BUFFER_BIT : GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT ) );
   GLvoid      *ptr = glMapBufferRange( GL_ARRAY_BUFFER, static_cast<GLintptr>( buffer->offset ), static_cast<GLsizeiptr>( size ), access );

   // AMD fix added glFlushMappedBufferRange to clear explicit bits.
   if ( ptr ) {
      SIMDProcessor->Memcpy( static_cast<byte *>( ptr ), data, size );
      glFlushMappedBufferRange( GL_ARRAY_BUFFER, static_cast<GLintptr>( buffer->offset ), static_cast<GLsizeiptr>( size ) );
      glUnmapBufferARB( GL_ARRAY_BUFFER );
      return buffer;
   } else {
      glBufferSubDataARB( GL_ARRAY_BUFFER, static_cast<GLintptrARB>( buffer->offset ), static_cast<GLsizeiptr>( size ), data );
   }
   return buffer;
}

/*
===========
idVertexCache::MapBuffer

If the above fails we still map using the old version.
===========
*/
vertCache_t *idVertexCache::MapBuffer( vertCache_t *buffer, void *data, int size )
{
   GLenum   access = ( GL_MAP_WRITE_BIT | ( ( buffer->offset == 0 ) ? GL_MAP_INVALIDATE_BUFFER_BIT : GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_RANGE_BIT ) );
   GLvoid  *ptr = glMapBufferARB( GL_ARRAY_BUFFER, access );

   if ( ptr ) {
      SIMDProcessor->Memcpy( static_cast<byte *>( ptr ), data, size );
      glUnmapBufferARB( GL_ARRAY_BUFFER );
      return buffer;
   } else {
      glBufferSubDataARB( GL_ARRAY_BUFFER, static_cast<GLintptrARB>( buffer->offset ), static_cast<GLsizeiptr>( size ), data );
   }
   return buffer;
}

/*
===========
idVertexCache::AllocFrameTemp

A frame temp allocation must never be allowed to fail due to overflow.
We can't simply sync with the GPU and overwrite what we have, because
there may still be future references to dynamically created surfaces.
===========
*/
vertCache_t *idVertexCache::AllocFrameTemp( void *data, int size ) {
   vertCache_t *block;
   
   if( size <= 0 ) {
      common->Error( "idVertexCache::AllocFrameTemp: size = %i\n", size );
   }
   
   if( dynamicAllocThisFrame + size > frameBytes )   {
      // if we don't have enough room in the temp block, allocate a static block,
      // but immediately free it so it will get freed at the next frame
      tempOverflow = true;
      Alloc( data, size, &block );
      Free( block );
      return block;
   }
   
   // this data is just going on the shared dynamic list
   // if we don't have any remaining unused headers, allocate some more
   if( freeDynamicHeaders.next == &freeDynamicHeaders ) {
      for( int i = 0; i < EXPAND_HEADERS; i++ ) {
         block = headerAllocator.Alloc();
         block->next = freeDynamicHeaders.next;
         block->prev = &freeDynamicHeaders;
         block->next->prev = block;
         block->prev->next = block;
      }
   }
   
   // move it from the freeDynamicHeaders list to the dynamicHeaders list
   block = freeDynamicHeaders.next;
   block->next->prev = block->prev;
   block->prev->next = block->next;
   block->next = dynamicHeaders.next;
   block->prev = &dynamicHeaders;
   block->next->prev = block;
   block->prev->next = block;
   block->size = size;
   block->tag = TAG_TEMP;
   block->indexBuffer = false;
   block->offset = dynamicAllocThisFrame;
   dynamicAllocThisFrame += block->size;
   dynamicCountThisFrame++;
   block->user = NULL;
   block->frameUsed = 0;
   
   // copy the data
   block->virtMem = tempBuffers[listNum]->virtMem;
   block->vbo = tempBuffers[listNum]->vbo;
   
   // mh code start
   if( block->vbo ) {
      BindIndex( GL_ARRAY_BUFFER, block->vbo );      
      // try to get an unsynchronized map if at all possible
      if( glConfig.ARBMapBufferRangeAvailable && r_useArbBufferRange.GetBool() ) {
         // if the buffer has wrapped then we orphan it
         return MapBufferRange( block, data, size );
      } else {
         // if the buffer has wrapped then we orphan it
         return MapBuffer( block, data, size );
      }
   } else if( block->virtMem ) {
      SIMDProcessor->Memcpy( static_cast<byte *>( block->virtMem ) + block->offset, data, size );
   }
   return block;
}

/*
===========
idVertexCache::EndFrame
===========
*/
void idVertexCache::EndFrame() {
   // display debug information
   if( r_showVertexCache.GetBool() ) {
      int staticUseCount = 0;
      int staticUseSize = 0;
      
      for( vertCache_t *block = staticHeaders.next ; block != &staticHeaders ; block = block->next ) {
         if( block->frameUsed == currentFrame ) {
            staticUseCount++;
            staticUseSize += block->size;
         }
      }
      const char *frameOverflow = tempOverflow ? "(OVERFLOW)" : "";
      common->Printf( "vertex dynamic:%i=%ik%s, static alloc:%i=%ik used:%i=%ik total:%i=%ik\n",
                  dynamicCountThisFrame, dynamicAllocThisFrame / 1024, frameOverflow,
                  staticCountThisFrame, staticAllocThisFrame / 1024,
                  staticUseCount, staticUseSize / 1024,
                  staticCountTotal, staticAllocTotal / 1024 );
   }
   
   // unbind vertex buffers so normal virtual memory will be used
   if( !virtualMemory ) {
      UnbindIndex( GL_ARRAY_BUFFER_ARB );
      UnbindIndex( GL_ELEMENT_ARRAY_BUFFER_ARB );
   }
   currentFrame = tr.frameCount;
   listNum = currentFrame % NUM_VERTEX_FRAMES;
   staticAllocThisFrame = 0;
   staticCountThisFrame = 0;
   dynamicAllocThisFrame = 0;
   dynamicCountThisFrame = 0;
   tempOverflow = false;
   
   // free all the deferred free headers
   while( deferredFreeList.next != &deferredFreeList ) {
      ActuallyFree( deferredFreeList.next );
   }
   
   // free all the frame temp headers
   vertCache_t *block = dynamicHeaders.next;
   
   if( block != &dynamicHeaders ) {
      block->prev = &freeDynamicHeaders;
      dynamicHeaders.prev->next = freeDynamicHeaders.next;
      freeDynamicHeaders.next->prev = dynamicHeaders.prev;
      freeDynamicHeaders.next = block;
      dynamicHeaders.next = dynamicHeaders.prev = &dynamicHeaders;
   }
}

/*
=============
idVertexCache::List
=============
*/
void idVertexCache::List( void ) {
   int         numActive = 0;
   int         frameStatic = 0;
   int         totalStatic = 0;
   vertCache_t *block;
   
   for( block = staticHeaders.next; block != &staticHeaders; block = block->next )   {
      numActive++;
      totalStatic += block->size;
      
      if( block->frameUsed == currentFrame ) {
         frameStatic += block->size;
      }
   }
   int   numFreeStaticHeaders = 0;
   
   for( block = freeStaticHeaders.next; block != &freeStaticHeaders; block = block->next )   {
      numFreeStaticHeaders++;
   }
   int   numFreeDynamicHeaders = 0;
   
   for( block = freeDynamicHeaders.next; block != &freeDynamicHeaders; block = block->next ) {
      numFreeDynamicHeaders++;
   }
   common->Printf( "%i dynamic temp buffers of %ik\n", NUM_VERTEX_FRAMES, frameBytes / 1024 );
   common->Printf( "%5i active static headers\n", numActive );
   common->Printf( "%5i free static headers\n", numFreeStaticHeaders );
   common->Printf( "%5i free dynamic headers\n", numFreeDynamicHeaders );
   
   if( !virtualMemory ) {
      common->Printf( "Vertex cache is in ARB_vertex_buffer_object memory (FAST).\n" );
   } else {
      common->Printf( "Vertex cache is in virtual memory (SLOW)\n" );
   }
   common->Printf( "Index buffers are accelerated.\n" );
}

/*
=============
idVertexCache::Show

Barnes,
replaces the broken glconfig string version.
Revelator cannot use glew's function pointers.
=============
*/
void idVertexCache::Show( void ) {
   GLint  mem[4];
   
   if( glewIsSupported( "GL_NVX_gpu_memory_info" ) ) {
      common->Printf( "\nNvidia specific memory info:\n" );
      common->Printf( "\n" );
      glGetIntegerv( GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX , mem );
      common->Printf( "dedicated video memory %i MB\n", mem[0] >> 10 );
      glGetIntegerv( GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX , mem );
      common->Printf( "total available memory %i MB\n", mem[0] >> 10 );
      glGetIntegerv( GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX , mem );
      common->Printf( "currently unused GPU memory %i MB\n", mem[0] >> 10 );
      glGetIntegerv( GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX , mem );
      common->Printf( "count of total evictions seen by system %i MB\n", mem[0] >> 10 );
      glGetIntegerv( GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX , mem );
      common->Printf( "total video memory evicted %i MB\n", mem[0] >> 10 );
   } else if( glewIsSupported( "GL_ATI_meminfo" ) ) {
      common->Printf( "\nATI/AMD specific memory info:\n" );
      common->Printf( "\n" );
      glGetIntegerv( GL_VBO_FREE_MEMORY_ATI, mem );
      common->Printf( "VBO: total memory free in the pool %i MB\n", mem[0] >> 10 );
      common->Printf( "VBO: largest available free block in the pool %i MB\n", mem[1] >> 10 );
      common->Printf( "VBO: total auxiliary memory free %i MB\n", mem[2] >> 10 );
      common->Printf( "VBO: largest auxiliary free block %i MB\n", mem[3] >> 10 );
      glGetIntegerv( GL_TEXTURE_FREE_MEMORY_ATI, mem );
      common->Printf( "Texture: total memory free in the pool %i MB\n", mem[0] >> 10 );
      common->Printf( "Texture: largest available free block in the pool %i MB\n", mem[1] >> 10 );
      common->Printf( "Texture: total auxiliary memory free %i MB\n", mem[2] >> 10 );
      common->Printf( "Texture: largest auxiliary free block %i MB\n", mem[3] >> 10 );
      glGetIntegerv( GL_RENDERBUFFER_FREE_MEMORY_ATI, mem );
      common->Printf( "RenderBuffer: total memory free in the pool %i MB\n", mem[0] >> 10 );
      common->Printf( "RenderBuffer: largest available free block in the pool %i MB\n", mem[1] >> 10 );
      common->Printf( "RenderBuffer: total auxiliary memory free %i MB\n", mem[2] >> 10 );
      common->Printf( "RenderBuffer: largest auxiliary free block %i MB\n", mem[3] >> 10 );
   } else {
      common->Printf( "MemInfo not availabled for your video card or driver!\n" );
   }
}

/*
=============
idVertexCache::IsFast

just for gfxinfo printing
=============
*/
bool idVertexCache::IsFast() {
   if( virtualMemory )   {
      return false;
   }
   return true;
}


Code: Select all
/*
===========================================================================

Doom 3 GPL Source Code
Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.

This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").

Doom 3 Source Code is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Doom 3 Source Code is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Doom 3 Source Code.  If not, see <http://www.gnu.org/licenses/>.

In addition, the Doom 3 Source Code is also subject to certain additional terms. You should have received a copy of these additional terms immediately following the terms and conditions of the GNU General Public License which accompanied the Doom 3 Source Code.  If not, please request a copy in writing from id Software at the address below.

If you have questions concerning this license or the applicable additional terms, you may contact in writing id Software LLC, c/o ZeniMax Media Inc., Suite 120, Rockville, Maryland 20850 USA.

===========================================================================
*/

// vertex cache calls should only be made by the front end
const int NUM_VERTEX_FRAMES = 2;

typedef enum
{
   TAG_FREE,
   TAG_USED,
   TAG_FIXED,    // for the temp buffers
   TAG_TEMP      // in frame temp area, not static area
} vertBlockTag_t;

typedef struct vertCache_s
{
   GLuint            vbo;
   GLenum            target;
   GLenum            usage;
   void            *virtMem;         // only one of vbo / virtMem will be set
   bool            indexBuffer;      // holds indexes instead of vertexes
   
   int               offset;
   int               size;          // may be larger than the amount asked for, due
   // to round up and minimum fragment sizes
   int               tag;           // a tag of 0 is a free block
   struct vertCache_s  **user;           // will be set to zero when purged
   struct vertCache_s  *next, *prev;     // may be on the static list or one of the frame lists
   int               frameUsed;        // it can't be purged if near the current frame
} vertCache_t;

class idVertexCache
{
public:
   void         Init();
   void         Shutdown();
   
   // just for gfxinfo printing
   bool         IsFast();
   
   // called when vertex programs are enabled or disabled, because
   // the cached data is no longer valid
   void         PurgeAll();
   
   // Tries to allocate space for the given data in fast vertex
   // memory, and copies it over.
   // Alloc does NOT do a touch, which allows purging of things
   // created at level load time even if a frame hasn't passed yet.
   // These allocations can be purged, which will zero the pointer.
   void         Alloc( void *data, int bytes, vertCache_t **buffer, bool indexBuffer = false );
   
   // This will be a real pointer with virtual memory,
   // but it will be an int offset cast to a pointer of ARB_vertex_buffer_object
   void         *Position( vertCache_t *buffer );
   
   // initialize the element array buffers
   void         BindIndex( GLenum target, GLuint vbo );
   
   // if you need to draw something without an indexCache,
   // this must be called to reset GL_ELEMENT_ARRAY_BUFFER_ARB
   void         UnbindIndex( GLenum target );
   
   // MH's MapBufferRange.
   vertCache_t *MapBufferRange( vertCache_t *buffer, void *data, int size );

   // Revelator's MapBuffer for cards that dont cope to well with the above.
   vertCache_t *MapBuffer( vertCache_t *buffer, void *data, int size );

   // automatically freed at the end of the next frame
   // used for specular texture coordinates and gui drawing, which
   // will change every frame.
   // will return NULL if the vertex cache is completely full
   // As with Position(), this may not actually be a pointer you can access.
   vertCache_t    *AllocFrameTemp( void *data, int bytes );
   
   // notes that a buffer is used this frame, so it can't be purged
   // out from under the GPU
   void         Touch( vertCache_t *buffer );
   
   // this block won't have to zero a buffer pointer when it is purged,
   // but it must still wait for the frames to pass, in case the GPU
   // is still referencing it
   void         Free( vertCache_t *buffer );
   
   // updates the counter for determining which temp space to use
   // and which blocks can be purged
   // Also prints debugging info when enabled
   void         EndFrame();
   
   // listVBOMem calls this
   void         List();
   
   // showVBOMem calls this
   void         Show();
   
private:
   void         InitMemoryBlocks( int size );
   void         ActuallyFree( vertCache_t *block );
   
   static idCVar   r_showVertexCache;
   static idCVar   r_useArbBufferRange;
   static idCVar   r_reuseVertexCacheSooner;
   
   int            staticCountTotal;
   int            staticAllocTotal;      // for end of frame purging
   
   int            staticAllocThisFrame;   // debug counter
   int            staticCountThisFrame;
   int            dynamicAllocThisFrame;
   int            dynamicCountThisFrame;
   
   int            currentFrame;         // for purgable block tracking
   int            listNum;            // currentFrame % NUM_VERTEX_FRAMES, determines which tempBuffers to use
   
   bool         virtualMemory;         // not fast stuff
   
   bool         allocatingTempBuffer;   // force GL_STREAM_DRAW_ARB
   
   vertCache_t     *tempBuffers[NUM_VERTEX_FRAMES];    // allocated at startup
   bool         tempOverflow;                  // had to alloc a temp in static memory
   
   idBlockAlloc<vertCache_t, 1024> headerAllocator;
   
   vertCache_t     freeStaticHeaders;         // head of doubly linked list
   vertCache_t     freeDynamicHeaders;     // head of doubly linked list
   vertCache_t     dynamicHeaders;           // head of doubly linked list
   vertCache_t   deferredFreeList;      // head of doubly linked list
   vertCache_t     staticHeaders;         // head of doubly linked list in MRU order, staticHeaders.next is most recently used
   int         frameBytes;        // for each of NUM_VERTEX_FRAMES frames
};

extern   idVertexCache  vertexCache;


As you will probably notice we allways Map the VBO memory now even if MapBufferRange is turned off (just using the old version of it).
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby revelator » Fri Jun 20, 2014 9:13 am

Image

glsl renderer, looks mighty purty even though its only used for interactions / shadows :)
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby motorsep » Fri Jun 20, 2014 4:19 pm

Might as well post a comparison screenshot from Doom 3 - I don't really see much difference from the top of my head :/
motorsep
 
Posts: 231
Joined: Wed Aug 02, 2006 11:46 pm
Location: Texas, USA

Re: Doom3 shadow optimization

Postby revelator » Fri Jun 20, 2014 11:12 pm

Ofc it does not look different :) in fact i gone to great lenghts to make it look like the ARB2 version.
Original version was even darker than vanilla and had some weird shading in places.

Keeping it off for now though as it does nasty things to certain gfx mods, atm it only works on unmodified Doom3.
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby motorsep » Fri Jun 20, 2014 11:18 pm

Lol, why go through all that trouble when it doesn't even work as expected to begin with and ARB2 backend looks better? :)
motorsep
 
Posts: 231
Joined: Wed Aug 02, 2006 11:46 pm
Location: Texas, USA

Re: Doom3 shadow optimization

Postby Spike » Sat Jun 21, 2014 12:58 am

probably because there's only one graphics company that supports any asm extensions. maybe he wants to make a gles2 port?
Spike
 
Posts: 2853
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Doom3 shadow optimization

Postby motorsep » Sat Jun 21, 2014 1:13 am

AMD dropped it?
motorsep
 
Posts: 231
Joined: Wed Aug 02, 2006 11:46 pm
Location: Texas, USA

Re: Doom3 shadow optimization

Postby Spike » Sat Jun 21, 2014 1:54 am

by asm extensions, I mean extensions to the asm stuff, rather than the asm itself. point is that while asm works, you can't use any of the extra stuff that has since been added to glsl (like geometry shaders etc) as asm on either amd or intel gpus (afaik, I don't have either).
Spike
 
Posts: 2853
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Doom3 shadow optimization

Postby motorsep » Sat Jun 21, 2014 2:02 am

Doom 3 BFG already does everything and more than what people have tried doing with old Doom 3. Seems like counter productive, unless it's for personal learning experience.
motorsep
 
Posts: 231
Joined: Wed Aug 02, 2006 11:46 pm
Location: Texas, USA

Re: Doom3 shadow optimization

Postby revelator » Sat Jun 21, 2014 5:13 am

Bingo :idea:
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2496
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

PreviousNext

Return to General Programming

Who is online

Users browsing this forum: No registered users and 2 guests