Doom3 shadow optimization

Post by revelator » Sat Jun 14, 2014 5:08 pm

This originally came from a tutorial for adding OpenMP support to Doom 3.
Sadly it was Linux-only, and the OpenMP part does not work on Windows.
What does work, though, is here.

Changed model interactions to a deferred model.
Consolidated a ton of globals into a struct for stencil shadows.
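
To show the idea behind that consolidation, here is a minimal standalone sketch. It is not the Doom 3 code: `ShadowState`, `MAX_INDEXES` and `AddTriangle` are made-up names standing in for `stencilRef_t` and its helpers, and the buffer is tiny on purpose. The only point is that each helper receives the state through a pointer instead of writing to file-level statics.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, cut-down stand-in for stencilRef_t (illustration only).
struct ShadowState {
    enum { MAX_INDEXES = 8 };   // deliberately tiny so overflow is easy to hit
    int  numIndexes;
    int  indexes[MAX_INDEXES];
    bool overflowed;
};

// Appends one triangle to the caller's state.
// Returns false and flags the state if the index buffer would overflow.
static bool AddTriangle(ShadowState *st, int a, int b, int c) {
    assert(st != NULL);
    if (st->numIndexes + 3 > ShadowState::MAX_INDEXES) {
        st->overflowed = true;
        return false;
    }
    st->indexes[st->numIndexes++] = a;
    st->indexes[st->numIndexes++] = b;
    st->indexes[st->numIndexes++] = c;
    return true;
}
```

With the state passed in like this, two zero-initialized `ShadowState` objects can be filled independently, which is presumably what a threaded build needs. Whether that alone is enough for OpenMP on a given platform is a separate question.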

So here we go.
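
A side note before the listing: most of the culling macros at the top of the file test a 12-bit mask per vertex. As far as `R_CalcPointCull` and the macros show, bits 0-5 mean "behind or on plane i" and bits 6-11 mean "in front of or on plane i". The sketch below rebuilds that layout in isolation. `MakePointCull`, `PointCulled`, `TriangleCulled` and `TriangleClipped` are hypothetical helper names, not engine functions, and the distances are plain floats rather than `idPlane` results.

```cpp
#include <cassert>

// Builds a 12-bit cull mask from six signed plane distances (illustration only).
//   bit i     : point is behind or on plane i      (dist <  epsilon)
//   bit i + 6 : point is in front of or on plane i (dist > -epsilon)
static unsigned short MakePointCull(const float dist[6], float epsilon) {
    unsigned bits = 0;
    for (int i = 0; i < 6; i++) {
        if (dist[i] < epsilon) {
            bits |= 1u << i;
        }
        if (dist[i] > -epsilon) {
            bits |= 1u << (i + 6);
        }
    }
    return static_cast<unsigned short>(bits);
}

// Same tests as the POINT_CULLED / TRIANGLE_CULLED / TRIANGLE_CLIPPED macros,
// written as functions over already-computed masks.
static bool PointCulled(unsigned short c) {
    return (c & 0xfc0) != 0xfc0;            // clearly behind at least one plane
}

static bool TriangleCulled(unsigned short a, unsigned short b, unsigned short c) {
    return (a & b & c & 0x3f) != 0;         // all three behind or on the same plane
}

static bool TriangleClipped(unsigned short a, unsigned short b, unsigned short c) {
    return ((a & b & c) & 0xfc0) != 0xfc0;  // at least one vertex behind some plane
}
```

A vertex clearly inside all six planes comes out as `0xfc0`, and one sitting exactly on a plane keeps both bits for that plane, which is why a point on the plane is not treated as culled.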

These changes are big, so just replace the entire contents of tr_stencilshadow.cpp with this.

Code:
#define TRIANGLE_CULLED(p1,p2,p3) (pointCull[p1] & pointCull[p2] & pointCull[p3] & 0x3f)
#define TRIANGLE_CLIPPED(p1,p2,p3) (((pointCull[p1] & pointCull[p2] & pointCull[p3]) & 0xfc0) != 0xfc0)

// an edge that is on the plane is NOT culled
#define EDGE_CULLED(p1,p2) ((pointCull[p1] ^ 0xfc0) & (pointCull[p2] ^ 0xfc0) & 0xfc0)
#define EDGE_CLIPPED(p1,p2) ((pointCull[p1] & pointCull[p2] & 0xfc0) != 0xfc0)

// a point that is on the plane is NOT culled
#define   POINT_CULLED(p1) ((pointCull[p1] & 0xfc0) != 0xfc0)

//#define   LIGHT_CLIP_EPSILON   0.001f
#define   LIGHT_CLIP_EPSILON      0.1f

idPlane   pointLightFrustums[6][6] =
{
   {
      idPlane( 1, 0, 0, 0 ),
      idPlane( 1, 1, 0, 0 ),
      idPlane( 1, -1, 0, 0 ),
      idPlane( 1, 0, 1, 0 ),
      idPlane( 1, 0, -1, 0 ),
      idPlane( -1, 0, 0, 0 ),
   },
   {
      idPlane( -1, 0, 0, 0 ),
      idPlane( -1, 1, 0, 0 ),
      idPlane( -1, -1, 0, 0 ),
      idPlane( -1, 0, 1, 0 ),
      idPlane( -1, 0, -1, 0 ),
      idPlane( 1, 0, 0, 0 ),
   },
   
   {
      idPlane( 0, 1, 0, 0 ),
      idPlane( 0, 1, 1, 0 ),
      idPlane( 0, 1, -1, 0 ),
      idPlane( 1, 1, 0, 0 ),
      idPlane( -1, 1, 0, 0 ),
      idPlane( 0, -1, 0, 0 ),
   },
   {
      idPlane( 0, -1, 0, 0 ),
      idPlane( 0, -1, 1, 0 ),
      idPlane( 0, -1, -1, 0 ),
      idPlane( 1, -1, 0, 0 ),
      idPlane( -1, -1, 0, 0 ),
      idPlane( 0, 1, 0, 0 ),
   },
   
   {
      idPlane( 0, 0, 1, 0 ),
      idPlane( 1, 0, 1, 0 ),
      idPlane( -1, 0, 1, 0 ),
      idPlane( 0, 1, 1, 0 ),
      idPlane( 0, -1, 1, 0 ),
      idPlane( 0, 0, -1, 0 ),
   },
   {
      idPlane( 0, 0, -1, 0 ),
      idPlane( 1, 0, -1, 0 ),
      idPlane( -1, 0, -1, 0 ),
      idPlane( 0, 1, -1, 0 ),
      idPlane( 0, -1, -1, 0 ),
      idPlane( 0, 0, 1, 0 ),
   },
};

int   c_caps, c_sils;

typedef struct
{
   int      frontCapStart;
   int      rearCapStart;
   int      silStart;
   int      end;
} indexRef_t;

// Consolidated all static variables into a struct
// to pass as a state during shadow calculation
typedef struct
{
#define   MAX_CLIP_SIL_EDGES      2048
   int         numClipSilEdges;
   int         clipSilEdges[MAX_CLIP_SIL_EDGES][2];

   // facing will be 0 if forward facing, 1 if backwards facing
   // grabbed with alloca
   byte      *globalFacing;

   // faceCastsShadow will be 1 if the face is in the projection
   // and facing the appropriate direction
   byte      *faceCastsShadow;

   int         *remap;

#define   MAX_SHADOW_INDEXES      0x18000
#define   MAX_SHADOW_VERTS      0x18000
   int         numShadowIndexes;
   glIndex_t   shadowIndexes[MAX_SHADOW_INDEXES];
   int         numShadowVerts;
   idVec4      shadowVerts[MAX_SHADOW_VERTS];
   bool      overflowed;
   bool      callOptimizer;         // call the preprocessor optimizer after clipping occluders
   indexRef_t   indexRef[6];
   int         indexFrustumNumber;      // which shadow generating side of a light the indexRef is for
} stencilRef_t;

/*
===============
PointsOrdered

To make sure the triangulation of the sil edges is consistent,
we need to be able to order two points.  We don't care about how
they compare with any other points, just that when the same two
points are passed in (in either order), they will always specify
the same one as leading.

Currently we need to have separate faces in different surfaces
order the same way, so we must look at the actual coordinates.
If surfaces are ever guaranteed to not have to edge match with
other surfaces, we could just compare indexes.
===============
*/
static bool PointsOrdered( const idVec3 &a, const idVec3 &b )
{
   float   i, j;
   
   // vectors that wind up getting an equal hash value will
   // potentially cause a misorder, which can show as a couple
   // crack pixels in a shadow
   
   // scale by some odd numbers so -8, 8, 8 will not be equal
   // to 8, -8, 8
   
   // in the very rare case that these might be equal, all that would
   // happen is an opportunity for a tiny rasterization shadow crack
   i = a[0] + a[1] * 127 + a[2] * 1023;
   j = b[0] + b[1] * 127 + b[2] * 1023;
   
   return ( bool )( i < j );
}

/*
====================
R_LightProjectionMatrix

====================
*/
void R_LightProjectionMatrix( const idVec3 &origin, const idPlane &rearPlane, idVec4 mat[4] )
{
   idVec4      lv;
   float      lg;
   
   // calculate the homogeneous light vector
   lv.x = origin.x;
   lv.y = origin.y;
   lv.z = origin.z;
   lv.w = 1;
   
   lg = rearPlane.ToVec4() * lv;
   
   // outer product
   mat[0][0] = lg - rearPlane[0] * lv[0];
   mat[0][1] = -rearPlane[1] * lv[0];
   mat[0][2] = -rearPlane[2] * lv[0];
   mat[0][3] = -rearPlane[3] * lv[0];
   
   mat[1][0] = -rearPlane[0] * lv[1];
   mat[1][1] = lg - rearPlane[1] * lv[1];
   mat[1][2] = -rearPlane[2] * lv[1];
   mat[1][3] = -rearPlane[3] * lv[1];
   
   mat[2][0] = -rearPlane[0] * lv[2];
   mat[2][1] = -rearPlane[1] * lv[2];
   mat[2][2] = lg - rearPlane[2] * lv[2];
   mat[2][3] = -rearPlane[3] * lv[2];
   
   mat[3][0] = -rearPlane[0] * lv[3];
   mat[3][1] = -rearPlane[1] * lv[3];
   mat[3][2] = -rearPlane[2] * lv[3];
   mat[3][3] = lg - rearPlane[3] * lv[3];
}

/*
===================
R_ProjectPointsToFarPlane

make a projected copy of the even verts into the odd spots
that is on the far light clip plane
===================
*/
static void R_ProjectPointsToFarPlane( stencilRef_t *st, const idRenderEntityLocal *ent, const idRenderLightLocal *light, const idPlane &lightPlaneLocal, int firstShadowVert, int numShadowVerts )
{
   idVec3      lv;
   idVec4      mat[4];
   int         i;
   idVec4      *in;
   
   R_GlobalPointToLocal( ent->modelMatrix, light->globalLightOrigin, lv );
   R_LightProjectionMatrix( lv, lightPlaneLocal, mat );
   
   // make a projected copy of the even verts into the odd spots
   in = &st->shadowVerts[firstShadowVert];
   
   for( i = firstShadowVert; i < numShadowVerts; i += 2, in += 2 )
   {
      float   w, oow;
      
      in[0].w = 1;
      
      w = in->ToVec3() * mat[3].ToVec3() + mat[3][3];
      
      if( w == 0 )
      {
         in[1] = in[0];
         continue;
      }      
      oow = 1.0 / w;

      in[1].x = ( in->ToVec3() * mat[0].ToVec3() + mat[0][3] ) * oow;
      in[1].y = ( in->ToVec3() * mat[1].ToVec3() + mat[1][3] ) * oow;
      in[1].z = ( in->ToVec3() * mat[2].ToVec3() + mat[2][3] ) * oow;
      in[1].w = 1;
   }
}

#define   MAX_CLIPPED_POINTS   20
typedef struct
{
   int      numVerts;
   idVec3   verts[MAX_CLIPPED_POINTS];
   int      edgeFlags[MAX_CLIPPED_POINTS];
} clipTri_t;

/*
=============
R_ChopWinding

Clips a triangle from one buffer to another, setting edge flags
The returned buffer may be the same as inNum if no clipping is done
If entirely clipped away, clipTris[returned].numVerts == 0

I have some worries about edge flag cases when polygons are clipped
multiple times near the epsilon.
=============
*/
static int R_ChopWinding( clipTri_t clipTris[2], int inNum, const idPlane &plane )
{
   clipTri_t   *in, *out;
   float   dists[MAX_CLIPPED_POINTS];
   int      sides[MAX_CLIPPED_POINTS];
   int      counts[3];
   float   dot;
   int      i, j;
   idVec3   *p1, *p2;
   idVec3   mid;
   
   in = &clipTris[inNum];
   out = &clipTris[inNum ^ 1];
   counts[0] = counts[1] = counts[2] = 0;
   
   // determine sides for each point
   for( i = 0; i < in->numVerts; i++ )
   {
      dot = plane.Distance( in->verts[i] );
      dists[i] = dot;
      
      if( dot < -LIGHT_CLIP_EPSILON )
      {
         sides[i] = SIDE_BACK;
      }
      else if( dot > LIGHT_CLIP_EPSILON )
      {
         sides[i] = SIDE_FRONT;
      }
      else
      {
         sides[i] = SIDE_ON;
      }      
      counts[sides[i]]++;
   }
   
   // if none in front, it is completely clipped away
   if( !counts[SIDE_FRONT] )
   {
      in->numVerts = 0;
      return inNum;
   }
   
   if( !counts[SIDE_BACK] )
   {
      return inNum;      // inout stays the same
   }
   
   // avoid wrapping checks by duplicating first value to end
   sides[i] = sides[0];
   dists[i] = dists[0];
   in->verts[in->numVerts] = in->verts[0];
   in->edgeFlags[in->numVerts] = in->edgeFlags[0];
   
   out->numVerts = 0;
   
   for( i = 0; i < in->numVerts; i++ )
   {
      p1 = &in->verts[i];
      
      if( sides[i] != SIDE_BACK )
      {
         out->verts[out->numVerts] = *p1;
         
         if( sides[i] == SIDE_ON && sides[i + 1] == SIDE_BACK )
         {
            out->edgeFlags[out->numVerts] = 1;
         }
         else
         {
            out->edgeFlags[out->numVerts] = in->edgeFlags[i];
         }         
         out->numVerts++;
      }
      
      if( ( sides[i] == SIDE_FRONT && sides[i + 1] == SIDE_BACK )   || ( sides[i] == SIDE_BACK && sides[i + 1] == SIDE_FRONT ) )
      {
         // generate a split point
         p2 = &in->verts[i + 1];
         
         dot = dists[i] / ( dists[i] - dists[i + 1] );
         
         for( j = 0; j < 3; j++ )
         {
            mid[j] = ( *p1 ) [j] + dot * ( ( *p2 ) [j] - ( *p1 ) [j] );
         }         
         out->verts[out->numVerts] = mid;
         
         // set the edge flag
         if( sides[i + 1] != SIDE_FRONT )
         {
            out->edgeFlags[out->numVerts] = 1;
         }
         else
         {
            out->edgeFlags[out->numVerts] = in->edgeFlags[i];
         }         
         out->numVerts++;
      }
   }   
   return inNum ^ 1;
}

/*
===================
R_ClipTriangleToLight

Returns false if nothing is left after clipping
===================
*/
static bool   R_ClipTriangleToLight( stencilRef_t *st, const idVec3 &a, const idVec3 &b, const idVec3 &c, int planeBits, const idPlane frustum[6] )
{
   int         i;
   int         base;
   clipTri_t   pingPong[2], *ct;
   int         p;
   
   pingPong[0].numVerts = 3;
   pingPong[0].edgeFlags[0] = 0;
   pingPong[0].edgeFlags[1] = 0;
   pingPong[0].edgeFlags[2] = 0;
   pingPong[0].verts[0] = a;
   pingPong[0].verts[1] = b;
   pingPong[0].verts[2] = c;
   
   p = 0;
   
   for( i = 0; i < 6; i++ )
   {
      if( planeBits & ( 1 << i ) )
      {
         p = R_ChopWinding( pingPong, p, frustum[i] );
         
         if( pingPong[p].numVerts < 1 )
         {
            return false;
         }
      }
   }   
   ct = &pingPong[p];
   
   // copy the clipped points out to shadowVerts
   if ( st->numShadowVerts + ct->numVerts * 2 > MAX_SHADOW_VERTS )
   {
      st->overflowed = true;
      return false;
   }   
   base = st->numShadowVerts;
   
   for( i = 0; i < ct->numVerts; i++ )
   {
      st->shadowVerts[base + i * 2].ToVec3() = ct->verts[i];
   }   
   st->numShadowVerts += ct->numVerts * 2;
   
   if ( st->numShadowIndexes + 3 * (ct->numVerts - 2) > MAX_SHADOW_INDEXES )
   {
      st->overflowed = true;
      return false;
   }
   
   for( i = 2; i < ct->numVerts; i++ )
   {
      st->shadowIndexes[st->numShadowIndexes++] = base + i * 2;
      st->shadowIndexes[st->numShadowIndexes++] = base + (i - 1) * 2;
      st->shadowIndexes[st->numShadowIndexes++] = base;
   }
   
   // any edges that were created by the clipping process will
   // have a silhouette quad created for it, because it is one
   // of the exterior bounds of the shadow volume
   for( i = 0; i < ct->numVerts; i++ )
   {
      if( ct->edgeFlags[i] )
      {
         if ( st->numClipSilEdges == MAX_CLIP_SIL_EDGES )
         {
            break;
         }         
         st->clipSilEdges[st->numClipSilEdges][0] = base + i * 2;
         
         if( i == ct->numVerts - 1 )
         {
            st->clipSilEdges[st->numClipSilEdges][1] = base;
         }
         else
         {
            st->clipSilEdges[st->numClipSilEdges][1] = base + (i + 1) * 2;
         }         
         st->numClipSilEdges++;
      }
   }   
   return true;
}

/*
===================
R_ClipLineToLight

If neither point is clearly behind the clipping
plane, the edge will be passed unmodified.  A sil edge that
is on a border plane must be drawn.

If one point is clearly clipped by the plane and the
other point is on the plane, it will be completely removed.
===================
*/
static bool R_ClipLineToLight( const idVec3 &a, const idVec3 &b, const idPlane frustum[6], idVec3 &p1, idVec3 &p2 )
{
   float   *clip;
   int      j;
   float   d1, d2;
   float   f;
   
   p1 = a;
   p2 = b;
   
   // clip it
   for( j = 0; j < 6; j++ )
   {
      d1 = frustum[j].Distance( p1 );
      d2 = frustum[j].Distance( p2 );
      
      // if both on or in front, not clipped to this plane
      if( d1 > -LIGHT_CLIP_EPSILON && d2 > -LIGHT_CLIP_EPSILON )
      {
         continue;
      }
      
      // if one is behind and the other isn't clearly in front, the edge is clipped off
      if( d1 <= -LIGHT_CLIP_EPSILON && d2 < LIGHT_CLIP_EPSILON )
      {
         return false;
      }
      
      if( d2 <= -LIGHT_CLIP_EPSILON && d1 < LIGHT_CLIP_EPSILON )
      {
         return false;
      }
      
      // clip it, keeping the negative side
      if( d1 < 0 )
      {
         clip = p1.ToFloatPtr();
      }
      else
      {
         clip = p2.ToFloatPtr();
      }      
      f = d1 / ( d1 - d2 );

      clip[0] = p1[0] + f * ( p2[0] - p1[0] );
      clip[1] = p1[1] + f * ( p2[1] - p1[1] );
      clip[2] = p1[2] + f * ( p2[2] - p1[2] );
   }
   
   return true;   // retain a fragment
}


/*
==================
R_AddClipSilEdges

Add sil edges for each triangle clipped to the side of
the frustum.

Only done for simple projected lights, not point lights.
==================
*/
static void R_AddClipSilEdges( stencilRef_t *st )
{
   int      v1, v2;
   int      v1_back, v2_back;
   int      i;
   
   // don't allow it to overflow
   if ( st->numShadowIndexes + st->numClipSilEdges * 6 > MAX_SHADOW_INDEXES )
   {
      st->overflowed = true;
      return;
   }
   
   for ( i = 0; i < st->numClipSilEdges; i++ )
   {
      v1 = st->clipSilEdges[i][0];
      v2 = st->clipSilEdges[i][1];
      v1_back = v1 + 1;
      v2_back = v2 + 1;
      
      if ( PointsOrdered( st->shadowVerts[v1].ToVec3(), st->shadowVerts[v2].ToVec3() ) )
      {
         st->shadowIndexes[st->numShadowIndexes++] = v1;
         st->shadowIndexes[st->numShadowIndexes++] = v2;
         st->shadowIndexes[st->numShadowIndexes++] = v1_back;
         st->shadowIndexes[st->numShadowIndexes++] = v2;
         st->shadowIndexes[st->numShadowIndexes++] = v2_back;
         st->shadowIndexes[st->numShadowIndexes++] = v1_back;
      }
      else
      {
         st->shadowIndexes[st->numShadowIndexes++] = v1;
         st->shadowIndexes[st->numShadowIndexes++] = v2;
         st->shadowIndexes[st->numShadowIndexes++] = v2_back;
         st->shadowIndexes[st->numShadowIndexes++] = v1;
         st->shadowIndexes[st->numShadowIndexes++] = v2_back;
         st->shadowIndexes[st->numShadowIndexes++] = v1_back;
      }
   }
}

/*
=================
R_AddSilEdges

Add quads from the front points to the projected points
for each silhouette edge in the light
=================
*/
static void R_AddSilEdges( stencilRef_t *st, const srfTriangles_t *tri, unsigned short *pointCull, const idPlane frustum[6] )
{
   int      v1, v2;
   int      i;
   silEdge_t   *sil;
   int      numPlanes;
   
   numPlanes = tri->numIndexes / 3;
   
   // add sil edges for any true silhouette boundaries on the surface
   for( i = 0; i < tri->numSilEdges; i++ )
   {
      sil = tri->silEdges + i;
      
      if( sil->p1 < 0 || sil->p1 > numPlanes || sil->p2 < 0 || sil->p2 > numPlanes )
      {
         common->Error( "Bad sil planes" );
      }
      
      // an edge will be a silhouette edge if the face on one side
      // casts a shadow, but the face on the other side doesn't.
      // "casts a shadow" means that it has some surface in the projection,
      // not just that it has the correct facing direction
      // This will cause edges that are exactly on the frustum plane
      // to be considered sil edges if the face inside casts a shadow.
      if ( !( st->faceCastsShadow[sil->p1] ^ st->faceCastsShadow[sil->p2] ) )
      {
         continue;
      }
      
      // if the edge is completely off the negative side of
      // a frustum plane, don't add it at all.  This can still
      // happen even if the face is visible and casting a shadow
      // if it is partially clipped
      if( EDGE_CULLED( sil->v1, sil->v2 ) )
      {
         continue;
      }
      
      // see if the edge needs to be clipped
      if( EDGE_CLIPPED( sil->v1, sil->v2 ) )
      {
         if ( st->numShadowVerts + 4 > MAX_SHADOW_VERTS )
         {
            st->overflowed = true;
            return;
         }         
         v1 = st->numShadowVerts;
         v2 = v1 + 2;
         
         if (!R_ClipLineToLight( tri->verts[sil->v1].xyz, tri->verts[sil->v2].xyz, frustum, st->shadowVerts[v1].ToVec3(), st->shadowVerts[v2].ToVec3() ) )
         {
            continue;   // clipped away
         }         
         st->numShadowVerts += 4;
      }
      else
      {
         // use the entire edge
         v1 = st->remap[sil->v1];
         v2 = st->remap[sil->v2];
         
         if( v1 < 0 || v2 < 0 )
         {
            common->Error( "R_AddSilEdges: bad remap[]" );
         }
      }
      
      // don't overflow
      if ( st->numShadowIndexes + 6 > MAX_SHADOW_INDEXES )
      {
         st->overflowed = true;
         return;
      }
      
      // we need to choose the correct way of triangulating the silhouette quad
      // consistently between any two points, no matter which order they are specified.
      // If this wasn't done, slight rasterization cracks would show in the shadow
      // volume when two sil edges were exactly coincident
      if ( st->faceCastsShadow[sil->p2] )
      {
         if ( PointsOrdered( st->shadowVerts[v1].ToVec3(), st->shadowVerts[v2].ToVec3() ) )
         {
            st->shadowIndexes[st->numShadowIndexes++] = v1;
            st->shadowIndexes[st->numShadowIndexes++] = v1 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v2;
            st->shadowIndexes[st->numShadowIndexes++] = v2;
            st->shadowIndexes[st->numShadowIndexes++] = v1 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v2 + 1;
         }
         else
         {
            st->shadowIndexes[st->numShadowIndexes++] = v1;
            st->shadowIndexes[st->numShadowIndexes++] = v2 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v2;
            st->shadowIndexes[st->numShadowIndexes++] = v1;
            st->shadowIndexes[st->numShadowIndexes++] = v1 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v2 + 1;
         }
      }
      else
      {
         if ( PointsOrdered( st->shadowVerts[v1].ToVec3(), st->shadowVerts[v2].ToVec3() ) )
         {
            st->shadowIndexes[st->numShadowIndexes++] = v1;
            st->shadowIndexes[st->numShadowIndexes++] = v2;
            st->shadowIndexes[st->numShadowIndexes++] = v1 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v2;
            st->shadowIndexes[st->numShadowIndexes++] = v2 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v1 + 1;
         }
         else
         {
            st->shadowIndexes[st->numShadowIndexes++] = v1;
            st->shadowIndexes[st->numShadowIndexes++] = v2;
            st->shadowIndexes[st->numShadowIndexes++] = v2 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v1;
            st->shadowIndexes[st->numShadowIndexes++] = v2 + 1;
            st->shadowIndexes[st->numShadowIndexes++] = v1 + 1;
         }
      }
   }
}

/*
================
R_CalcPointCull

Also inits the remap[] array to all -1
================
*/
static void R_CalcPointCull( stencilRef_t *st, const srfTriangles_t *tri, const idPlane frustum[6], unsigned short *pointCull )
{
   int      i;
   int      frontBits;
   float   *planeSide;
   byte   *side1, *side2;
   
   SIMDProcessor->Memset( st->remap, -1, tri->numVerts * sizeof( st->remap[0] ) );
   
   for( frontBits = 0, i = 0; i < 6; i++ )
   {
      // get front bits for the whole surface
      if( tri->bounds.PlaneDistance( frustum[i] ) >= LIGHT_CLIP_EPSILON )
      {
         frontBits |= 1 << ( i + 6 );
      }
   }
   
   // initialize point cull
   for( i = 0; i < tri->numVerts; i++ )
   {
      pointCull[i] = frontBits;
   }
   
   // if the surface is completely inside the light frustum, every point
   // keeps frontBits and no per-vertex plane tests are needed
   if( frontBits == ( ( ( 1 << 6 ) - 1 ) ) << 6 )
   {
      return;
   }   
   planeSide = ( float * ) _alloca16( tri->numVerts * sizeof( float ) );
   side1 = ( byte * ) _alloca16( tri->numVerts * sizeof( byte ) );
   side2 = ( byte * ) _alloca16( tri->numVerts * sizeof( byte ) );
   SIMDProcessor->Memset( side1, 0, tri->numVerts * sizeof( byte ) );
   SIMDProcessor->Memset( side2, 0, tri->numVerts * sizeof( byte ) );
   
   for( i = 0; i < 6; i++ )
   {   
      if( frontBits & ( 1 << ( i + 6 ) ) )
      {
         continue;
      }      
      SIMDProcessor->Dot( planeSide, frustum[i], tri->verts, tri->numVerts );
      SIMDProcessor->CmpLT( side1, i, planeSide, LIGHT_CLIP_EPSILON, tri->numVerts );
      SIMDProcessor->CmpGT( side2, i, planeSide, -LIGHT_CLIP_EPSILON, tri->numVerts );
   }
   
   for( i = 0; i < tri->numVerts; i++ )
   {
      pointCull[i] |= side1[i] | ( side2[i] << 6 );
   }
}

/*
=================
R_CreateShadowVolumeInFrustum

Adds new verts and indexes to the shadow volume.

If the frustum completely defines the projected light,
makeClippedPlanes should be true, which will cause sil quads to
be added along all clipped edges.

If the frustum is just part of a point light, clipped planes don't
need to be added.
=================
*/
static void R_CreateShadowVolumeInFrustum( stencilRef_t *st,
      const idRenderEntityLocal *ent,
      const srfTriangles_t *tri,
      const idRenderLightLocal *light,
      const idVec3 lightOrigin,
      const idPlane frustum[6],
      const idPlane &farPlane,
      bool makeClippedPlanes )
{
   int               i;
   int               numTris;
   unsigned short      *pointCull;
   int               numCapIndexes;
   int               firstShadowIndex;
   int               firstShadowVert;
   int               cullBits;
   
   pointCull = ( unsigned short * ) _alloca16( tri->numVerts * sizeof( pointCull[0] ) );
   
   // test the vertexes for inside the light frustum, which will allow
   // us to completely cull away some triangles from consideration.
   R_CalcPointCull( st, tri, frustum, pointCull );
   
   // this may not be the first frustum added to the volume
   firstShadowIndex = st->numShadowIndexes;
   firstShadowVert = st->numShadowVerts;
   
   // decide which triangles front shadow volumes, clipping as needed
   st->numClipSilEdges = 0;
   numTris = tri->numIndexes / 3;
   
   for( i = 0; i < numTris; i++ )
   {
      int      i1, i2, i3;
      
      st->faceCastsShadow[i] = 0;   // until shown otherwise
      
      // if it isn't facing the right way, don't add it
      // to the shadow volume
      if ( st->globalFacing[i] )
      {
         continue;
      }      
      i1 = tri->silIndexes[i * 3 + 0];
      i2 = tri->silIndexes[i * 3 + 1];
      i3 = tri->silIndexes[i * 3 + 2];
      
      // if all the verts are off one side of the frustum,
      // don't add any of them
      if( TRIANGLE_CULLED( i1, i2, i3 ) )
      {
         continue;
      }
      
      // make sure the verts that are not on the negative sides
      // of the frustum are copied over.
      // we need to get the original verts even from clipped triangles
      // so the edges reference correctly, because an edge may be unclipped
      // even when a triangle is clipped.
      if ( st->numShadowVerts + 6 > MAX_SHADOW_VERTS )
      {
         st->overflowed = true;
         return;
      }
      
      if ( !POINT_CULLED(i1) && st->remap[i1] == -1 )
      {
         st->remap[i1] = st->numShadowVerts;
         st->shadowVerts[st->numShadowVerts].ToVec3() = tri->verts[i1].xyz;
         st->numShadowVerts += 2;
      }
      
      if ( !POINT_CULLED(i2) && st->remap[i2] == -1 )
      {
         st->remap[i2] = st->numShadowVerts;
         st->shadowVerts[st->numShadowVerts].ToVec3() = tri->verts[i2].xyz;
         st->numShadowVerts += 2;
      }
      
      if ( !POINT_CULLED(i3) && st->remap[i3] == -1 )
      {
         st->remap[i3] = st->numShadowVerts;
         st->shadowVerts[st->numShadowVerts].ToVec3() = tri->verts[i3].xyz;
         st->numShadowVerts += 2;
      }
      
      // clip the triangle if any points are on the negative sides
      if( TRIANGLE_CLIPPED( i1, i2, i3 ) )
      {
         cullBits = ( ( pointCull[i1] ^ 0xfc0 ) | ( pointCull[i2] ^ 0xfc0 ) | ( pointCull[i3] ^ 0xfc0 ) ) >> 6;
         
         // this will also define clip edges that will become
         // silhouette planes
         if( R_ClipTriangleToLight( st, tri->verts[i1].xyz, tri->verts[i2].xyz, tri->verts[i3].xyz, cullBits, frustum ) )
         {
            st->faceCastsShadow[i] = 1;
         }
      }
      else
      {
         // instead of overflowing or drawing a streamer shadow, don't draw a shadow at all
         if ( st->numShadowIndexes + 3 > MAX_SHADOW_INDEXES )
         {
            st->overflowed = true;
            return;
         }
         
         if ( st->remap[i1] == -1 || st->remap[i2] == -1 || st->remap[i3] == -1 )
         {
            common->Error( "R_CreateShadowVolumeInFrustum: bad remap[]" );
         }         
         st->shadowIndexes[st->numShadowIndexes++] = st->remap[i3];
         st->shadowIndexes[st->numShadowIndexes++] = st->remap[i2];
         st->shadowIndexes[st->numShadowIndexes++] = st->remap[i1];
         st->faceCastsShadow[i] = 1;
      }
   }
   
   // add indexes for the back caps, which will just be reversals of the
   // front caps using the back vertexes
   numCapIndexes = st->numShadowIndexes - firstShadowIndex;
   
   // if no faces have been defined for the shadow volume,
   // there won't be anything at all
   if( numCapIndexes == 0 )
   {
      return;
   }
   
   //--------------- off-line processing ------------------
   
   // if we are running from dmap, perform the (very) expensive shadow optimizations
   // to remove internal sil edges and optimize the caps
   if ( st->callOptimizer )
   {
      optimizedShadow_t opt;
      
      // project all of the vertexes to the shadow plane, generating
      // an equal number of back vertexes
      opt = SuperOptimizeOccluders( st->shadowVerts, st->shadowIndexes + firstShadowIndex, numCapIndexes, farPlane, lightOrigin );
      
      // pull off the non-optimized data
      st->numShadowIndexes = firstShadowIndex;
      st->numShadowVerts = firstShadowVert;
      
      // add the optimized data
      if ( st->numShadowIndexes + opt.totalIndexes > MAX_SHADOW_INDEXES || st->numShadowVerts + opt.numVerts > MAX_SHADOW_VERTS )
      {
         st->overflowed = true;
         common->Printf( "WARNING: overflowed MAX_SHADOW tables, shadow discarded\n" );
         Mem_Free( opt.verts );
         Mem_Free( opt.indexes );
         return;
      }
      
      for( i = 0; i < opt.numVerts; i++ )
      {
         st->shadowVerts[st->numShadowVerts + i][0] = opt.verts[i][0];
         st->shadowVerts[st->numShadowVerts + i][1] = opt.verts[i][1];
         st->shadowVerts[st->numShadowVerts + i][2] = opt.verts[i][2];
         st->shadowVerts[st->numShadowVerts + i][3] = 1;
      }
      
      for( i = 0; i < opt.totalIndexes; i++ )
      {
         int   index = opt.indexes[i];
         
         if( index < 0 || index >= opt.numVerts )
         {
            common->Error( "optimized shadow index out of range" );
         }         
         st->shadowIndexes[st->numShadowIndexes + i] = index + st->numShadowVerts;
      }      
      st->numShadowVerts += opt.numVerts;
      st->numShadowIndexes += opt.totalIndexes;
      
      // note the index distribution so we can sort all the caps after all the sils
      st->indexRef[st->indexFrustumNumber].frontCapStart = firstShadowIndex;
      st->indexRef[st->indexFrustumNumber].rearCapStart = firstShadowIndex + opt.numFrontCapIndexes;
      st->indexRef[st->indexFrustumNumber].silStart = firstShadowIndex + opt.numFrontCapIndexes + opt.numRearCapIndexes;
      st->indexRef[st->indexFrustumNumber].end = st->numShadowIndexes;
      st->indexFrustumNumber++;
      
      Mem_Free( opt.verts );
      Mem_Free( opt.indexes );
      return;
   }
   
   //--------------- real-time processing ------------------
   
   // the dangling edge "face" is never considered to cast a shadow,
   // so any face with dangling edges that casts a shadow will have
   // its dangling sil edge trigger a sil plane
   st->faceCastsShadow[numTris] = 0;
   
   // instead of overflowing or drawing a streamer shadow, don't draw a shadow at all
   // if we ran out of space
   if ( st->numShadowIndexes + numCapIndexes > MAX_SHADOW_INDEXES )
   {
      st->overflowed = true;
      return;
   }
   
   for( i = 0; i < numCapIndexes; i += 3 )
   {
      st->shadowIndexes[st->numShadowIndexes + i + 0] = st->shadowIndexes[firstShadowIndex + i + 2] + 1;
      st->shadowIndexes[st->numShadowIndexes + i + 1] = st->shadowIndexes[firstShadowIndex + i + 1] + 1;
      st->shadowIndexes[st->numShadowIndexes + i + 2] = st->shadowIndexes[firstShadowIndex + i + 0] + 1;
   }   
   st->numShadowIndexes += numCapIndexes;
   
   c_caps += numCapIndexes * 2;
   
   int preSilIndexes = st->numShadowIndexes;
   
   // if any triangles were clipped, we will have a list of edges
   // on the frustum which must now become sil edges
   if( makeClippedPlanes )
   {
      R_AddClipSilEdges( st );
   }
   
   // any edges that are a transition between a shadowing and
   // non-shadowing triangle will cast a silhouette edge
   R_AddSilEdges( st, tri, pointCull, frustum );
   
   c_sils += st->numShadowIndexes - preSilIndexes;
   
   // project all of the vertexes to the shadow plane, generating
   // an equal number of back vertexes
   R_ProjectPointsToFarPlane( st, ent, light, farPlane, firstShadowVert, st->numShadowVerts );
   
   // note the index distribution so we can sort all the caps after all the sils
   st->indexRef[st->indexFrustumNumber].frontCapStart = firstShadowIndex;
   st->indexRef[st->indexFrustumNumber].rearCapStart = firstShadowIndex + numCapIndexes;
   st->indexRef[st->indexFrustumNumber].silStart = preSilIndexes;
   st->indexRef[st->indexFrustumNumber].end = st->numShadowIndexes;
   st->indexFrustumNumber++;
}

/*
===================
R_MakeShadowFrustums

Called at definition derivation time
===================
*/
void R_MakeShadowFrustums( idRenderLightLocal *light )
{
   int      i, j;
   
   if( light->parms.pointLight )
   {
      // exact projection, taking into account asymmetric frustums when
      // globalLightOrigin isn't centered
      static int   faceCorners[6][4] =
      {
         { 7, 5, 1, 3 },      // positive X side
         { 4, 6, 2, 0 },      // negative X side
         { 6, 7, 3, 2 },      // positive Y side
         { 5, 4, 0, 1 },      // negative Y side
         { 6, 4, 5, 7 },      // positive Z side
         { 3, 1, 0, 2 }      // negative Z side
      };
      static int   faceEdgeAdjacent[6][4] =
      {
         { 4, 4, 2, 2 },      // positive X side
         { 7, 7, 1, 1 },      // negative X side
         { 5, 5, 0, 0 },      // positive Y side
         { 6, 6, 3, 3 },      // negative Y side
         { 0, 0, 3, 3 },      // positive Z side
         { 5, 5, 6, 6 }      // negative Z side
      };
      
      bool   centerOutside = false;
      
      // if the light center of projection is outside the light bounds,
      // we will need to build the planes a little differently
      if( fabs( light->parms.lightCenter[0] ) > light->parms.lightRadius[0] ||
         fabs( light->parms.lightCenter[1] ) > light->parms.lightRadius[1] ||
         fabs( light->parms.lightCenter[2] ) > light->parms.lightRadius[2] )
      {
         centerOutside = true;
      }
      
      // make the corners
      idVec3   corners[8];
      
      for( i = 0; i < 8; i++ )
      {
         idVec3   temp;
      
         for( j = 0; j < 3; j++ )
         {
            if( i & ( 1 << j ) )
            {
               temp[j] = light->parms.lightRadius[j];
            }
            else
            {
               temp[j] = -light->parms.lightRadius[j];
            }
         }
      
         // transform to global space
         corners[i] = light->parms.origin + light->parms.axis * temp;
      }      
      light->numShadowFrustums = 0;
      
      for( int side = 0; side < 6; side++ )
      {
         shadowFrustum_t   *frust = &light->shadowFrustums[light->numShadowFrustums];
         idVec3 &p1 = corners[faceCorners[side][0]];
         idVec3 &p2 = corners[faceCorners[side][1]];
         idVec3 &p3 = corners[faceCorners[side][2]];
         idPlane backPlane;
      
         // plane will have positive side inward
         backPlane.FromPoints( p1, p2, p3 );
      
         // if center of projection is on the wrong side, skip
         float d = backPlane.Distance( light->globalLightOrigin );
      
         if( d < 0 )
         {
            continue;
         }      
         frust->numPlanes = 6;
         frust->planes[5] = backPlane;
         frust->planes[4] = backPlane;   // we don't really need the extra plane
      
         // make planes with positive side facing inwards in light local coordinates
         for( int edge = 0; edge < 4; edge++ )
         {
            idVec3 &p1 = corners[faceCorners[side][edge]];
            idVec3 &p2 = corners[faceCorners[side][( edge + 1 ) & 3]];
      
            // create a plane that goes through the center of projection
            frust->planes[edge].FromPoints( p2, p1, light->globalLightOrigin );
      
            // see if we should use an adjacent plane instead
            if( centerOutside )
            {
               idVec3 &p3 = corners[faceEdgeAdjacent[side][edge]];
               idPlane sidePlane;
      
               sidePlane.FromPoints( p2, p1, p3 );
               d = sidePlane.Distance( light->globalLightOrigin );
      
               if( d < 0 )
               {
                  // use this plane instead of the edged plane
                  frust->planes[edge] = sidePlane;
               }
      
               // we can't guarantee a neighbor, so add sil planes at edge
               light->shadowFrustums[light->numShadowFrustums].makeClippedPlanes = true;
            }
         }      
         light->numShadowFrustums++;
      }
      return;
   }
   
   // projected light   
   light->numShadowFrustums = 1;
   shadowFrustum_t   *frust = &light->shadowFrustums[0];
   
   // flip and transform the frustum planes so the positive side faces
   // inward in local coordinates
   
   // it is important to clip against even the near clip plane, because
   // many projected lights that are faking area lights will have their
   // origin behind solid surfaces.
   for( i = 0; i < 6; i++ )
   {
      idPlane &plane = frust->planes[i];
      
      plane.SetNormal( -light->frustum[i].Normal() );
      plane.SetDist( -light->frustum[i].Dist() );
   }   
   frust->numPlanes = 6;
   
   frust->makeClippedPlanes = true;
   // projected lights don't have shared frustums, so any clipped edges
   // right on the planes must have a sil plane created for them
}

/*
=================
R_CreateShadowVolume

The returned surface will have a valid bounds and radius for culling.

Triangles are clipped to the light frustum before projecting.

A single triangle can clip to as many as 7 vertexes, so
the worst case expansion is 2*(numindexes/3)*7 verts when counting both
the front and back caps, although it will usually only be a modest
increase in vertexes for closed models

The worst case index count is much larger, when the 7 vertex clipped triangle
needs 15 indexes for the front, 15 for the back, and 42 (a quad on seven sides)
for the sides, for a total of 72 indexes from the original 3.  Ouch.

NULL may be returned if the surface doesn't create a shadow volume at all,
as with a single face that the light is behind.

If an edge is within an epsilon of the border of the volume, it must be treated
as if it is clipped for triangles, generating a new sil edge, and act
as if it was culled for edges, because the sil edge will have been
generated by the triangle regardless of whether it actually was a sil edge.
=================
*/
srfTriangles_t *R_CreateShadowVolume( const idRenderEntityLocal *ent, const srfTriangles_t *tri, const idRenderLightLocal *light, shadowGen_t optimize, srfCullInfo_t &cullInfo )
{
   int            i, j;
   idVec3         lightOrigin;
   srfTriangles_t   *newTri;
   int            capPlaneBits;
   
   if( !r_shadows.GetBool() )
   {
      return NULL;
   }
   
   if( tri->numSilEdges == 0 || tri->numIndexes == 0 || tri->numVerts == 0 )
   {
      return NULL;
   }
   
   if( tri->numIndexes < 0 )
   {
      common->Error( "R_CreateShadowVolume: tri->numIndexes = %i", tri->numIndexes );
   }
   
   if( tri->numVerts < 0 )
   {
      common->Error( "R_CreateShadowVolume: tri->numVerts = %i", tri->numVerts );
   }   
   tr.pc.c_createShadowVolumes++;
   
   // use the fast infinite projection in dynamic situations, which
   // trades somewhat more overdraw and no cap optimizations for
   // a very simple generation process
   if( optimize == SG_DYNAMIC && r_useTurboShadow.GetBool() )
   {
      return R_CreateVertexProgramTurboShadowVolume( ent, tri, light, cullInfo );
   }   
   R_CalcInteractionFacing( ent, tri, light, cullInfo );
   
   int numFaces = tri->numIndexes / 3;
   int allFront = 1;
   
   for( i = 0; i < numFaces && allFront; i++ )
   {
      allFront &= cullInfo.facing[i];
   }
   
   if( allFront )
   {
      // if no faces are the right direction, don't make a shadow at all
      return NULL;
   }
   stencilRef_t *st = ( stencilRef_t * )_alloca( sizeof( stencilRef_t ) );
   
   // clear the shadow volume
   st->numShadowIndexes = 0;
   st->numShadowVerts = 0;
   st->overflowed = false;
   st->indexFrustumNumber = 0;
   capPlaneBits = 0;
   st->callOptimizer = (optimize == SG_OFFLINE);
   
   // the facing information will be the same for all six projections
   // from a point light, as well as for any directed lights
   st->globalFacing = cullInfo.facing;
   st->faceCastsShadow = (byte *)_alloca16(tri->numIndexes / 3 + 1);   // + 1 for fake dangling edge face
   st->remap = ( int * )_alloca16( tri->numVerts * sizeof( st->remap[0] ) );
   
   R_GlobalPointToLocal( ent->modelMatrix, light->globalLightOrigin, lightOrigin );
   
   // run through all the shadow frustums, which is one for a projected light,
   // and usually six for a point light, but point lights with centers outside
   // the box may have fewer
   for( int frustumNum = 0; frustumNum < light->numShadowFrustums; frustumNum++ )
   {
      const shadowFrustum_t   *frust = &light->shadowFrustums[frustumNum];

      ALIGN16( idPlane frustum[6] );
      
      // transform the planes into entity space
      // we could share and reverse some of the planes between frustums for a minor
      // speed increase
      
      // the cull test is redundant for a single shadow frustum projected light, because
      // the surface has already been checked against the main light frustums      
      for( j = 0; j < frust->numPlanes; j++ )
      {
         R_GlobalPlaneToLocal( ent->modelMatrix, frust->planes[j], frustum[j] );
         
         // try to cull the entire surface against this frustum
         float d = tri->bounds.PlaneDistance( frustum[j] );
         
         if( d < -LIGHT_CLIP_EPSILON )
         {
            break;
         }
      }
      
      if( j != frust->numPlanes )
      {
         continue;
      }
      
      // we need to check all the triangles
      int   oldFrustumNumber = st->indexFrustumNumber;
      
      R_CreateShadowVolumeInFrustum( st, ent, tri, light, lightOrigin, frustum, frustum[5], frust->makeClippedPlanes );
      
      // if we couldn't make a complete shadow volume, it is better to
      // not draw one at all, avoiding streamer problems
      if ( st->overflowed )
      {
         return NULL;
      }
      
      if ( st->indexFrustumNumber != oldFrustumNumber )
      {
         // note that we have caps projected against this frustum,
         // which may allow us to skip drawing the caps if all projected
         // planes face away from the viewer and the viewer is outside the light volume
         capPlaneBits |= 1 << frustumNum;
      }
   }
   
   // if no faces have been defined for the shadow volume,
   // there won't be anything at all
   if ( st->numShadowIndexes == 0 )
   {
      return NULL;
   }
   
   // this should have been prevented by the overflowed flag, so if it ever happens,
   // it is a code error
   if ( st->numShadowVerts > MAX_SHADOW_VERTS || st->numShadowIndexes > MAX_SHADOW_INDEXES )
   {
      common->FatalError( "Shadow volume exceeded allocation" );
   }
   
   // allocate a new surface for the shadow volume
   newTri = R_AllocStaticTriSurf();
   
   // we might consider setting this, but it would only help for
   // large lights that are partially off screen
   newTri->bounds.Clear();
   
   // copy off the verts and indexes
   newTri->numVerts = st->numShadowVerts;
   newTri->numIndexes = st->numShadowIndexes;
   
   // the shadow verts will go into a main memory buffer as well as a vertex
   // cache buffer, so they can be copied back if they are purged
   R_AllocStaticTriSurfShadowVerts( newTri, newTri->numVerts );
   memcpy( newTri->shadowVertexes, st->shadowVerts, newTri->numVerts * sizeof( newTri->shadowVertexes[0] ) );
   
   R_AllocStaticTriSurfIndexes( newTri, newTri->numIndexes );
   
   /* sortCapIndexes */
   if( 1 )
   {
      newTri->shadowCapPlaneBits = capPlaneBits;
      
      // copy the sil indexes first
      newTri->numShadowIndexesNoCaps = 0;
      
      for ( i = 0; i < st->indexFrustumNumber; i++ )
      {
         int   c = st->indexRef[i].end - st->indexRef[i].silStart;
         memcpy( newTri->indexes + newTri->numShadowIndexesNoCaps, st->shadowIndexes + st->indexRef[i].silStart, c * sizeof( newTri->indexes[0] ) );
         newTri->numShadowIndexesNoCaps += c;
      }
      
      // copy rear cap indexes next
      newTri->numShadowIndexesNoFrontCaps = newTri->numShadowIndexesNoCaps;
      
      for ( i = 0; i < st->indexFrustumNumber; i++ )
      {
         int   c = st->indexRef[i].silStart - st->indexRef[i].rearCapStart;
         memcpy( newTri->indexes + newTri->numShadowIndexesNoFrontCaps, st->shadowIndexes + st->indexRef[i].rearCapStart, c * sizeof( newTri->indexes[0] ) );
         newTri->numShadowIndexesNoFrontCaps += c;
      }
      
      // copy front cap indexes last
      newTri->numIndexes = newTri->numShadowIndexesNoFrontCaps;
      
      for ( i = 0; i < st->indexFrustumNumber; i++ )
      {
         int   c = st->indexRef[i].rearCapStart - st->indexRef[i].frontCapStart;
         memcpy( newTri->indexes + newTri->numIndexes, st->shadowIndexes + st->indexRef[i].frontCapStart, c * sizeof( newTri->indexes[0] ) );
         newTri->numIndexes += c;
      }
      
   }
   else
   {
      newTri->shadowCapPlaneBits = 63;   // we don't have optimized index lists
      memcpy( newTri->indexes, st->shadowIndexes, newTri->numIndexes * sizeof( newTri->indexes[0] ) );
   }
   
   if( optimize == SG_OFFLINE )
   {
      CleanupOptimizedShadowTris( newTri );
   }   
   return newTri;
}
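
The worst-case numbers quoted in the R_CreateShadowVolume header comment (15 + 15 + 42 = 72 indexes for a triangle clipped to 7 vertexes) can be checked with a bit of arithmetic. This is a stand-alone sketch and not engine code; the helper names are made up for illustration.

```cpp
// Hypothetical helpers, not part of the Doom 3 source.

// A convex polygon with n vertexes triangulates into n - 2 triangles,
// so one cap needs 3 * ( n - 2 ) indexes.
static int CapIndexes( int numVerts )
{
   return 3 * ( numVerts - 2 );
}

// Every polygon edge is extruded into a quad: two triangles, six indexes.
static int SideIndexes( int numVerts )
{
   return 6 * numVerts;
}

// Front cap + rear cap + sides for one clipped triangle.
static int WorstCaseIndexes( int numVerts )
{
   return 2 * CapIndexes( numVerts ) + SideIndexes( numVerts );
}
```

With numVerts = 7 this gives 15 indexes per cap and 42 for the sides, which matches the comment.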


That's it for stencil shadows.
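
For anyone wondering what the sortCapIndexes part at the end of R_CreateShadowVolume is doing: each frustum writes its indexes as front cap, rear cap, sil, and the final copy regroups them as all sils, then all rear caps, then all front caps, so the caps can be skipped by just drawing fewer indexes. Here is a rough stand-alone sketch of that regrouping. IndexRange and SortedIndexes are hypothetical stand-ins for the engine's own bookkeeping, and std::vector replaces the raw buffers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of the per-frustum index bookkeeping.
struct IndexRange
{
   int frontCapStart;
   int rearCapStart;
   int silStart;
   int end;
};

// Hypothetical result type: the regrouped indexes plus the two split points.
struct SortedIndexes
{
   std::vector<int> indexes;
   int numNoCaps;        // count of sil indexes only
   int numNoFrontCaps;   // count of sil + rear cap indexes
};

// append src[first, last) to dst
static void AppendRange( std::vector<int> &dst, const std::vector<int> &src, int first, int last )
{
   dst.insert( dst.end(), src.begin() + first, src.begin() + last );
}

static SortedIndexes SortCapIndexes( const std::vector<int> &src, const std::vector<IndexRange> &refs )
{
   SortedIndexes out;

   // sil indexes of every frustum first
   for( std::size_t i = 0; i < refs.size(); i++ )
   {
      AppendRange( out.indexes, src, refs[i].silStart, refs[i].end );
   }
   out.numNoCaps = static_cast<int>( out.indexes.size() );

   // rear cap indexes next
   for( std::size_t i = 0; i < refs.size(); i++ )
   {
      AppendRange( out.indexes, src, refs[i].rearCapStart, refs[i].silStart );
   }
   out.numNoFrontCaps = static_cast<int>( out.indexes.size() );

   // front cap indexes last
   for( std::size_t i = 0; i < refs.size(); i++ )
   {
      AppendRange( out.indexes, src, refs[i].frontCapStart, refs[i].rearCapStart );
   }
   return out;
}
```

Drawing only the first numNoCaps entries gives the sil quads alone, and drawing the first numNoFrontCaps entries adds the rear caps.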

OK, let's do the deferred interactions.

In tr_light.cpp find R_AddModelSurfaces and replace it with this ->

Code: Select all
/*
===================
R_AddModelSurfaces

Here is where dynamic models actually get instantiated, and necessary
interactions get created.  This is all done on a sort-by-model basis
to keep source data in cache (most likely L2) as any interactions and
shadows are generated, since dynamic models will typically be lit by
two or more lights.

Revelator: changed to a deferred model.
This came from an OpenMP optimization tutorial for Linux;
the OpenMP calls do not work on Windows because they relied on POSIX threads
(it might work if this is ported to MinGW-w64).
===================
*/
void R_AddModelSurfaces( void )
{
#define MAX_INTER 1000
   viewEntity_t      *vEntity;
   idRenderModel      *model;
   idInteraction      *inter, *next;
   idInteraction      *interactions[MAX_INTER];
   idRenderModel      *createInteractionModel[MAX_INTER];
   idRenderModel      *interactionModelPtr[MAX_INTER];
   idScreenRect      shadowScissor[MAX_INTER];
   bool            interactionPhase2[MAX_INTER];
   int               createInteractionId[MAX_INTER];
   int               nInteractions = 0;
   int               nCreateInteractions = 0;
   
   // clear the ambient surface list
   tr.viewDef->numDrawSurfs = 0;
   tr.viewDef->maxDrawSurfs = 0;   // will be set to INITIAL_DRAWSURFS on R_AddDrawSurf
   
   // go through each entity that is either visible to the view, or to
   // any light that intersects the view (for shadows)
   for( vEntity = tr.viewDef->viewEntitys; vEntity; vEntity = vEntity->next )
   {
      if( r_useEntityScissors.GetBool() )
      {
         // calculate the screen area covered by the entity
         idScreenRect scissorRect = R_CalcEntityScissorRectangle( vEntity );
         
         // intersect with the portal crossing scissor rectangle
         vEntity->scissorRect.Intersect( scissorRect );
         
         if( r_showEntityScissors.GetBool() )
         {
            R_ShowColoredScreenRect( vEntity->scissorRect, vEntity->entityDef->index );
         }
      }
      float   oldFloatTime;
      int      oldTime;
      
      game->SelectTimeGroup( vEntity->entityDef->parms.timeGroup );
      
      if( vEntity->entityDef->parms.timeGroup )
      {
         oldFloatTime = tr.viewDef->floatTime;
         oldTime = tr.viewDef->renderView.time;
         
         tr.viewDef->floatTime = game->GetTimeGroupTime( vEntity->entityDef->parms.timeGroup ) * 0.001;
         tr.viewDef->renderView.time = game->GetTimeGroupTime( vEntity->entityDef->parms.timeGroup );
      }
      
      if( tr.viewDef->isXraySubview && vEntity->entityDef->parms.xrayIndex == 1 )
      {
         if( vEntity->entityDef->parms.timeGroup )
         {
            tr.viewDef->floatTime = oldFloatTime;
            tr.viewDef->renderView.time = oldTime;
         }
         continue;
      }
      else if( !tr.viewDef->isXraySubview && vEntity->entityDef->parms.xrayIndex == 2 )
      {
         if( vEntity->entityDef->parms.timeGroup )
         {
            tr.viewDef->floatTime = oldFloatTime;
            tr.viewDef->renderView.time = oldTime;
         }
         continue;
      }
      
      // add the ambient surface if it has a visible rectangle
      if( !vEntity->scissorRect.IsEmpty() )
      {
         model = R_EntityDefDynamicModel( vEntity->entityDef );
         
         if( model == NULL || model->NumSurfaces() <= 0 )
         {
            if( vEntity->entityDef->parms.timeGroup )
            {
               tr.viewDef->floatTime = oldFloatTime;
               tr.viewDef->renderView.time = oldTime;
            }
            continue;
         }
         R_AddAmbientDrawsurfs( vEntity );
         tr.pc.c_visibleViewEntities++;
      }
      else
      {
         tr.pc.c_shadowViewEntities++;
      }
      
      // for all the entity / light interactions on this entity, add them to the view
      if( tr.viewDef->isXraySubview )
      {
         if( vEntity->entityDef->parms.xrayIndex == 2 )
         {
            for( inter = vEntity->entityDef->firstInteraction; inter != NULL && !inter->IsEmpty(); inter = next )
            {
               next = inter->entityNext;
               
               if ( inter->lightDef->viewCount != tr.viewCount )
               {
                  continue;
               }
               // check the bound before writing, otherwise a full table overflows by one
               assert(nInteractions < MAX_INTER);
               interactions[nInteractions++] = inter;
            }
         }
      }
      else
      {
         // all empty interactions are at the end of the list so once the
         // first is encountered all the remaining interactions are empty
         for( inter = vEntity->entityDef->firstInteraction; inter != NULL && !inter->IsEmpty(); inter = next )
         {
            next = inter->entityNext;
            
            // skip any lights that aren't currently visible
            // this is run after any lights that are turned off have already
            // been removed from the viewLights list, and had their viewCount cleared
            if ( inter->lightDef->viewCount != tr.viewCount )
            {
               continue;
            }
            // check the bound before writing, otherwise a full table overflows by one
            assert(nInteractions < MAX_INTER);
            interactions[nInteractions++] = inter;
         }
      }
      
      if( vEntity->entityDef->parms.timeGroup )
      {
         tr.viewDef->floatTime = oldFloatTime;
         tr.viewDef->renderView.time = oldTime;
      }
   }
   int   i, j;

   for (i = 0; i < nInteractions; i++)
   {
      interactionPhase2[i] = interactions[i]->AddActiveInteraction(true, &shadowScissor[i], &interactionModelPtr[i]);

      if (interactionModelPtr[i])
      {
         createInteractionId[nCreateInteractions] = i;
         createInteractionModel[nCreateInteractions] = interactionModelPtr[i];
         nCreateInteractions++;
      }
   }

   for (j = 0; j < nCreateInteractions; j++)
   {
      interactions[createInteractionId[j]]->CreateInteraction(createInteractionModel[j]);
   }

   for (i = 0; i < nInteractions; i++)
   {
      if (interactionPhase2[i])
      {
         interactions[i]->AddActiveInteraction(false, &shadowScissor[i], &interactionModelPtr[i]);
      }
   }
}
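
The three loops at the end of R_AddModelSurfaces are the whole deferred idea: first cull and collect, then build whatever still needs building, then link the survivors. A toy version of that pattern, with a made-up FakeInteraction type standing in for idInteraction (none of these names exist in the engine), might look like this:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for idInteraction that only records what happened to it.
struct FakeInteraction
{
   bool visible;    // survives the phase 1 culling
   bool deferred;   // surfaces have not been built yet
   bool created;
   bool linked;

   FakeInteraction( bool v, bool d ) : visible( v ), deferred( d ), created( false ), linked( false ) {}

   // phase 1: cull, and report whether creation is still needed
   bool Phase1( bool *needsCreate ) const
   {
      *needsCreate = visible && deferred;
      return visible;
   }

   void Create()
   {
      created = true;
      deferred = false;
   }

   // phase 2: link the surfaces into the view
   void Phase2()
   {
      linked = true;
   }
};

static void RunDeferred( std::vector<FakeInteraction> &inters )
{
   std::vector<char> phase2( inters.size(), 0 );
   std::vector<std::size_t> createList;

   // pass 1: cull everything and remember who still needs to be created
   for( std::size_t i = 0; i < inters.size(); i++ )
   {
      bool needsCreate = false;
      phase2[i] = inters[i].Phase1( &needsCreate ) ? 1 : 0;
      if( needsCreate )
      {
         createList.push_back( i );
      }
   }

   // pass 2: do all the expensive creation work in one batch
   for( std::size_t j = 0; j < createList.size(); j++ )
   {
      inters[createList[j]].Create();
   }

   // pass 3: link only the interactions that survived pass 1
   for( std::size_t i = 0; i < inters.size(); i++ )
   {
      if( phase2[i] )
      {
         inters[i].Phase2();
      }
   }
}
```

Batching the creation step into its own loop is what makes it a candidate for running in parallel later.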


In interaction.cpp find idInteraction::AddActiveInteraction and replace it with this ->

Code: Select all
/*
==================
idInteraction::AddActiveInteraction

Create and add any necessary light and shadow triangles

If the model doesn't have any surfaces that need interactions
with this type of light, it can be skipped, but we might need to
instantiate the dynamic model to find out
==================
*/
bool idInteraction::AddActiveInteraction(bool interactionPhase1, idScreenRect *shadowScissor, idRenderModel **modelRef)
{
   viewLight_t      *vLight;
   viewEntity_t   *vEntity;
   idScreenRect   lightScissor;
   idVec3         localLightOrigin;
   idVec3         localViewOrigin;

   vLight = lightDef->viewLight;
   vEntity = entityDef->viewEntity;

   if ( interactionPhase1 )
   {
      *modelRef = NULL;

      // do not waste time culling the interaction frustum if there will be no shadows
      if ( !HasShadows() )
      {
         // use the entity scissor rectangle
         *shadowScissor = vEntity->scissorRect;
         // culling does not seem to be worth it for static world models
      }
      else if ( entityDef->parms.hModel->IsStaticWorldModel() )
      {
         // use the light scissor rectangle
         *shadowScissor = vLight->scissorRect;
      }
      else
      {
         // try to cull the interaction
         // this will also cull the case where the light origin is inside the
         // view frustum and the entity bounds are outside the view frustum
         if ( CullInteractionByViewFrustum( tr.viewDef->viewFrustum ) )
         {
            return false;
         }

         // calculate the shadow scissor rectangle
         *shadowScissor = CalcInteractionScissorRectangle( tr.viewDef->viewFrustum );
      }

      // get out before making the dynamic model if the shadow scissor rectangle is empty
      if ( (*shadowScissor).IsEmpty() )
      {
         return false;
      }

      // We will need the dynamic surface created to make interactions, even if the
      // model itself wasn't visible.  This just returns a cached value after it
      // has been generated once in the view.
      idRenderModel *model = R_EntityDefDynamicModel( entityDef );

      if (model == NULL || model->NumSurfaces() <= 0)
      {
         return false;
      }

      // the dynamic model may have changed since we built the surface list
      if ( !IsDeferred() && entityDef->dynamicModelFrameCount != dynamicModelFrameCount )
      {
         FreeSurfaces();
      }
      dynamicModelFrameCount = entityDef->dynamicModelFrameCount;

      // actually create the interaction if needed, building light and shadow surfaces as needed
      if ( IsDeferred() )
      {
         *modelRef = model;
      }
      return true;
   }
   R_GlobalPointToLocal( vEntity->modelMatrix, lightDef->globalLightOrigin, localLightOrigin );
   R_GlobalPointToLocal( vEntity->modelMatrix, tr.viewDef->renderView.vieworg, localViewOrigin );

   // calculate the scissor as the intersection of the light and model rects
   // this is used for light triangles, but not for shadow triangles
   lightScissor = vLight->scissorRect;
   lightScissor.Intersect(vEntity->scissorRect);

   bool lightScissorsEmpty = lightScissor.IsEmpty();

   // for each surface of this entity / light interaction
   for ( int i = 0; i < numSurfaces; i++ )
   {
      surfaceInteraction_t *sint = &surfaces[i];

      // see if the base surface is visible, we may still need to add shadows even if empty
      if ( !lightScissorsEmpty && sint->ambientTris && sint->ambientTris->ambientViewCount == tr.viewCount )
      {
         // make sure we have created this interaction, which may have been deferred
         // on a previous use that only needed the shadow
         if ( sint->lightTris == LIGHT_TRIS_DEFERRED )
         {
            sint->lightTris = R_CreateLightTris(vEntity->entityDef, sint->ambientTris, vLight->lightDef, sint->shader, sint->cullInfo);
            R_FreeInteractionCullInfo(sint->cullInfo);
         }
         srfTriangles_t *lightTris = sint->lightTris;

         if ( lightTris )
         {
            // try to cull before adding
            // FIXME: this may not be worthwhile. We have already done culling on the ambient,
            // but individual surfaces may still be cropped somewhat more
            if ( !R_CullLocalBox( lightTris->bounds, vEntity->modelMatrix, 5, tr.viewDef->frustum ) )
            {
               // make sure the original surface has its ambient cache created
               srfTriangles_t *tri = sint->ambientTris;

               if ( !tri->ambientCache )
               {
                  if ( !R_CreateAmbientCache( tri, sint->shader->ReceivesLighting() ) )
                  {
                     // skip if we were out of vertex memory
                     continue;
                  }
               }

               // reference the original surface's ambient cache
               lightTris->ambientCache = tri->ambientCache;

               // touch the ambient surface so it won't get purged
               vertexCache.Touch(lightTris->ambientCache);

               if ( !lightTris->indexCache )
               {
                  vertexCache.Alloc( lightTris->indexes, lightTris->numIndexes * sizeof(lightTris->indexes[0]), &lightTris->indexCache, true );
               }

               if ( lightTris->indexCache )
               {
                  vertexCache.Touch( lightTris->indexCache );
               }

               // add the surface to the light list
               const idMaterial *shader = sint->shader;

               R_GlobalShaderOverride( &shader );

               // there will only be localSurfaces if the light casts shadows and
               // there are surfaces with NOSELFSHADOW
               if ( sint->shader->Coverage() == MC_TRANSLUCENT )
               {
                  R_LinkLightSurf( &vLight->translucentInteractions, lightTris, vEntity, lightDef, shader, lightScissor, false );
               }
               else if ( !lightDef->parms.noShadows && sint->shader->TestMaterialFlag( MF_NOSELFSHADOW ) )
               {
                  R_LinkLightSurf( &vLight->localInteractions, lightTris, vEntity, lightDef, shader, lightScissor, false );
               }
               else
               {
                  R_LinkLightSurf( &vLight->globalInteractions, lightTris, vEntity, lightDef, shader, lightScissor, false );
               }
            }
         }
      }
      srfTriangles_t *shadowTris = sint->shadowTris;

      // the shadows will always have to be added, unless we can tell they
      // are from a surface in an unconnected area
      if ( shadowTris )
      {
         // check for view specific shadow suppression (player shadows, etc)
         if ( !r_skipSuppress.GetBool() )
         {
            if ( entityDef->parms.suppressShadowInViewID && entityDef->parms.suppressShadowInViewID == tr.viewDef->renderView.viewID )
            {
               continue;
            }

            if ( entityDef->parms.suppressShadowInLightID && entityDef->parms.suppressShadowInLightID == lightDef->parms.lightId )
            {
               continue;
            }
         }

         // cull static shadows that have a non-empty bounds
         // dynamic shadows that use the turboshadow code will not have valid
         // bounds, because the perspective projection extends them to infinity
         if ( r_useShadowCulling.GetBool() && !shadowTris->bounds.IsCleared() )
         {
            if ( R_CullLocalBox( shadowTris->bounds, vEntity->modelMatrix, 5, tr.viewDef->frustum ) )
            {
               continue;
            }
         }

         // copy the shadow vertexes to the vertex cache if they have been purged
         // if we are using shared shadowVertexes and letting a vertex program fix them up,
         // get the shadowCache from the parent ambient surface
         if ( !shadowTris->shadowVertexes )
         {
            // the data may have been purged, so get the latest from the "home position"
            shadowTris->shadowCache = sint->ambientTris->shadowCache;
         }

         // if we have been purged, re-upload the shadowVertexes
         if ( !shadowTris->shadowCache )
         {
            if ( shadowTris->shadowVertexes )
            {
               // each interaction has unique vertexes
               R_CreatePrivateShadowCache( shadowTris );
            }
            else
            {
               R_CreateVertexProgramShadowCache( sint->ambientTris );
               shadowTris->shadowCache = sint->ambientTris->shadowCache;
            }

            // if we are out of vertex cache space, skip the interaction
            if ( !shadowTris->shadowCache )
            {
               continue;
            }
         }

         // touch the shadow surface so it won't get purged
         vertexCache.Touch( shadowTris->shadowCache );

         if ( !shadowTris->indexCache )
         {
            vertexCache.Alloc( shadowTris->indexes, shadowTris->numIndexes * sizeof( shadowTris->indexes[0] ), &shadowTris->indexCache, true );
            vertexCache.Touch( shadowTris->indexCache );
         }

         // see if we can avoid using the shadow volume caps
         bool inside = R_PotentiallyInsideInfiniteShadow( sint->ambientTris, localViewOrigin, localLightOrigin );

         if ( sint->shader->TestMaterialFlag( MF_NOSELFSHADOW ) )
         {
            R_LinkLightSurf( &vLight->localShadows, shadowTris, vEntity, lightDef, NULL, *shadowScissor, inside );
         }
         else
         {
            R_LinkLightSurf( &vLight->globalShadows, shadowTris, vEntity, lightDef, NULL, *shadowScissor, inside );
         }
      }
   }
   return true;
}
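
The light scissor in the second phase is just the overlap of the light rectangle and the entity rectangle, and an empty overlap means no light surfaces get added. A small sketch of that test follows, using a hypothetical Rect type with inclusive corners. The engine's idScreenRect may differ in detail, so this only shows the idea.

```cpp
#include <algorithm>

// Hypothetical rectangle with inclusive corners, standing in for idScreenRect.
struct Rect
{
   int x1, y1, x2, y2;

   // shrink this rectangle to the area it shares with the other one
   void Intersect( const Rect &other )
   {
      x1 = std::max( x1, other.x1 );
      y1 = std::max( y1, other.y1 );
      x2 = std::min( x2, other.x2 );
      y2 = std::min( y2, other.y2 );
   }

   // after an intersection with no overlap the corners have crossed over
   bool IsEmpty() const
   {
      return x1 > x2 || y1 > y2;
   }
};
```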


Also change the declaration in interaction.h to this ->

Code: Select all
   // makes sure all necessary light surfaces and shadow surfaces are created, and
   // calls R_LinkLightSurf () for each one
   bool               AddActiveInteraction(bool interactionPhase1, idScreenRect *shadowScissor, idRenderModel **modelRef);


Then move CreateInteraction from the private: section to just below the declaration above. Private members cannot be called from outside the class, and R_AddModelSurfaces now calls CreateInteraction directly, so it has to be in the public section.

So we end up with a class looking like this ->

Code: Select all
class idInteraction
{
public:
   // this may be 0 if the light and entity do not actually intersect
   // -1 = an untested interaction
   int                  numSurfaces;
   
   // if there is a whole-entity optimized shadow hull, it will
   // be present as a surfaceInteraction_t with a NULL ambientTris, but
   // possibly having a shader to specify the shadow sorting order
   surfaceInteraction_t    *surfaces;
   
   // get space from here, if NULL, it is a pre-generated shadow volume from dmap
   idRenderEntityLocal    *entityDef;
   idRenderLightLocal    *lightDef;
   
   idInteraction          *lightNext;            // for lightDef chains
   idInteraction          *lightPrev;
   idInteraction          *entityNext;            // for entityDef chains
   idInteraction          *entityPrev;
   
public:
   idInteraction( void );
   
   // because these are generated and freed each game tic for active elements all
   // over the world, we use a custom pool allocator to avoid memory allocation overhead
   // and fragmentation
   static idInteraction    *AllocAndLink( idRenderEntityLocal *edef, idRenderLightLocal *ldef );
   
   // unlinks from the entity and light, frees all surfaceInteractions,
   // and puts it back on the free list
   void               UnlinkAndFree( void );
   
   // free the interaction surfaces
   void               FreeSurfaces( void );
   
   // makes the interaction empty for when the light and entity do not actually intersect
   // all empty interactions are linked at the end of the light's and entity's interaction list
   void               MakeEmpty( void );
   
   // returns true if the interaction is empty
   bool               IsEmpty( void ) const
   {
      return ( numSurfaces == 0 );
   }
   
   // returns true if the interaction is not yet completely created
   bool               IsDeferred( void ) const
   {
      return ( numSurfaces == -1 );
   }
   
   // returns true if the interaction has shadows
   bool               HasShadows( void ) const;
   
   // counts up the memory used by all the surfaceInteractions, which
   // will be used to determine when we need to start purging old interactions
   int                  MemoryUsed( void );
   
   // makes sure all necessary light surfaces and shadow surfaces are created, and
   // calls R_LinkLightSurf () for each one
   bool               AddActiveInteraction(bool interactionPhase1, idScreenRect *shadowScissor, idRenderModel **modelRef);
   
   // actually create the interaction
   void               CreateInteraction(const idRenderModel *model);

private:
   enum
   {
      FRUSTUM_UNINITIALIZED,
      FRUSTUM_INVALID,
      FRUSTUM_VALID,
      FRUSTUM_VALIDAREAS,
   }                  frustumState;
   idFrustum            frustum;            // frustum which contains the interaction
   areaNumRef_t          *frustumAreas;         // numbers of the areas the frustum touches
   
   int                  dynamicModelFrameCount;   // so we can tell if a callback model animated
   
private:
   
   // unlink from entity and light lists
   void               Unlink( void );
   
   // try to determine if the entire interaction, including shadows, is guaranteed
   // to be outside the view frustum
   bool               CullInteractionByViewFrustum( const idFrustum &viewFrustum );
   
   // determine the minimum scissor rect that will include the interaction shadows
   // projected to the bounds of the light
   idScreenRect         CalcInteractionScissorRectangle( const idFrustum &viewFrustum );
};


Maybe someone can fix the OpenMP parts for Windows, as the reported performance increase is well worth it :)
Productivity is a state of mind.
User avatar
revelator
 
Posts: 2536
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: Doom3 shadow optimization

Postby revelator » Sat Jun 14, 2014 9:04 pm

Heh, speak of the devil, I partly fixed it for Windows, though atm the OpenMP optimization only affects interactions.
The Linux patch also put OpenMP optimizations on idlib's heap manager and a few other places in the renderer.

so at the top of tr_light.cpp yank in these two ->

// omp stage locks for interactions
static omp_lock_t stage1Lock;
static omp_lock_t stage2Lock;

roll down to R_AddModelSurfaces

and at the bottom of that function make it look like this ->

Code: Select all
   // defer the interactions to here
   omp_init_lock(&stage1Lock);
   #pragma omp parallel for default(shared) schedule(dynamic)
   
   for (i = 0; i < nInteractions; i++)
   {
      // block until the lock is free; spinning on omp_test_lock did the same
      // but printed a warning from every waiting thread
      omp_set_lock(&stage1Lock);
      interactionPhase2[i] = interactions[i]->AddActiveInteraction(true, &shadowScissor[i], &interactionModelPtr[i]);

      if (interactionModelPtr[i])
      {
         createInteractionId[nCreateInteractions] = i;
         createInteractionModel[nCreateInteractions] = interactionModelPtr[i];
         nCreateInteractions++;
      }
      omp_unset_lock(&stage1Lock);
   }   
   omp_destroy_lock(&stage1Lock);

   // next interaction table
   omp_init_lock(&stage2Lock);
   #pragma omp parallel for shared(interactions,createInteractionId,createInteractionModel) schedule(dynamic)

   for (j = 0; j < nCreateInteractions; j++)
   {
      // block until the lock is free; spinning on omp_test_lock did the same
      // but printed a warning from every waiting thread
      omp_set_lock(&stage2Lock);
      interactions[createInteractionId[j]]->CreateInteraction(createInteractionModel[j]);

      omp_unset_lock(&stage2Lock);
   }

   for (i = 0; i < nInteractions; i++)
   {
      // this loop is serial, so the lock is uncontended; take it directly
      omp_set_lock(&stage2Lock);

      if (interactionPhase2[i])
      {
         interactions[i]->AddActiveInteraction(false, &shadowScissor[i], &interactionModelPtr[i]);
      }
      omp_unset_lock(&stage2Lock);
   }
   omp_destroy_lock(&stage2Lock);


add #include<omp.h> to precompiled.h somewhere after #include<stdio.h>

turn OpenMP support on in the project properties (the /openmp compiler switch) and recompile.

remember to include the matching vcomp<version>.dll with Doom3.

Btw, MSVC does not like braces right after the OpenMP #pragmas (a parallel for has to be followed directly by the for loop), so avoid them or it will not compile.
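
A side note on the loops above: because each lock is held for the whole loop body, only one iteration can run at a time, so the parallel for behaves much like the serial loop. Below is a minimal sketch of narrowing the locked region to just the shared list, assuming the per-item work is itself thread safe (nothing in this thread establishes that for AddActiveInteraction). `ExpensiveCheck` and `CollectHits` are made-up names for illustration, not engine functions. `#pragma omp critical` is part of OpenMP 2.0, so it should be within what MSVC supports.

```cpp
#include <vector>

// Hypothetical stand-in for the per-interaction work; returns true when the
// item needs a second stage. Not an engine function.
static bool ExpensiveCheck(int i)
{
   return (i % 3) == 0;
}

// Runs the check for every index and collects the indices that passed.
// Only the append to the shared list is serialized; the check itself runs
// outside the critical section.
std::vector<int> CollectHits(int count)
{
   std::vector<int> hits;
   int i; // OpenMP 2.0 wants a signed loop index

   #pragma omp parallel for default(shared) schedule(dynamic)
   for (i = 0; i < count; i++)
   {
      bool hit = ExpensiveCheck(i); // parallel part, no lock held

      if (hit)
      {
         #pragma omp critical(collect_hits)
         {
            hits.push_back(i); // order depends on thread timing
         }
      }
   }
   return hits;
}
```

The resulting order depends on thread timing, so anything that relies on the serial order would need a sort afterwards.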

Re: Doom3 shadow optimization

Postby nbohr1more » Sun Jun 15, 2014 1:31 am

Thanks for the info!

A couple of questions:

1) The deferred lighting portion, are you seeing a larger memory footprint?

2) Is this reliant on MH's VBO changes?

3) I presume this flattens light costs so that you can have more lights per scene. Does this also improve
scenes with large draw distance (eg. lower the scene traversal costs for lights)?

4) Would you mind stopping over to The Dark Mod forums and chatting with Obsttorte about this :) ?
nbohr1more
 
Posts: 54
Joined: Fri Dec 09, 2011 7:04 am

Re: Doom3 shadow optimization

Postby revelator » Sun Jun 15, 2014 4:45 am

Hey nbor :)

1: memory footprint rises a bit with this, aye.
2: not reliant on MH's changes, two different beasts ;)
3: it does lower it a bit. I only have Doom3 itself to go by, but fps improves a bit in scenes with tons of light sources.
4: sure, though I'm no wizard with this code. It took me considerable time to even get it to work correctly without the OpenMP part :)

P.S. The OpenMP part still needs work, so keep that one off for now if you want to toy with it. It does not crash, but it hangs the engine on level loads.

Re: Doom3 shadow optimization

Postby revelator » Mon Jun 16, 2014 6:43 am

Hmm, I registered on the Dark Mod forums and am able to log in, but it seems I'm not allowed to post yet :?:

I read the thread there, so to avoid any confusion among the other devs it is probably best to tell them that the OpenMP part was originally for Linux and does not currently work correctly on Windows.
The reason is that the Linux OpenMP uses pthreads, and the parts needed for it to work on Windows would have to be ported to a Windows threading library instead.

The OpenMP from the patch is also used for Doom3's heap manager as well as other parts of the renderer. The only part I'm using atm is the code changes for deferring the interactions, but without the OpenMP part
the speed gain will probably not be worth it.

It does not hurt though, and in case we get the OpenMP part working, most of the code would be ready for it :)

Re: Doom3 shadow optimization

Postby revelator » Mon Jun 16, 2014 2:15 pm

ok first batch of the openmp patch ready for consumption.

This time around we do the heap manager so here goes.

in sys_public.h

yank these in just above the cpuid_t enum.

Code: Select all
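// true once sem_wait_lock has taken a lock. note that this one flag is shared
// by every mutex that goes through these wrappers.
static bool locked = false;

// returns the calling thread's index in its team (0 outside a parallel region),
// not a thread count; omp_get_num_threads() would give the team size.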
int sem_get_threads( void )
{
   return omp_get_thread_num();
}

// initialize the lock.
void sem_init_lock( omp_lock_t *mutex )
{
   // if we get no threads running we are fubared.
   omp_init_lock( mutex );
}

// will try and force a lock, hangs if it fails.
void sem_set_lock( omp_lock_t *mutex )
{
   omp_set_lock( mutex );
}

// will try and set a lock but will not hang if it cant.
// if the lock is busy this returns without holding it, so the caller is not protected in that case.
void sem_wait_lock( omp_lock_t *mutex )
{
   // check if mutex is lockable.
   if ( !omp_test_lock( mutex ) )
   {
      // not lockable.
      common->Warning( "Could not acquire lock\n" );
      return;
   }
   locked = true;

   // lockable. only report from worker threads (index > 0), dont want to spam the console.
   if ( sem_get_threads() > 0 )
   {
      common->Printf( "Locked by thread %i\n", sem_get_threads() );
   }
}

// unset the lock set by sem_wait_lock.
void sem_unset_lock( omp_lock_t *mutex )
{
   // unsetting a non locked thread is not standard.
   if ( locked )
   {
      omp_unset_lock( mutex );
      return;
   }
}

// warning put this in the wrong place and you crash the engine.
// function is bugged do not use !!!.
void sem_close_lock( omp_lock_t *mutex )
{
   if ( !locked )
   {
      omp_destroy_lock( mutex );
      return;
   }
}


These are just wrappers around the openmp functions to keep things simple.
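
Several of the functions that follow have more than one return path while the lock is held, and every such path has to release it again. One way to make that automatic is a small scope guard, sketched here. `ScopedOmpLock` and `GuardedLookup` are hypothetical names, not part of the patch. The guard uses the blocking `omp_set_lock` rather than the try-lock behavior of sem_wait_lock, and the `#else` branch is only a serial fallback so the sketch builds with OpenMP switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also builds with OpenMP switched off.
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t *l)    { *l = 0; }
inline void omp_set_lock(omp_lock_t *l)     { *l = 1; }
inline void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
inline int  omp_test_lock(omp_lock_t *l)    { if (*l) return 0; *l = 1; return 1; }
inline void omp_destroy_lock(omp_lock_t *l) { *l = 0; }
#endif

// Hypothetical helper, not part of the patch: holds the lock for the lifetime
// of the object and releases it on every exit path, including early returns.
class ScopedOmpLock
{
public:
   explicit ScopedOmpLock(omp_lock_t *lock) : lock_(lock) { omp_set_lock(lock_); }
   ~ScopedOmpLock() { omp_unset_lock(lock_); }

private:
   ScopedOmpLock(const ScopedOmpLock &);            // non-copyable
   ScopedOmpLock &operator=(const ScopedOmpLock &);

   omp_lock_t *lock_;
};

// Example with an early return; the guard still releases the lock.
int GuardedLookup(omp_lock_t *lock, const int *table, int index, int count)
{
   ScopedOmpLock guard(lock);

   if (index < 0 || index >= count)
   {
      return -1; // lock released here by the destructor
   }
   return table[index];
}
```

With a guard like this the explicit unset calls before each return would no longer be needed, at the cost of always blocking until the lock is free.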

go into heap.cpp and find

static memoryStats_t mem_frame_frees;

and below it add static omp_lock_t alloc_mutex;

find void Mem_Init (void)

and add this sem_init_lock(&alloc_mutex); before the other calls so it looks like this.

Code: Select all
void Mem_Init (void)
{
   sem_init_lock(&alloc_mutex);
   mem_heap = new idHeap;
   Mem_ClearFrameStats ();
}


and the same in

Code: Select all
void Mem_Shutdown (void)
{
   sem_init_lock(&alloc_mutex);
   idHeap *m = mem_heap;
   mem_heap = NULL;
   delete m;
}


do the same for the debug versions lower down.

change Mem_Alloc to this

Code: Select all
void *Mem_Alloc (const int size)
{
   if (!size)
   {
      return NULL;
   }
   void *mem = NULL;

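   if (!mem_heap)
   {
#ifdef CRASH_ON_STATIC_ALLOCATION
      * ((int *) 0x0) = 1;
#endif
      // no heap yet: return before taking the lock so it is never left held
      return malloc(size);
   }

   // lock only after the early-out above
   sem_wait_lock(&alloc_mutex);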
   mem = mem_heap->Allocate(size);

   Mem_UpdateAllocStats(mem_heap->Msize(mem));

   sem_unset_lock(&alloc_mutex);

   return mem;
}


and Mem_Free to this

Code: Select all
void Mem_Free (void *ptr)
{
   if (!ptr)
   {
      return;
   }
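   if (!mem_heap)
   {
#ifdef CRASH_ON_STATIC_ALLOCATION
      * ((int *) 0x0) = 1;
#endif
      // no heap yet: return before taking the lock so it is never left held
      free(ptr);
      return;
   }

   // lock only after the early-out above
   sem_wait_lock(&alloc_mutex);
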
   Mem_UpdateFreeStats(mem_heap->Msize(ptr));

   mem_heap->Free(ptr);

   sem_unset_lock(&alloc_mutex);
}


Mem_Alloc16 to this

Code: Select all
void *Mem_Alloc16 (const int size)
{
   if (!size)
   {
      return NULL;
   }
   void *mem = NULL;
   
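   if (!mem_heap)
   {
#ifdef CRASH_ON_STATIC_ALLOCATION
      * ((int *) 0x0) = 1;
#endif
      // no heap yet: return before taking the lock so it is never left held
      return malloc(size);
   }

   // lock only after the early-out above
   sem_wait_lock(&alloc_mutex);
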
   mem = mem_heap->Allocate16(size);

   // make sure the memory is 16 byte aligned
   assert((((int)mem) & 15) == 0);
   
   sem_unset_lock(&alloc_mutex);

   return mem;
}


and Mem_Free16

Code: Select all
void Mem_Free16 (void *ptr)
{
   if (!ptr)
   {
      return;
   }
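   if (!mem_heap)
   {
#ifdef CRASH_ON_STATIC_ALLOCATION
      * ((int *) 0x0) = 1;
#endif
      // no heap yet: return before taking the lock so it is never left held
      free(ptr);
      return;
   }

   // lock only after the early-out above
   sem_wait_lock(&alloc_mutex);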

   // make sure the memory is 16 byte aligned
   assert((((int)ptr) & 15) == 0);

   mem_heap->Free16(ptr);

   sem_unset_lock(&alloc_mutex);
}


Mem_AllocDefragBlock to this

Code: Select all
void Mem_AllocDefragBlock (void)
{
   sem_wait_lock(&alloc_mutex);
   mem_heap->AllocDefragBlock();
   sem_unset_lock(&alloc_mutex);
}


Mem_AllocDebugMemory as well

Code: Select all
void *Mem_AllocDebugMemory (const int size, const char *fileName, const int lineNumber, const bool align16)
{
   void         *p;
   debugMemory_t   *m;

   if (!size)
   {
      return NULL;
   }
   if (!mem_heap)
   {
#ifdef CRASH_ON_STATIC_ALLOCATION
      * ((int *) 0x0) = 1;
#endif
      // NOTE: set a breakpoint here to find memory allocations before mem_heap is initialized
      // return before taking the lock so it is never left held
      return malloc (size);
   }

   // lock only after the early-out above
   sem_wait_lock(&alloc_mutex);

   if (align16)
   {
      p = mem_heap->Allocate16 (size + sizeof (debugMemory_t));
   }
   else
   {
      p = mem_heap->Allocate (size + sizeof (debugMemory_t));
   }
   Mem_UpdateAllocStats (size);

   m = (debugMemory_t *) p;
   m->fileName = fileName;
   m->lineNumber = lineNumber;
   m->frameNumber = idLib::frameNumber;
   m->size = size;
   m->next = mem_debugMemory;
   m->prev = NULL;

   if (mem_debugMemory)
   {
      mem_debugMemory->prev = m;
   }
   mem_debugMemory = m;

   idLib::sys->GetCallStack (m->callStack, MAX_CALLSTACK_DEPTH);

   sem_unset_lock(&alloc_mutex);

   return (((byte *) p) + sizeof (debugMemory_t));
}


Mem_FreeDebugMemory

Code: Select all
void Mem_FreeDebugMemory (void *p, const char *fileName, const int lineNumber, const bool align16)
{
   debugMemory_t *m;

   if (!p)
   {
      return;
   }
   if (!mem_heap)
   {
#ifdef CRASH_ON_STATIC_ALLOCATION
      * ((int *) 0x0) = 1;
#endif
      // NOTE: set a breakpoint here to find memory being freed before mem_heap is initialized
      // return before taking the lock so it is never left held
      free (p);
      return;
   }

   // lock only after the early-out above
   sem_wait_lock(&alloc_mutex);

   m = (debugMemory_t *) (((byte *) p) - sizeof (debugMemory_t));

   if (m->size < 0)
   {
      idLib::common->FatalError ("memory freed twice, first from %s, now from %s", idLib::sys->GetCallStackStr (m->callStack, MAX_CALLSTACK_DEPTH), idLib::sys->GetCallStackCurStr (MAX_CALLSTACK_DEPTH));
   }
   Mem_UpdateFreeStats (m->size);

   if (m->next)
   {
      m->next->prev = m->prev;
   }

   if (m->prev)
   {
      m->prev->next = m->next;
   }
   else
   {
      mem_debugMemory = m->next;
   }
   m->fileName = fileName;
   m->lineNumber = lineNumber;
   m->frameNumber = idLib::frameNumber;
   m->size = -m->size;

   idLib::sys->GetCallStack (m->callStack, MAX_CALLSTACK_DEPTH);

   if (align16)
   {
      mem_heap->Free16 (m);
   }
   else
   {
      mem_heap->Free (m);
   }
   sem_unset_lock(&alloc_mutex);
}


go into Heap.h and find class idBlockAlloc

after this ->
int active;
put this
omp_lock_t alloc_mutex;


find class idDynamicBlockAlloc

and after ->
bool lockMemory;
put this in
omp_lock_t alloc_mutex;

now find idBlockAlloc<type, blockSize>::idBlockAlloc (void)

and after total = active = 0;
put this in sem_init_lock(&alloc_mutex);

find type *idBlockAlloc<type, blockSize>::Alloc (void)

and just below element_t *element;

put this in sem_wait_lock(&alloc_mutex);

and just before return &element->t;
put this in sem_unset_lock(&alloc_mutex);

now find void idBlockAlloc<type, blockSize>::Free (type *t)

and put this in before anything else

sem_wait_lock(&alloc_mutex);

and this after everything else

sem_unset_lock(&alloc_mutex);

now find void idBlockAlloc<type, blockSize>::Shutdown (void)

and put this in before anything else

sem_init_lock(&alloc_mutex);

find void idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::Init (void)

and put this in before anything else

sem_init_lock(&alloc_mutex);

just below you find this void idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::Shutdown (void)

and just after idDynamicBlock<type> *block;

put this in sem_init_lock(&alloc_mutex);

find void idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::SetFixedBlocks (int numBlocks)

and just after idDynamicBlock<type> *block;

put this in sem_wait_lock(&alloc_mutex);

and at the end of that function put this in

sem_unset_lock(&alloc_mutex);

find void idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::FreeEmptyBaseBlocks (void)

and just after idDynamicBlock<type> *block, *next;

put this in

sem_wait_lock(&alloc_mutex);

and at the end of that function put this in

sem_unset_lock(&alloc_mutex);

now find int idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::GetNumEmptyBaseBlocks (void) const

here we have to do a lil magic because the function is a const.
so just after idDynamicBlock<type> *block;

put this in

// ... need to cast the const from this function away ...
sem_wait_lock(const_cast<omp_lock_t *>(&alloc_mutex));

and just before return numEmptyBaseBlocks;

put this in

// ... need to cast the const from this function away ...
sem_unset_lock(const_cast<omp_lock_t *>(&alloc_mutex));
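
An alternative to the const_cast, sketched under assumptions: declaring the lock member `mutable` lets const member functions lock it directly. The sketch uses `std::mutex` as a stand-in for `omp_lock_t` to stay self-contained, and `BlockCounter` is a made-up miniature, not idDynamicBlockAlloc. In Heap.h the equivalent would presumably be `mutable omp_lock_t alloc_mutex;`.

```cpp
#include <mutex>

// Hypothetical miniature of an allocator with a const query method.
// std::mutex stands in for omp_lock_t here.
class BlockCounter
{
public:
   BlockCounter() : numEmpty_(0) {}

   void AddEmpty()
   {
      std::lock_guard<std::mutex> guard(mutex_);
      numEmpty_++;
   }

   // const method: locking is allowed because mutex_ is declared mutable
   int GetNumEmpty() const
   {
      std::lock_guard<std::mutex> guard(mutex_);
      return numEmpty_;
   }

private:
   mutable std::mutex mutex_; // mutable: may be modified inside const methods
   int                numEmpty_;
};
```

The design point is that taking a lock does not change the logical state of the allocator, which is the case `mutable` exists for.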

find type *idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::Alloc (const int num)

and just after this

Code: Select all
   if (num <= 0)
   {
      return NULL;
   }


put this in

sem_wait_lock(&alloc_mutex);

now there are two blocks checking if block == NULL

make em look like this

Code: Select all
   if (block == NULL)
   {
      sem_unset_lock(&alloc_mutex);
      return NULL;
   }


and just before return block->GetMemory ();

put this

sem_unset_lock(&alloc_mutex);


find type *idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::Resize (type *ptr, const int num)

and just after idDynamicBlock<type> *block = (idDynamicBlock<type> *) (((byte *) ptr) - (int) sizeof (idDynamicBlock<type>));

put this

sem_wait_lock(&alloc_mutex);

here we also got a block of code checking if block == NULL, so make it look like this

Code: Select all
   if (block == NULL)
   {
      sem_unset_lock(&alloc_mutex);
      return NULL;
   }


and just before return block->GetMemory ();

put this in

sem_unset_lock(&alloc_mutex);

and last one (i swear :P) find void idDynamicBlockAlloc<type, baseBlockSize, minBlockSize>::Free (type *ptr)

and just after idDynamicBlock<type> *block = (idDynamicBlock<type> *) (((byte *)ptr) - (int) sizeof(idDynamicBlock<type>));

put this in.

sem_wait_lock(&alloc_mutex);

and at the end of that function put this in.

sem_unset_lock(&alloc_mutex);

Pheeew, rather repetitive :twisted:

ok now turn on OpenMP in properties for Doom3 and recompile it.

If you did it correctly it runs, if not it hangs ;) Speedwise you probably won't see much of an improvement; the only thing this does is lock the threads for OpenMP use later on.
But this was the largest part of it, now we just have to cross fingers that I can fix the bugger to work for the renderer too :mrgreen:

Edit: Test run, yep, the above only allocates 1 thread. To go parallel we need the OpenMP pragmas, but atm they hang the engine on level load, umpf.
Pretty scary though that all this crap is needed. There are only two OpenMP pragmas in use by it and both are in tr_light.cpp. My best guess is that it's to make sure that we don't get runaway threads.
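
On the 1 thread observation: `omp_get_thread_num()` returns the calling thread's index inside its team, and `omp_get_num_threads()` returns the team size, which is 1 anywhere outside a parallel region. So a single thread is what the lock wrappers alone would be expected to show. Here is a small sketch for checking how many threads a parallel region really gets. `CountTeamThreads` is a made-up helper, and the `#else` branch is only a serial fallback so it builds with OpenMP switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also builds with OpenMP switched off.
inline int omp_get_num_threads() { return 1; }
inline int omp_get_thread_num()  { return 0; }
#endif

// Counts how many threads actually execute a parallel region.
int CountTeamThreads()
{
   int teamSize = 1;

   #pragma omp parallel
   {
      // only the master thread (index 0) records the team size
      #pragma omp master
      {
         teamSize = omp_get_num_threads();
      }
   }
   return teamSize;
}
```

Calling `CountTeamThreads()` once at startup and printing the result would show whether the build really runs parallel regions on more than one thread.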

Re: Doom3 shadow optimization

Postby revelator » Mon Jun 16, 2014 7:43 pm

Ouch, just learned that Oliver McFadden, the author of the Dante port of Doom3, has passed away :(
Sorry to hear that. I hope someone else picks up his project and maybe backports some of the changes
to Windows.

Re: Doom3 shadow optimization

Postby revelator » Mon Jun 16, 2014 7:48 pm

Here are the only two places that make use of OpenMP's multithreading.

Code: Select all
   // defer the interactions to here
//#pragma omp parallel for default(shared) schedule(dynamic) // hangs the engine at entering level.
   for (i = 0; i < nInteractions; i++)
   {
      interactionPhase2[i] = interactions[i]->AddActiveInteraction(true, &shadowScissor[i], &interactionModelPtr[i]);

      if (interactionModelPtr[i])
      {
         createInteractionId[nCreateInteractions] = i;
         createInteractionModel[nCreateInteractions] = interactionModelPtr[i];
         nCreateInteractions++;
      }
   }   

   // next interaction table
//#pragma omp parallel for shared(interactions,createInteractionId,createInteractionModel) schedule(dynamic) // hangs the engine at entering level.
   for (j = 0; j < nCreateInteractions; j++)
   {
      interactions[createInteractionId[j]]->CreateInteraction(createInteractionModel[j]);
   }

   for (i = 0; i < nInteractions; i++)
   {
      if (interactionPhase2[i])
      {
         interactions[i]->AddActiveInteraction(false, &shadowScissor[i], &interactionModelPtr[i]);
      }
   }


As commented, they hang the engine when entering a level. I'm hearing rumors that you need to turn on common language runtime support for OpenMP to work,
but doing so means turning off exceptions and linking to the DLL version of msvcrt :S Otherwise it will not compile.
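
One thing worth ruling out with the first pragma enabled: `nCreateInteractions++` and the two table writes next to it are shared between threads and are not synchronized in this version, which is a data race. Below is a sketch of one way around it, assuming phase 1 is otherwise safe to run in parallel (not verified here). Each iteration writes only to its own slot, and the create list is built in a serial pass afterwards. `Phase1`, `CreateList` and `BuildCreateList` are hypothetical stand-ins, not engine code.

```cpp
#include <vector>

// Hypothetical stand-in for the phase 1 call; returns a non-negative "model"
// id when stage 2 is needed, or -1 otherwise. Not an engine function.
static int Phase1(int i)
{
   return (i % 2 == 0) ? i * 10 : -1;
}

struct CreateList
{
   std::vector<int> ids;    // plays the role of createInteractionId
   std::vector<int> models; // plays the role of createInteractionModel
};

CreateList BuildCreateList(int nInteractions)
{
   std::vector<int> modelPerSlot(nInteractions, -1);
   int i;

   // Each iteration writes only to its own slot, so no lock is needed here.
   #pragma omp parallel for schedule(dynamic)
   for (i = 0; i < nInteractions; i++)
   {
      modelPerSlot[i] = Phase1(i);
   }

   // Serial compaction: the shared lists are touched by one thread only,
   // and the output order matches the serial loop.
   CreateList out;
   for (i = 0; i < nInteractions; i++)
   {
      if (modelPerSlot[i] >= 0)
      {
         out.ids.push_back(i);
         out.models.push_back(modelPerSlot[i]);
      }
   }
   return out;
}
```

In the loop above, `interactionModelPtr[i]` already is such a per-slot result, so the compaction could be a second serial loop over it.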

Re: Doom3 shadow optimization

Postby nbohr1more » Mon Jun 16, 2014 9:54 pm

Thanks for all your work!

What username did you try to register under?

(We have some anti-spam measures in place and we may need to white-list you :) )

Re: Doom3 shadow optimization

Postby revelator » Mon Jun 16, 2014 11:03 pm

Revelator same as i use here ;)

Re: Doom3 shadow optimization

Postby revelator » Mon Jun 16, 2014 11:09 pm

Sadly I'm having no fun at all with Doom3 on my Radeons. It runs, and the things that are not misbehaving look great, but the things that are misbehaving do so majorly :S

And it seems it's not a problem with the changes I made, because even the original Doom3.exe does it,
so it's either a driver bug (heh, I guess they love hearing that at AMD) or something on my comp acting vile towards this particular game :twisted:
Tbh I'm at my wits' end. Nothing helps, in fact trying to fix it makes it worse :S so I'd better stop now before I make it completely unplayable.

Re: Doom3 shadow optimization

Postby nbohr1more » Tue Jun 17, 2014 12:26 am

Sorry to hear that.

AMD used to have a developer relations email but now they've switched to a forum it seems:

http://devgurus.amd.com/welcome

if you gather any more inspiration to continue, you might wanna see if someone there can offer a workaround.

One thing you could also possibly try is passing your code through AMD CodeAnalyst.

Re: Doom3 shadow optimization

Postby revelator » Tue Jun 17, 2014 8:54 am

Ran it through pretty much every static code analyzer I could get my hands on, and besides lint's 1.000.000 warnings (many of which I haven't the foggiest how to fix) I pretty much fixed what I could.
PVS-Studio was a big help there. One should take note though that in some cases fixing whatever those analyzers tell you will break things further down the chain, so be careful and recompile after every change,
and remember to test the changes out before declaring victory :) I broke several engines by just blindly following what those tools said was the correct thing to do, hehe.

I think as a last effort I might fork dhewm3 and lob in the things from MHDoom that I know work :) That way GB can also have some fun with my work ;)

Re: Doom3 shadow optimization

Postby nbohr1more » Tue Jun 17, 2014 2:21 pm

Springheel approved your account at TDM forums. Please let me know if you have any trouble logging in.

Re: Doom3 shadow optimization

Postby revelator » Tue Jun 17, 2014 2:38 pm

Thanks for the heads up :) not home atm but ill try when i get back.

Reported a bug to AMD btw.
Their latest driver breaks idTech 4 based engines. The latest beta will also not work correctly and causes heavy Z-fighting and triangle-shaped dark areas in textures in games based on idTech 4.
Hope they fix it in the next release ;).

The drivers I know cause this are 14.4 and 14.6 beta. idTech 5 based games do not seem to suffer from this.
Guess ARB assembly shaders have numbered days :twisted:
