Properly scaled underwater screen turbulence

mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Properly scaled underwater screen turbulence

Post by mankrip »

This isn't a full tutorial, but I want to release this code now, since work on Makaqu has become a low priority.

You'll have to replace R_InitTurb and D_WarpScreen with the functions below. There's also an R_InitSin function and some globals to add.

It requires a high-res underwater framebuffer, which I introduced a long time ago in Makaqu and which is also present in other custom engines, but I'm not going to detail how to implement that here. It also requires some changes I introduced in Makaqu 1.6, such as screen aspect customization, but this requirement can be easily bypassed. In the future, ideal_height and ideal_width should be part of the video mode description and be calculated by the video mode switching code instead of by R_InitTurb. The vid structure should also indicate whether the current mode is fullscreen or windowed, so we don't need to read the vid_mode cvar.
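
For engines without the aspect customization cvars, the simplest bypass is probably to assume square pixels and keep only the non-stretched path. A rough sketch under that assumption follows. WarpScale_t, ComputeWarpScale and the choice of the view height as ideal_height are hypothetical and not Makaqu code:

```c
#include <assert.h>

typedef struct
{
	float ideal_width, ideal_height;
	float uscale, vscale;
} WarpScale_t;

/* Assumes square pixels: the warp is scaled relative to a 320x240 reference,
   so both axes end up with the same scale factor. */
WarpScale_t ComputeWarpScale (int view_height)
{
	WarpScale_t ws;

	assert (view_height > 0);
	ws.ideal_height = (float) view_height; /* assumption: no aspect correction at all */
	ws.ideal_width  = ws.ideal_height * 320.0f / 240.0f;
	ws.uscale = ws.ideal_width  / 320.0f;
	ws.vscale = ws.ideal_height / 240.0f;
	return ws;
}
```

With a view height of 480 both scale factors come out as 2, which is what you'd expect for a 640x480 view.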

By the way, SIN_BUFFER_SIZE doesn't need to be bigger than CYCLE * 2, not even in the original code, so setting it to that value saves some RAM.
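
A quick sketch of why CYCLE * 2 entries are enough. It assumes that every sintable lookup adds a phase offset masked with (CYCLE - 1) to a coordinate that is also masked with (CYCLE - 1), which should match how the turbulent surface drawing code indexes the table. The CYCLE value of 128 and the function name are only there for illustration:

```c
#include <assert.h>

#define CYCLE 128 /* assumed value, must be a power of two */
#define SIN_BUFFER_SIZE (CYCLE * 2)

/* Largest sintable index that can be reached when both the phase offset
   and the per-pixel coordinate are masked with (CYCLE - 1). */
int MaxSinTableIndex (void)
{
	int phase, coord, index, max_index = 0;

	assert ((CYCLE & (CYCLE - 1)) == 0); /* the masking only works for powers of two */
	for (phase = 0 ; phase < 1024 ; phase++)     /* stands in for (int) (cl.time * SPEED) */
		for (coord = 0 ; coord < 4096 ; coord++) /* stands in for any texture coordinate */
		{
			index = (phase & (CYCLE - 1)) + (coord & (CYCLE - 1));
			if (index > max_index)
				max_index = index;
		}
	return max_index;
}
```

Under that assumption the highest index is CYCLE * 2 - 2, so a table with CYCLE * 2 entries is never read out of bounds.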

Also, the code contains both an unrolled and a rolled version of the loop. To switch to the rolled version, just change the 0 in the #if 0 statements to 1.
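
Both versions are meant to write exactly the same pixels. A minimal sketch of the idea, using an unroll factor of 4 instead of 32 to keep it short. The function names and the simplified lookup are made up for illustration and are not the engine code:

```c
#include <assert.h>

#define SPAN_SHIFT 2
#define SPAN_MAX (1 << SPAN_SHIFT) /* 4 */

/* rolled version: one pixel per iteration */
void CopyRolled (unsigned char *dest, const unsigned char *src, const int *offset, int width)
{
	int x;

	for (x = 0 ; x < width ; x++)
		dest[x] = src[offset[x]];
}

/* unrolled version: SPAN_MAX pixels per iteration, then a fall-through switch for the rest */
void CopyUnrolled (unsigned char *dest, const unsigned char *src, const int *offset, int width)
{
	int count, spancount;

	assert (width >= 0);
	count     = width >> SPAN_SHIFT; /* number of full groups */
	spancount = width %  SPAN_MAX;   /* leftover pixels, 0 to SPAN_MAX - 1 */
	while (count--)
	{
		dest[3] = src[offset[3]];
		dest[2] = src[offset[2]];
		dest[1] = src[offset[1]];
		dest[0] = src[offset[0]];
		dest += SPAN_MAX;
		offset += SPAN_MAX;
	}
	switch (spancount)
	{
		case 3: dest[2] = src[offset[2]]; /* fall through */
		case 2: dest[1] = src[offset[1]]; /* fall through */
		case 1: dest[0] = src[offset[0]];
		default: break;
	}
}

/* returns 1 when both versions write the same pixels for the given width (0 to 64) */
int VersionsMatch (int width)
{
	unsigned char src[64], rolled[64], unrolled[64];
	int offset[64], i;

	assert (width >= 0 && width <= 64);
	for (i = 0 ; i < 64 ; i++)
	{
		src[i] = (unsigned char) (i * 7 + 1);
		offset[i] = (i * 5) % 64; /* arbitrary scrambled source positions */
		rolled[i] = unrolled[i] = 0;
	}
	CopyRolled (rolled, src, offset, width);
	CopyUnrolled (unrolled, src, offset, width);
	for (i = 0 ; i < 64 ; i++)
		if (rolled[i] != unrolled[i])
			return 0;
	return 1;
}
```

Note that case N in the switch writes index N - 1 and everything below it, so exactly N leftover pixels get written.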

Here's the code, from my d_turb.c file. In the vanilla source it should go in d_scan.c:

Code:

// mankrip - collecting all the turbulence code in one place only...
#define UNROLL_SPAN_SHIFT	5
#define UNROLL_SPAN_MAX	(1 << UNROLL_SPAN_SHIFT) // 32
#define SIN_BUFFER_SIZE (CYCLE * 2)
int		sintable[SIN_BUFFER_SIZE];
void R_InitSin (void)
{
	int
		x
		;
	// run this only once, at engine startup
	for (x = 0 ; x < SIN_BUFFER_SIZE ; x++)
		sintable[x] = (int) (AMP + sin ( (double)x * 3.14159 * 2.0 / CYCLE) * AMP);
}
// mankrip - hi-res waterwarp - begin
int
	* intsintable_x = NULL
,	* intsintable_y = NULL
,	* warpcolumn = NULL
,	* warprow = NULL
	;
byte
	* turbdest = NULL
,	* turbsrc = NULL
	;
float
	uscale = 1.0f
,	vscale = 1.0f
	;
// mankrip - hi-res waterwarp - end

void R_InitTurb (void)
{
	// mankrip - hi-res waterwarp - begin
	extern cvar_t vid_mode;
	float
		ideal_width
	,	ustep
	,	uoffset // horizontal offset for the phase
	,	uamp // horizontal warping amplitude
	,	ustretch
	,	u // source
	,	ideal_height // pre-sbar height
	,	vstep
	,	voffset // vertical offset for the phase
	,	vamp // vertical warping amplitude
	,	vstretch
	,	v // source
		;
	int
		x // destination
	,	y // destination
	,	warpcolmem
	,	warprowmem
		;
	// widescreen
	ideal_height = ( (float)r_refdef.vrect.width / (float)screen_padded_width) * (float)screen_padded_height;
	if (scr_pixel_squareaspect.value)
	{
		if (vid_stretched.value)
		{
			if (vid_mode.value > 2.0f) // fullscreen
				ideal_width = ideal_height * (320.0f / 240.0f * ( ( (float)vid.width / (float)vid.height) / (scr_native_w.value / scr_native_h.value)) );
			else // windowed
				ideal_width = ideal_height * (320.0f / 240.0f * ( (vid_desktop_w.value / vid_desktop_h.value) / (scr_native_w.value / scr_native_h.value)) );
		}
		else
			ideal_width = ideal_height * (320.0f / 240.0f); // when non-stretched, it doesn't matter whether it's windowed or fullscreen
	}
	else
	{
			if (vid_mode.value > 2.0f) // fullscreen
				ideal_width = ideal_height * (320.0f / 240.0f * ( ( (float)vid.width / (float)vid.height) / (scr_aspect_x.value / scr_aspect_y.value)) );
			else // windowed
				ideal_width = ideal_height * (320.0f / 240.0f * ( (vid_desktop_w.value / vid_desktop_h.value) / (scr_aspect_x.value / scr_aspect_y.value)) );
	}

	uscale = ideal_width  / 320.0f;
	vscale = ideal_height / 240.0f;
	ustep = 1.0f / uscale;
	vstep = 1.0f / vscale;
	uoffset = (0.5f * ( (float)r_refdef.vrect.width  - ideal_width )) / uscale;
	voffset = (0.5f * ( (float)r_refdef.vrect.height - ideal_height)) / vscale;
	uamp = AMP2 * uscale;
	vamp = AMP2 * vscale;
	ustretch = ( (float)r_refdef.vrect.width  - uamp * 2.0f) / (float)r_refdef.vrect.width ; // screen compression - use ideal_width instead?
	vstretch = ( (float)r_refdef.vrect.height - vamp * 2.0f) / (float)r_refdef.vrect.height; // screen compression - use ideal_height instead?

	if (warprow)// || warpcolumn || intsintable_x || intsintable_y)
	{
		free (warprow);
		free (warpcolumn);
		free (intsintable_x);
		free (intsintable_y);
	}
	warprowmem = r_refdef.vrect.height + (int) (vamp * 2.0f + CYCLE * vscale);
	intsintable_y = malloc (sizeof (int) * warprowmem);
	warprow       = malloc (sizeof (int) * warprowmem);
	for (v = 0.0f, y = 0 ; y < warprowmem ; v += vstep, y++)
	{
			// horizontal offset for waves
			// horizontal amplitude, vertical frequency
			// changes every line, remains the same every column
			intsintable_y[y] = uamp + sin ( (v - voffset) * 3.14159 * 2.0 / CYCLE) * uamp;
		//	warpcolumn[x] = (int) ( (float)x * ustretch);

			// vertical offset for waves
			// remains the same every line, changes every column
		//	intsintable_x[x] = vamp + sin ( (u - uoffset) * 3.14159 * 2.0 / CYCLE) * vamp;
			warprow[y]    = (int) ( (float)y * vstretch) * screenwidth;
	}
	warpcolmem = r_refdef.vrect.width + (int) (uamp * 2.0f + CYCLE * uscale);
	intsintable_x = malloc (sizeof (int) * warpcolmem);
	warpcolumn    = malloc (sizeof (int) * warpcolmem);
	for (u = 0.0f, x = 0 ; x < warpcolmem ; u += ustep, x++)
	{
			// horizontal offset for waves
			// horizontal amplitude, vertical frequency
			// changes every line, remains the same every column
		//	intsintable_y[y] = uamp + sin ( (v - voffset) * 3.14159 * 2.0 / CYCLE) * uamp;
			warpcolumn[x] = (int) ( (float)x * ustretch);

			// vertical offset for waves
			// remains the same every line, changes every column
			intsintable_x[x] = vamp + sin ( (u - uoffset) * 3.14159 * 2.0 / CYCLE) * vamp;
		//	warprow[y]    = (int) ( (float)y * vstretch) * screenwidth;
	}

	turbsrc = d_viewbuffer + r_refdef.vrect.y * screenwidth + r_refdef.vrect.x;
	turbdest = vid.buffer + scr_vrect.y * vid.rowbytes + scr_vrect.x;
	// mankrip - hi-res waterwarp - end
}

/*
=============
D_WarpScreen

// this performs a slight compression of the screen at the same time as
// the sine warp, to keep the edges from wrapping
=============
*/
void D_WarpScreen (void)
{
	// mankrip - hi-res waterwarp - begin
	byte
		* dest
	,	* tempdest
	,	* src
		;
	int
	#if 0
		x // destination
	#else
		count
	,	spancount
	#endif
	,	y // destination
	,	* turb_x
	,	* turb_x_temp
	,	* turb_y
	,	* row
	,	* col
		;
	float
		timeoffset = (float) ( (int) (cl.time * SPEED) & (CYCLE - 1)) // turbulence phase offset
		;
	R_InitTurb (); // calling this here because vid.recalc_refdef doesn't seem to always be set properly

	turb_x = intsintable_x + (int) (timeoffset * uscale);
	turb_y = intsintable_y + (int) (timeoffset * vscale);

	src = turbsrc;
	dest = turbdest;

	for (y = 0 ; y < r_refdef.vrect.height ; y++, dest += vid.rowbytes)
	{
		tempdest = dest;
		row = warprow + y;
		col = warpcolumn + turb_y[y];
		#if 0
		for (x = 0 ; x < r_refdef.vrect.width; x++)
			tempdest[x] = src[row[turb_x[x]] + col[x]];
		#else
		turb_x_temp = turb_x;
		count	  = r_refdef.vrect.width >> UNROLL_SPAN_SHIFT; // divided by 32
		spancount = r_refdef.vrect.width %  UNROLL_SPAN_MAX; // remainder of the above division (min zero, max 31)

		while (count--)
		{
			tempdest[31] = src[row[turb_x_temp[31]] + col[31]];
			tempdest[30] = src[row[turb_x_temp[30]] + col[30]];
			tempdest[29] = src[row[turb_x_temp[29]] + col[29]];
			tempdest[28] = src[row[turb_x_temp[28]] + col[28]];
			tempdest[27] = src[row[turb_x_temp[27]] + col[27]];
			tempdest[26] = src[row[turb_x_temp[26]] + col[26]];
			tempdest[25] = src[row[turb_x_temp[25]] + col[25]];
			tempdest[24] = src[row[turb_x_temp[24]] + col[24]];
			tempdest[23] = src[row[turb_x_temp[23]] + col[23]];
			tempdest[22] = src[row[turb_x_temp[22]] + col[22]];
			tempdest[21] = src[row[turb_x_temp[21]] + col[21]];
			tempdest[20] = src[row[turb_x_temp[20]] + col[20]];
			tempdest[19] = src[row[turb_x_temp[19]] + col[19]];
			tempdest[18] = src[row[turb_x_temp[18]] + col[18]];
			tempdest[17] = src[row[turb_x_temp[17]] + col[17]];
			tempdest[16] = src[row[turb_x_temp[16]] + col[16]];
			tempdest[15] = src[row[turb_x_temp[15]] + col[15]];
			tempdest[14] = src[row[turb_x_temp[14]] + col[14]];
			tempdest[13] = src[row[turb_x_temp[13]] + col[13]];
			tempdest[12] = src[row[turb_x_temp[12]] + col[12]];
			tempdest[11] = src[row[turb_x_temp[11]] + col[11]];
			tempdest[10] = src[row[turb_x_temp[10]] + col[10]];
			tempdest[ 9] = src[row[turb_x_temp[ 9]] + col[ 9]];
			tempdest[ 8] = src[row[turb_x_temp[ 8]] + col[ 8]];
			tempdest[ 7] = src[row[turb_x_temp[ 7]] + col[ 7]];
			tempdest[ 6] = src[row[turb_x_temp[ 6]] + col[ 6]];
			tempdest[ 5] = src[row[turb_x_temp[ 5]] + col[ 5]];
			tempdest[ 4] = src[row[turb_x_temp[ 4]] + col[ 4]];
			tempdest[ 3] = src[row[turb_x_temp[ 3]] + col[ 3]];
			tempdest[ 2] = src[row[turb_x_temp[ 2]] + col[ 2]];
			tempdest[ 1] = src[row[turb_x_temp[ 1]] + col[ 1]];
			tempdest[ 0] = src[row[turb_x_temp[ 0]] + col[ 0]];

			tempdest += UNROLL_SPAN_MAX;
			turb_x_temp += UNROLL_SPAN_MAX;
			col += UNROLL_SPAN_MAX;
		}
		if (spancount)
		{
			switch (spancount)
			{
				// from (UNROLL_SPAN_MAX - 1) to 1, because it will never be zero when getting here,
				// and values equal to UNROLL_SPAN_MAX are handled in the (count--) loop above
				case 31: tempdest[30] = src[row[turb_x_temp[30]] + col[30]];
				case 30: tempdest[29] = src[row[turb_x_temp[29]] + col[29]];
				case 29: tempdest[28] = src[row[turb_x_temp[28]] + col[28]];
				case 28: tempdest[27] = src[row[turb_x_temp[27]] + col[27]];
				case 27: tempdest[26] = src[row[turb_x_temp[26]] + col[26]];
				case 26: tempdest[25] = src[row[turb_x_temp[25]] + col[25]];
				case 25: tempdest[24] = src[row[turb_x_temp[24]] + col[24]];
				case 24: tempdest[23] = src[row[turb_x_temp[23]] + col[23]];
				case 23: tempdest[22] = src[row[turb_x_temp[22]] + col[22]];
				case 22: tempdest[21] = src[row[turb_x_temp[21]] + col[21]];
				case 21: tempdest[20] = src[row[turb_x_temp[20]] + col[20]];
				case 20: tempdest[19] = src[row[turb_x_temp[19]] + col[19]];
				case 19: tempdest[18] = src[row[turb_x_temp[18]] + col[18]];
				case 18: tempdest[17] = src[row[turb_x_temp[17]] + col[17]];
				case 17: tempdest[16] = src[row[turb_x_temp[16]] + col[16]];
				case 16: tempdest[15] = src[row[turb_x_temp[15]] + col[15]];
				case 15: tempdest[14] = src[row[turb_x_temp[14]] + col[14]];
				case 14: tempdest[13] = src[row[turb_x_temp[13]] + col[13]];
				case 13: tempdest[12] = src[row[turb_x_temp[12]] + col[12]];
				case 12: tempdest[11] = src[row[turb_x_temp[11]] + col[11]];
				case 11: tempdest[10] = src[row[turb_x_temp[10]] + col[10]];
				case 10: tempdest[ 9] = src[row[turb_x_temp[ 9]] + col[ 9]];
				case  9: tempdest[ 8] = src[row[turb_x_temp[ 8]] + col[ 8]];
				case  8: tempdest[ 7] = src[row[turb_x_temp[ 7]] + col[ 7]];
				case  7: tempdest[ 6] = src[row[turb_x_temp[ 6]] + col[ 6]];
				case  6: tempdest[ 5] = src[row[turb_x_temp[ 5]] + col[ 5]];
				case  5: tempdest[ 4] = src[row[turb_x_temp[ 4]] + col[ 4]];
				case  4: tempdest[ 3] = src[row[turb_x_temp[ 3]] + col[ 3]];
				case  3: tempdest[ 2] = src[row[turb_x_temp[ 2]] + col[ 2]];
				case  2: tempdest[ 1] = src[row[turb_x_temp[ 1]] + col[ 1]];
				case  1: tempdest[ 0] = src[row[turb_x_temp[ 0]] + col[ 0]];
				break;
			}
		}
		#endif
	}
	// mankrip - hi-res waterwarp - end
}
[edit] Fixed the case switch.
Last edited by mankrip on Thu Jun 27, 2013 4:22 am, edited 2 times in total.
Ph'nglui mglw'nafh mankrip Hell's end wgah'nagl fhtagn.
==-=-=-=-=-=-=-=-=-=-==
Dev blog / Twitter / YouTube
qbism
Posts: 1236
Joined: Thu Nov 04, 2004 5:51 am

Re: Properly scaled underwater screen turbulence

Post by qbism »

multithreaded LOL

Code:

pthread_t thread[NUMTHREADS]; //qb:  global so funcs can use without expense of creating new threads every time.

typedef struct warpslice_s  //qb: for multithreading
{
    int rowstart, rowend;
    byte		*src, *dest;
    int  *turb_x, *turb_y;
} warpslice_t;


void* WarpLoop (void* arg) // pthread start routines must take and return void*
{
    warpslice_t *ws = (warpslice_t *) arg;
    byte *src, *dest, *tempdest;
    int  *row, *col, *turb_x, *turb_x_temp, *turb_y;
    int rollcount, spancount, y;

    src = ws->src;
    dest = ws->dest;
    turb_x = ws->turb_x;
    turb_y = ws->turb_y;

    for (y = ws->rowstart ; y < ws->rowend ; y++, dest += vid.rowbytes)
    {
        tempdest = dest;
        row = warprow + y;
        col = warpcolumn + turb_y[y];
        turb_x_temp = turb_x;
        rollcount  = r_refdef.vrect.width >> UNROLL_SPAN_SHIFT; // divided by 32
        spancount = r_refdef.vrect.width %  UNROLL_SPAN_MAX; // remainder of the above division (min zero, max 31)

        while (rollcount--)
        {
            tempdest[31] = src[row[turb_x_temp[31]] + col[31]];
            tempdest[30] = src[row[turb_x_temp[30]] + col[30]];
            tempdest[29] = src[row[turb_x_temp[29]] + col[29]];
            tempdest[28] = src[row[turb_x_temp[28]] + col[28]];
            tempdest[27] = src[row[turb_x_temp[27]] + col[27]];
            tempdest[26] = src[row[turb_x_temp[26]] + col[26]];
            tempdest[25] = src[row[turb_x_temp[25]] + col[25]];
            tempdest[24] = src[row[turb_x_temp[24]] + col[24]];
            tempdest[23] = src[row[turb_x_temp[23]] + col[23]];
            tempdest[22] = src[row[turb_x_temp[22]] + col[22]];
            tempdest[21] = src[row[turb_x_temp[21]] + col[21]];
            tempdest[20] = src[row[turb_x_temp[20]] + col[20]];
            tempdest[19] = src[row[turb_x_temp[19]] + col[19]];
            tempdest[18] = src[row[turb_x_temp[18]] + col[18]];
            tempdest[17] = src[row[turb_x_temp[17]] + col[17]];
            tempdest[16] = src[row[turb_x_temp[16]] + col[16]];
            tempdest[15] = src[row[turb_x_temp[15]] + col[15]];
            tempdest[14] = src[row[turb_x_temp[14]] + col[14]];
            tempdest[13] = src[row[turb_x_temp[13]] + col[13]];
            tempdest[12] = src[row[turb_x_temp[12]] + col[12]];
            tempdest[11] = src[row[turb_x_temp[11]] + col[11]];
            tempdest[10] = src[row[turb_x_temp[10]] + col[10]];
            tempdest[ 9] = src[row[turb_x_temp[ 9]] + col[ 9]];
            tempdest[ 8] = src[row[turb_x_temp[ 8]] + col[ 8]];
            tempdest[ 7] = src[row[turb_x_temp[ 7]] + col[ 7]];
            tempdest[ 6] = src[row[turb_x_temp[ 6]] + col[ 6]];
            tempdest[ 5] = src[row[turb_x_temp[ 5]] + col[ 5]];
            tempdest[ 4] = src[row[turb_x_temp[ 4]] + col[ 4]];
            tempdest[ 3] = src[row[turb_x_temp[ 3]] + col[ 3]];
            tempdest[ 2] = src[row[turb_x_temp[ 2]] + col[ 2]];
            tempdest[ 1] = src[row[turb_x_temp[ 1]] + col[ 1]];
            tempdest[ 0] = src[row[turb_x_temp[ 0]] + col[ 0]];

            tempdest += UNROLL_SPAN_MAX;
            turb_x_temp += UNROLL_SPAN_MAX;
            col += UNROLL_SPAN_MAX;
        }
        if (spancount)
        {
            switch (spancount)
            {
                // from (UNROLL_SPAN_MAX - 1) to 1: spancount is never zero here, and
                // case N writes the N remaining pixels (indices N-1 down to 0)
            case 31: tempdest[30] = src[row[turb_x_temp[30]] + col[30]];
            case 30: tempdest[29] = src[row[turb_x_temp[29]] + col[29]];
            case 29: tempdest[28] = src[row[turb_x_temp[28]] + col[28]];
            case 28: tempdest[27] = src[row[turb_x_temp[27]] + col[27]];
            case 27: tempdest[26] = src[row[turb_x_temp[26]] + col[26]];
            case 26: tempdest[25] = src[row[turb_x_temp[25]] + col[25]];
            case 25: tempdest[24] = src[row[turb_x_temp[24]] + col[24]];
            case 24: tempdest[23] = src[row[turb_x_temp[23]] + col[23]];
            case 23: tempdest[22] = src[row[turb_x_temp[22]] + col[22]];
            case 22: tempdest[21] = src[row[turb_x_temp[21]] + col[21]];
            case 21: tempdest[20] = src[row[turb_x_temp[20]] + col[20]];
            case 20: tempdest[19] = src[row[turb_x_temp[19]] + col[19]];
            case 19: tempdest[18] = src[row[turb_x_temp[18]] + col[18]];
            case 18: tempdest[17] = src[row[turb_x_temp[17]] + col[17]];
            case 17: tempdest[16] = src[row[turb_x_temp[16]] + col[16]];
            case 16: tempdest[15] = src[row[turb_x_temp[15]] + col[15]];
            case 15: tempdest[14] = src[row[turb_x_temp[14]] + col[14]];
            case 14: tempdest[13] = src[row[turb_x_temp[13]] + col[13]];
            case 13: tempdest[12] = src[row[turb_x_temp[12]] + col[12]];
            case 12: tempdest[11] = src[row[turb_x_temp[11]] + col[11]];
            case 11: tempdest[10] = src[row[turb_x_temp[10]] + col[10]];
            case 10: tempdest[ 9] = src[row[turb_x_temp[ 9]] + col[ 9]];
            case  9: tempdest[ 8] = src[row[turb_x_temp[ 8]] + col[ 8]];
            case  8: tempdest[ 7] = src[row[turb_x_temp[ 7]] + col[ 7]];
            case  7: tempdest[ 6] = src[row[turb_x_temp[ 6]] + col[ 6]];
            case  6: tempdest[ 5] = src[row[turb_x_temp[ 5]] + col[ 5]];
            case  5: tempdest[ 4] = src[row[turb_x_temp[ 4]] + col[ 4]];
            case  4: tempdest[ 3] = src[row[turb_x_temp[ 3]] + col[ 3]];
            case  3: tempdest[ 2] = src[row[turb_x_temp[ 2]] + col[ 2]];
            case  2: tempdest[ 1] = src[row[turb_x_temp[ 1]] + col[ 1]];
            case  1: tempdest[ 0] = src[row[turb_x_temp[ 0]] + col[ 0]];
                break;
            }
        }
    }
    return NULL; // nothing to hand back to pthread_join
}


void D_WarpScreen (void)
{
    // mankrip - hi-res waterwarp - begin
    byte
    * dest
    ,   * src
    ;

    int   * turb_x
    ,   * turb_y
    ;
    int i;
    float    timeoffset = (float) ( (int) (cl.time * SPEED) & (CYCLE - 1)); // turbulence phase offset
    
      //qb:  threads
        warpslice_t ws[NUMTHREADS];

    R_InitTurb (); // calling this here because vid.recalc_refdef doesn't seem to always be set properly

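    // uwarpscale and vwarpscale appear to be this engine's names for uscale and vscale from the first post
    turb_x = intsintable_x + (int) (timeoffset * uwarpscale);
    turb_y = intsintable_y + (int) (timeoffset * vwarpscale);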

    src = vid.buffer + scr_vrect.y * vid.rowbytes + scr_vrect.x;
    dest= r_warpbuffer + scr_vrect.y * vid.rowbytes + scr_vrect.x;

    for (i=0; i<NUMTHREADS; i++)
    {
        ws[i].src = src;
        ws[i].dest = dest + vid.rowbytes * i * (r_refdef.vrect.height/NUMTHREADS);
        ws[i].turb_x = turb_x;
        ws[i].turb_y = turb_y;
        ws[i].rowstart= i*(r_refdef.vrect.height/NUMTHREADS);
        if (i+1 == NUMTHREADS)
            ws[i].rowend = r_refdef.vrect.height;
        else
            ws[i].rowend = (i+1)*(r_refdef.vrect.height/NUMTHREADS);
        pthread_create(&thread[i], NULL, WarpLoop, &ws[i]);
    }

  //wait for threads to finish
    for (i=0; i<NUMTHREADS; i++)
    {
        pthread_join(thread[i], NULL);
    }
    // mankrip - hi-res waterwarp - end

    //qb: copy buffer to video
    src = r_warpbuffer + scr_vrect.y * vid.width + scr_vrect.x;
    dest = vid.buffer + scr_vrect.y * vid.rowbytes + scr_vrect.x;
    for (i=0 ; i<scr_vrect.height ; i++, src += vid.width, dest += vid.rowbytes)
        memcpy(dest, src, scr_vrect.width);
}
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Properly scaled underwater screen turbulence

Post by mankrip »

Nice!

How's the speed in comparison to the non-threaded code?
qbism
Posts: 1236
Joined: Thu Nov 04, 2004 5:51 am

Re: Properly scaled underwater screen turbulence

Post by qbism »

With 2, 3, or 4 threads the speed is about the same as the non-threaded code. With more than 4 threads performance gets worse, possibly due to the extra overhead.

I might be doing something wrong, the tasks might be too short for threading to pay off, there might be a bottleneck, or maybe the processor is the limit. My CPU is a laptop Core i3 with two 'real' cores and two 'virtual' cores. With the define set to 4 threads, one core runs at 75%, two others at 25%, and the last one idles. Could it be a RAM or bus bottleneck?
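
For reference, the way the rows get divided between the threads can be looked at on its own. SliceRows and rowslice_t are names made up for this sketch. The slicing rule (equal slices, with the last one taking the leftover rows) follows the setup loop in the threaded D_WarpScreen:

```c
#include <assert.h>

typedef struct
{
	int rowstart, rowend; /* rowend is exclusive */
} rowslice_t;

/* Fills slices[0..numthreads-1] so that they cover rows 0..height-1 without gaps or overlap. */
void SliceRows (rowslice_t *slices, int numthreads, int height)
{
	int i, rows_per_slice;

	assert (numthreads > 0 && height >= 0);
	rows_per_slice = height / numthreads;
	for (i = 0 ; i < numthreads ; i++)
	{
		slices[i].rowstart = i * rows_per_slice;
		if (i + 1 == numthreads)
			slices[i].rowend = height; /* last slice also takes the leftover rows */
		else
			slices[i].rowend = (i + 1) * rows_per_slice;
	}
}
```

With a height of 200 and 3 threads this gives slices of 66, 66 and 68 rows, so each thread gets close to the same number of rows.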
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: Properly scaled underwater screen turbulence

Post by mankrip »

I have no idea; I have no experience with threading.

Also, I've looked at my code more carefully, and the case numbers were actually right. What was wrong was the case 32 line, which shouldn't be there because spancount is always lower than 32. I've fixed the code in the first post and added a further explanation.
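
A small sketch of the arithmetic behind that: the remainder of a division by 32 can only be 0 to 31, so a case 32 label can never be reached. The helper name is made up for the sketch, and the two defines repeat the ones from the first post:

```c
#include <assert.h>

#define UNROLL_SPAN_SHIFT 5
#define UNROLL_SPAN_MAX (1 << UNROLL_SPAN_SHIFT) /* 32 */

/* Splits a row width into full groups of 32 pixels plus a remainder,
   the same way the unrolled loop does. */
void SplitSpan (int width, int *count, int *spancount)
{
	assert (width >= 0);
	*count     = width >> UNROLL_SPAN_SHIFT; /* handled by the while loop */
	*spancount = width %  UNROLL_SPAN_MAX;   /* handled by the switch, always 0 to 31 */
}
```

A width of 320 gives count 10 and spancount 0, a width of 350 gives count 10 and spancount 30, and a width of 63 gives count 1 and spancount 31.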