Another SSE/x87 rendering difference (R_LightPoint)

ericw · Post by **ericw** » Mon Jan 26, 2015 6:25 am

Just noticed this.. in my map jam3_ericw, there's a zombie at the start of the map that's embedded partly into a wall. It gets rendered as black in some engines (winquake.exe, fitzquake085.exe, the 32-bit windows quakespasm 0.90.0).
In others, it's lit at a medium brightness (64-bit windows QS 0.90.0).

You can see the zombie in the top middle picture here, in this case it's lit up:

Here's a screenshot from winquake:

(the map is available here, including source)

Even though it looks worse in winquake / fitzquake, that's the reference rendering IMHO, and it's probably my fault for sticking the zombie too far inside the wall.
The cause seems to be another x87 vs SSE difference, like this previous one that was causing lightmap corruption in some maps.

I was able to get winquake-equivalent lighting in code compiled with SSE with the following changes to RecursiveLightPoint (this is modifying the version in Quakespasm):

Add:

#define DoublePrecisionDotProduct(x,y) ((double)x[0]*y[0]+(double)x[1]*y[1]+(double)x[2]*y[2])

Change:

ds = (int) ((float) DotProduct (mid, surf->texinfo->vecs[0]) + surf->texinfo->vecs[0][3]);
dt = (int) ((float) DotProduct (mid, surf->texinfo->vecs[1]) + surf->texinfo->vecs[1][3]);

To:

ds = (int) ((double) DoublePrecisionDotProduct (mid, surf->texinfo->vecs[0]) + surf->texinfo->vecs[0][3]);
dt = (int) ((double) DoublePrecisionDotProduct (mid, surf->texinfo->vecs[1]) + surf->texinfo->vecs[1][3]);

Baker · Post by **Baker** » Tue Jan 27, 2015 1:04 am

What compiler did you use? I would guess Visual Studio 20xx, or are the cross-compile tools for MinGW on Linux workable for 64-bit Windows now?

I know you are putting this as a 32-bit vs. 64-bit difference. But I suspect something else could be in play here.

You would think that the way two 32-bit IEEE 754 floating point numbers are multiplied in asm would be the same regardless of source code, if you see what I am suggesting. The instructions produced must be different.

Also: Isn't what the 32-bit WinQuake does the gold standard? The 32-bit WinQuake is Quake. What I am implying is that in your previous 32-bit vs. 64-bit case, the 64-bit version was acting wrong. What I am suggesting is that the behavior of the 32-bit WinQuake can't be wrong, and superseding that is a change. The changed behavior could affect tons of little hard-to-find things in a great many maps, putting things that weren't in shadows into shadows or putting things in shadows outside of them and changing lighting subtly away from what the author was testing against at the time of creation, etc.

Spike · Post by **Spike** » Tue Jan 27, 2015 2:59 am

if we're saying that only dosquake's precision is important, then remember that it's probably using long-double (80-bit) precision internally rather than regular doubles (more precise, but as that applies only to intermediate values, and to all of them, this results in compiler-specific results, as various compilers may consider various locals as intermediates to avoid writing them to memory only to reload them). SSE code can only use 32-bit and 64-bit.

trying to exactly match the precision of an x87 program is basically a fool's errand if it leaves the control word of the x87 at its default setting.

Baker · Post by **Baker** » Tue Jan 27, 2015 4:21 am

Spike wrote: if we're saying that only dosquake's precision is important, then remember that it's probably using long-double (80-bit) precision internally rather than regular doubles (more precise, but as that applies only to intermediate values, and to all of them, this results in compiler-specific results, as various compilers may consider various locals as intermediates to avoid writing them to memory only to reload them). SSE code can only use 32-bit and 64-bit.

trying to exactly match the precision of an x87 program is basically a fool's errand if it leaves the control word of the x87 at its default setting.

Unwritten theme #2 of what I was saying: fiddling with the floating point to get a certain result in a single case ... I'm not sure that kind of "specific result in a very specific case" fine tuning is helpful. Especially if it fine-tunes the expected result away from what every other engine on Win32 had as a result.

I didn't say anything about DOS Quake.

ericw · Post by **ericw** » Tue Jan 27, 2015 7:20 am

Baker wrote: Also: Isn't what the 32-bit WinQuake does the gold standard? The 32-bit WinQuake is Quake. What I am implying is that in your previous 32-bit vs. 64-bit case, the 64-bit version was acting wrong. What I am suggesting is that the behavior of the 32-bit WinQuake can't be wrong, and superseding that is a change. The changed behavior could affect tons of little hard-to-find things in a great many maps, putting things that weren't in shadows into shadows or putting things in shadows outside of them and changing lighting subtly away from what the author was testing against at the time of creation, etc.

Yup, I agree with all of that. I didn't mean to argue for superseding WinQuake's behaviour, but rather, the code snippet I posted will (even if compiled with SSE) produce the same results as stock 1997 WinQuake, at least on this test case.

Baker wrote: What compiler did you use? I would guess Visual Studio 20xx, or are the cross-compile tools for MinGW on Linux workable for 64-bit Windows now?

Yeah, VS2013, and also Clang on OS X.

Baker wrote: I know you are putting this as a 32-bit vs. 64-bit difference. But I suspect something else could be in play here.

You would think that the way two 32-bit IEEE 754 floating point numbers are multiplied in asm would be the same regardless of source code, if you see what I am suggesting. The instructions produced must be different.

Well not exactly 32-bit vs 64-bit, but I'm pretty sure it's x87 vs SSE.
I admit I didn't spend much time investigating this case, but I did on the lightmapping issue with mfx's map. Found the link where I wrote up more of an explanation of that one, maybe it'd be worth reposting here with some more details filled in: http://sourceforge.net/p/quakespasm/patches/15/

Both cases involve dot products like:

float a,b,c,d,e,f,g;
float result = (a*b + c*d + e*f);

This happens a fair bit in Quake. With the original executables (x87 floating point) the expression (a*b + c*d + e*f) is computed at 80-bit precision, and only then is the result rounded down to 32 bits.

AFAIK, C99 allows the compiler to compile the above as if it were written like this:

float a,b,c,d,e,f,g;
float temp1 = (a*b);
float temp2 = (c*d);
float temp3 = (e*f);
float result = temp1 + temp2 + temp3;

And I think this is what you get with SSE2: each intermediate is rounded to 32 bits, and since those rounding errors add up, the final result can be less accurate than a single rounding of the exact sum to 32 bits.
When you add the casts to double, it forces the compiler not to round the intermediate values down to 32 bits but to keep them at 64 bits, which is why it helps match the results obtained from the original Quake executables.
There's also some info on wikipedia about this: http://en.wikipedia.org/wiki/SSE2#Diffe ... U_and_SSE2

InsideQC Forums
