SIMD/SSE Instructions

jitspoe
Posts: 217
Joined: Mon Jan 17, 2005 5:27 am

Re: SIMD/SSE Instructions

Post by jitspoe »

Upon further testing, the OpenGL lighting is, indeed, faster (especially when running in debug mode). I'm curious if the same holds true running it on my laptop with an integrated intel card. I'll have to test that later.

Came across a little article about why it's difficult to optimize with SIMD, and how compiler-friendly idioms are often the better way to go.
http://www.altdevblogaday.com/2011/12/2 ... ode-idiom/
revelator
Posts: 2621
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Post by revelator »

Interesting read, thanks for sharing :)
Productivity is a state of mind.
jitspoe
Posts: 217
Joined: Mon Jan 17, 2005 5:27 am

Re: SIMD/SSE Instructions

Post by jitspoe »

reckless wrote: Some code from MH I use with my own fork of vanilla Doom 3.

Replaces memcpy with an asm-optimized version, and it's way faster than anything I have tried so far :)

I don't think these are actually SIMD/SSE instructions.

In any case, I'm not getting good results with this. Perhaps it only works well for large and/or aligned memcpy calls? I tried a mass replace with a #define in q_shared.h, and it was noticeably slower.

RB_MemCpy: 219-220 fps
memcpy: 232-233 fps

I notice you had the function declared as "static" (which I had to remove), so it seems like it would only be used in a specific file.
revelator
Posts: 2621
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Post by revelator »

Aye, I use it locally in a GLSL renderer, where it beats standard memcpy by miles.
It's not SIMD/SSE, just pure assembler. Doom 3 already has an SSE version, but it's a lot slower than this one for the function I'm using it in.

Lots of data going through it, so yeah, might be that it shines in those situations :) At the moment I'm using it to copy GLSL matrix calls, and it nets me a 10 fps increase.
Productivity is a state of mind.
qbism
Posts: 1236
Joined: Thu Nov 04, 2004 5:51 am
Contact:

Re: SIMD/SSE Instructions

Post by qbism »

This topic is over my head, but I've found it useful to compare actual compiler output to the hand-tuned function. For one thing, the optimizer may be doing a better or at least equal job. Statics make a difference when a loop can park some variables in registers rather than looking them up every pass.
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Re: SIMD/SSE Instructions

Post by Spike »

if you're using memcpy at all then you've already lost. :P

RB_MemCpy depends on MMX, and will corrupt any/all x87 registers (the MMX registers alias the x87 register stack). You don't want to use it on small blocks of memory, because the FPU state has to be reset again afterwards (normally with emms), and that takes time.
memcpy is probably implemented as an intrinsic in most compilers (certainly GCC), at least if your size is a constant. For small copies, it'll just do the copy directly and bypass all function calls. It'll also be smart enough to notice when you write over the dest in the following instructions and skip the extra reads, etc.
never underestimate the performance of the 'rep' prefix. :P
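
A minimal sketch of the constant-size point, assuming a hypothetical vec3 struct and helper names (copy_vec3, copy_bytes) that are not from any of the code discussed in this thread:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 3-float vector, similar in spirit to Quake's vec3_t. */
typedef struct { float x, y, z; } vec3;

/* The size is a compile-time constant (sizeof), so an optimizing
   compiler is free to expand this into a few plain moves instead of
   emitting a call. */
static void copy_vec3(vec3 *dst, const vec3 *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* The size is only known at run time, so the compiler generally has
   to emit a real call (or a generic inline copy sequence). */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Building with something like gcc -O2 -S and searching the assembly for a call to memcpy is a quick way to see which of the two your compiler actually expanded inline; the result depends on compiler, flags, and target.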
jitspoe
Posts: 217
Joined: Mon Jan 17, 2005 5:27 am

Re: SIMD/SSE Instructions

Post by jitspoe »

I didn't think memcpy was used that often in Quake 2. I was surprised it made an overall difference in performance, even if the performance difference between the two functions was significant. It looks like it's used for a handful of misc little things. I was actually not using the intrinsic memcpy, because I had one function that could toggle back and forth between the old and new memcpy (so I didn't have to rebuild everything to test the change). Using memcpy directly would perform better (at worst, one less conditional and one less function call of overhead; at best, optimized intrinsics).
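
A rough sketch of what such a toggle wrapper might look like. All names here (test_memcpy, custom_memcpy, use_custom_memcpy) are hypothetical, and the byte loop is only a stand-in for the hand-written routine, not RB_MemCpy itself:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the hand-written copy under test. */
static void *custom_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Run-time switch, so both paths can be compared without a rebuild. */
static int use_custom_memcpy = 0;

/* Every call pays for one branch and one extra function call, and
   because callers no longer invoke memcpy directly, the compiler has
   fewer chances to expand constant-size copies at the call site. */
static void *test_memcpy(void *dst, const void *src, size_t n)
{
    if (use_custom_memcpy)
        return custom_memcpy(dst, src, n);
    return memcpy(dst, src, n);
}
```

This is handy for A/B testing, but the wrapper overhead is part of what gets measured, so the numbers for the plain memcpy path will look a little worse than calling memcpy directly.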
revelator
Posts: 2621
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Post by revelator »

Came with the code ;) but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code, I need to copy a lot of registers :lol:
Productivity is a state of mind.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: SIMD/SSE Instructions

Post by mh »

reckless wrote: Came with the code ;) but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code, I need to copy a lot of registers :lol:

I did it to avoid cache pollution when copying from system memory to a vertex buffer (where the copied values just need to go straight from source to destination without going into the CPU cache as well). Notice the movntq/etc instructions: http://www.rz.uni-karlsruhe.de/rz/docs/ ... /vc198.htm
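
For illustration only, a sketch of the same non-temporal-store idea using SSE2 intrinsics. This is not the RB_MemCpy code: it uses _mm_stream_si128 (movntdq) rather than the MMX movntq, and the function name stream_copy and the fallback behaviour are assumptions made for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Copy n bytes with streaming (non-temporal) stores, so the written
   data is not pulled into the CPU cache. Falls back to memcpy when
   SSE2 is unavailable or dst is not 16-byte aligned. */
static void stream_copy(void *dst, const void *src, size_t n)
{
#ifdef HAVE_SSE2
    if (((uintptr_t)dst & 15) == 0) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        size_t blocks = n / 16;
        size_t i;

        for (i = 0; i < blocks; i++) {
            /* Unaligned load is fine; the streaming store needs a
               16-byte aligned destination. */
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i * 16));
            _mm_stream_si128((__m128i *)(d + i * 16), v);
        }
        _mm_sfence(); /* make the streamed stores visible in order */

        /* Copy the remaining tail (fewer than 16 bytes) normally. */
        memcpy(d + blocks * 16, s + blocks * 16, n % 16);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

Streaming stores tend to pay off only for large, write-once destinations such as the vertex buffer case described above; for small copies that are read back soon after, bypassing the cache can easily make things slower.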
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Re: SIMD/SSE Instructions

Post by Spike »

/me blinks...
mh?
HE'S ALIVE! :D
Guys! He's alive!
guys?
hey! guys!
where did you all go?...
just us then eh, mh?

yeah, I'm a little bored right now. blurgh. still, nice to see you around again.
revelator
Posts: 2621
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Post by revelator »

hey m8 welcome back :)
Productivity is a state of mind.