SIMD/SSE Instructions
Moderator: InsideQC Admins
26 posts
• Page 2 of 2 • 1, 2
Re: SIMD/SSE Instructions
Upon further testing, the OpenGL lighting is, indeed, faster (especially when running in debug mode). I'm curious if the same holds true running it on my laptop with an integrated intel card. I'll have to test that later.
Came across a little article about why it's difficult to optimize with SIMD, and how compiler-friendly idioms are often the better way to go.
http://www.altdevblogaday.com/2011/12/2 ... ode-idiom/
- jitspoe
- Posts: 217
- Joined: Mon Jan 17, 2005 5:27 am
Re: SIMD/SSE Instructions
reckless wrote: Some code from MH I use with my own fork of vanilla Doom 3.
It replaces memcpy with an asm-optimized version, and it's way faster than anything I have tried so far.
I don't think these are actually SIMD/SSE instructions.
In any case, I'm not getting good results with this. Perhaps it only works well for large and/or aligned memcpy's? I tried a mass replace with a #define in q_shared.h, and it was notably slower.
RB_MemCpy: 219-220 fps
memcpy: 232-233 fps
I notice you had the function declared as "static" (which I had to remove), so it seems like it would only be used in a specific file.
- jitspoe
- Posts: 217
- Joined: Mon Jan 17, 2005 5:27 am
Re: SIMD/SSE Instructions
Aye, I use it locally in a GLSL renderer, where it beats standard memcpy by miles.
It's not SIMD/SSE, just pure assembler. Doom 3 already has an SSE version, but it's a lot slower than this one for the function I'm using it in.
Lots of data going through it, so yeah, it might be that it shines in those situations.
At the moment I'm using it to copy GLSL matrix calls, and it nets me a 10 fps increase.
Productivity is a state of mind.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: SIMD/SSE Instructions
This topic is over my head, but I've found it useful to compare actual compiler output to the hand-tuned function. For one thing, the optimizer may be doing a better or at least equal job. Statics make a difference when a loop can park some variables in registers rather than looking them up every pass.
- qbism
- Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: SIMD/SSE Instructions
If you're using memcpy at all then you've already lost.
RB_MemCpy depends on MMX, and will corrupt any/all x87 registers. You don't want to use it on small blocks of memory, because the x87 state takes time to reset again afterwards.
memcpy is probably implemented as an intrinsic in most compilers (certainly gcc), at least if your size is a constant. For small copies, it'll just do the copy directly and bypass all function calls. It'll also be smart enough to notice when you write over the dest in the following instructions and skip the extra reads, etc.
Never underestimate the performance of the 'rep' prefix.
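A rough sketch of the constant-size versus runtime-size distinction described above. The function names are made up for illustration, and whether a given compiler actually inlines the first copy depends on the compiler and optimization level, so treat the comments as the usual behaviour rather than a guarantee.

```c
#include <stddef.h>
#include <string.h>

typedef float vec3_t[3];   /* same shape as Quake's vec3_t */

/* Constant size: optimizing compilers such as gcc typically expand this
   memcpy inline into a few plain moves instead of calling the library. */
void vec3_copy(const vec3_t src, vec3_t dst)
{
    memcpy(dst, src, sizeof(vec3_t));   /* 12 bytes, known at compile time */
}

/* Runtime size: the length is unknown at compile time, so the compiler
   usually emits a real call to memcpy (or a 'rep movs' sequence). */
void bytes_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Compiling with something like `gcc -O2 -S` and reading the generated assembly is one way to check what a particular compiler does with each function.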
- Spike
- Posts: 2892
- Joined: Fri Nov 05, 2004 3:12 am
- Location: UK
Re: SIMD/SSE Instructions
I didn't think memcpy was used that often in quake2. I was surprised it made an overall difference in performance, even if the performance difference was significant between the two functions. It looks like it's used for a handful of misc little things. I was actually not using the intrinsic memcpy, because I had one function that could toggle back and forth between the old and new memcpy (so I didn't have to rebuild everything to test the change). Using memcpy directly would perform better (at worst, 1 conditional and 1 function call less overhead, at best, optimized intrinsics).
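For reference, the kind of toggle wrapper described above might look roughly like this. All names here (`Custom_MemCpy`, `Test_MemCpy`, `use_custom_memcpy`) are hypothetical, and the byte loop is only a stand-in for whatever hand-written routine is being tested.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the hand-written copy routine under test. */
void *Custom_MemCpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical runtime switch, e.g. driven by a cvar. */
int use_custom_memcpy = 0;

/* Every copy pays for one extra call plus one branch, and the size is no
   longer a compile-time constant at the memcpy call inside this function. */
void *Test_MemCpy(void *dst, const void *src, size_t n)
{
    if (use_custom_memcpy)
        return Custom_MemCpy(dst, src, n);
    return memcpy(dst, src, n);
}
```

This makes A/B testing possible without a rebuild, but the wrapper's own overhead is included in both measurements, which is why calling memcpy directly would be expected to do at least as well.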
- jitspoe
- Posts: 217
- Joined: Mon Jan 17, 2005 5:27 am
Re: SIMD/SSE Instructions
It came with the code,
but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code; I need to copy a lot of registers.
Productivity is a state of mind.
- revelator
- Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: SIMD/SSE Instructions
reckless wrote: Came with the code, but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code; I need to copy a lot of registers.
I did it to avoid cache pollution when copying from system memory to a vertex buffer (where the copied values just need to go straight from source to destination without going into the CPU cache as well). Notice the movntq/etc instructions: http://www.rz.uni-karlsruhe.de/rz/docs/ ... /vc198.htm
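A minimal sketch of the same idea using compiler intrinsics rather than inline assembly. This is not the original RB_MemCpy: it swaps in the SSE2 `_mm_stream_si128` intrinsic (movntdq) in place of the MMX-register movntq, and `Stream_MemCpy` and `STREAM_HAVE_SSE2` are names invented for this example. On targets without SSE2 it simply falls back to memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define STREAM_HAVE_SSE2 1
#endif

/* Hypothetical helper: copy n bytes using non-temporal (streaming) stores,
   so the written data is not pulled into the CPU cache. */
void Stream_MemCpy(void *dst, const void *src, size_t n)
{
#ifdef STREAM_HAVE_SSE2
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* _mm_stream_si128 needs a 16-byte-aligned destination;
       fall back to plain memcpy when that does not hold. */
    if (((uintptr_t)d & 15) != 0) {
        memcpy(dst, src, n);
        return;
    }

    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s); /* unaligned load is allowed */
        _mm_stream_si128((__m128i *)d, v);               /* store that bypasses the cache */
        s += 16;
        d += 16;
        n -= 16;
    }
    _mm_sfence();     /* order the streaming stores before any later stores */
    memcpy(d, s, n);  /* copy the remaining tail of fewer than 16 bytes */
#else
    memcpy(dst, src, n);
#endif
}
```

Because it uses SSE registers instead of MMX ones, this version leaves the x87 state alone. It is still only likely to help for large, write-only destinations such as vertex buffers; for small copies, or for data that is read back soon afterwards, bypassing the cache tends to hurt.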
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
- mh
- Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: SIMD/SSE Instructions
/me blinks...
mh?
HE'S ALIVE!
Guys! He's alive!
guys?
hey! guys!
where did you all go?...
just us then eh, mh?
yeah, I'm a little bored right now. blurgh. still, nice to see you around again.
- Spike
- Posts: 2892
- Joined: Fri Nov 05, 2004 3:12 am
- Location: UK