
Re: SIMD/SSE Instructions

Posted: Thu Jun 06, 2013 6:55 pm
by jitspoe
Upon further testing, the OpenGL lighting is, indeed, faster (especially when running in debug mode). I'm curious whether the same holds true on my laptop with an integrated Intel card. I'll have to test that later.

Came across a short article about why it's difficult to optimize with SIMD, and how compiler-friendly idioms are often the better way to go.
http://www.altdevblogaday.com/2011/12/2 ... ode-idiom/

Re: SIMD/SSE Instructions

Posted: Fri Jun 07, 2013 10:00 am
by revelator
Interesting read, thanks for sharing :)

Re: SIMD/SSE Instructions

Posted: Mon Jun 10, 2013 6:50 am
by jitspoe
reckless wrote: Some code from MH I use with my own fork of vanilla Doom 3.

Replaces memcpy with an asm-optimized version, and it's way faster than anything I have tried so far :)
I don't think these are actually SIMD/SSE instructions.

In any case, I'm not getting good results with this. Perhaps it only works well for large and/or aligned memcpy calls? I tried a mass replace with a #define in q_shared.h, and it was noticeably slower.

RB_MemCpy
219-220fps

memcpy
232-233fps

I notice you had the function declared as "static" (which I had to remove), so it seems like it would only be used in a specific file.

Re: SIMD/SSE Instructions

Posted: Mon Jun 10, 2013 11:01 pm
by revelator
Aye, I use it locally in a GLSL renderer, where it beats standard memcpy by miles.
It's not SIMD/SSE, just pure assembler. Doom 3 already has an SSE version, but it's a lot slower than this one for the function I'm using it in.

Lots of data going through it, so yeah, it might be that it shines in those situations :) At the moment I'm using it to copy GLSL matrix calls, and it nets me a 10 fps increase.

Re: SIMD/SSE Instructions

Posted: Mon Jun 10, 2013 11:02 pm
by qbism
This topic is over my head, but I've found it useful to compare actual compiler output to the hand-tuned function. For one thing, the optimizer may be doing a better or at least equal job. Statics make a difference when a loop can park some variables in registers rather than looking them up every pass.

Re: SIMD/SSE Instructions

Posted: Tue Jun 11, 2013 12:13 am
by Spike
if you're using memcpy at all then you've already lost. :P

RB_MemCpy depends on MMX, and will corrupt any/all x87 registers. You don't want to use it on small blocks of memory because that thing takes time to reset again afterwards.
memcpy is probably implemented as an intrinsic in most compilers (certainly gcc), at least if your size is a constant. For small copies, it'll just do the copy directly and bypass all function calls. It'll also be smart enough to notice when you write over the dest in the following instructions and skip the extra reads, etc.
never underestimate the performance of the 'rep' prefix. :P
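
A minimal sketch of the constant-size case described above, assuming a Quake-style vec3_t typedef and a made-up helper name (copy_vec3). With optimizations enabled, compilers such as gcc will typically expand a fixed-size memcpy like this into a few plain moves rather than a library call, though that depends on the compiler and flags:

```c
#include <string.h>

/* Quake-style 3-component vector (assumed here for illustration). */
typedef float vec3_t[3];

/* Hypothetical helper: the size passed to memcpy is a compile-time
 * constant, so an optimizing compiler is free to inline the copy
 * instead of emitting a call into the C library. */
void copy_vec3(vec3_t dst, const vec3_t src)
{
    memcpy(dst, src, sizeof(vec3_t));
}
```

Looking at the generated assembly (for example with gcc -O2 -S) is one way to check what the compiler actually did with the call.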

Re: SIMD/SSE Instructions

Posted: Tue Jun 11, 2013 4:03 am
by jitspoe
I didn't think memcpy was used that often in quake2. I was surprised it made an overall difference in performance, even if the performance difference was significant between the two functions. It looks like it's used for a handful of misc little things. I was actually not using the intrinsic memcpy, because I had one function that could toggle back and forth between the old and new memcpy (so I didn't have to rebuild everything to test the change). Using memcpy directly would perform better (at worst, 1 conditional and 1 function call less overhead, at best, optimized intrinsics).
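
For illustration, a toggle of the kind described above might look roughly like this. All names here (Test_MemCpy, use_custom_copy, Custom_MemCpy) are hypothetical, and the byte loop merely stands in for the routine under test; it is not RB_MemCpy.

```c
#include <stddef.h>
#include <string.h>

/* Runtime switch, e.g. driven by a console variable (hypothetical). */
int use_custom_copy = 0;

/* Stand-in for the hand-written copy being evaluated. */
void *Custom_MemCpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Every copy routed through here pays one branch plus one extra
 * function call, and a constant-size copy at the call site is usually
 * no longer expanded inline by the compiler. */
void *Test_MemCpy(void *dst, const void *src, size_t n)
{
    if (use_custom_copy)
        return Custom_MemCpy(dst, src, n);
    return memcpy(dst, src, n);
}
```

That wrapper overhead applies to both paths equally, so it should not change which routine wins, but it does mean neither number reflects what a direct memcpy call would cost.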

Re: SIMD/SSE Instructions

Posted: Wed Jun 12, 2013 1:24 am
by revelator
Came with the code ;) but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code; I need to copy a lot of registers :lol:

Re: SIMD/SSE Instructions

Posted: Sat Aug 24, 2013 2:47 pm
by mh
reckless wrote: Came with the code ;) but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code; I need to copy a lot of registers :lol:
I did it to avoid cache pollution when copying from system memory to a vertex buffer (where the copied values just need to go straight from source to destination without going into the CPU cache as well). Notice the movntq/etc instructions: http://www.rz.uni-karlsruhe.de/rz/docs/ ... /vc198.htm
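
A rough sketch of the same idea using compiler intrinsics rather than inline asm. This is not the RB_MemCpy routine discussed in this thread: it swaps in the SSE2 intrinsic _mm_stream_si128 (movntdq) for the MMX-era movntq, the name stream_copy is made up, and it falls back to plain memcpy where SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical helper: copies n bytes from src to dst, writing the bulk
 * of the data with non-temporal stores so it bypasses the CPU cache. */
void *stream_copy(void *dst, const void *src, size_t n)
{
#ifdef HAVE_SSE2_STREAM
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy
     * the unaligned head the ordinary way first. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Bulk: unaligned load from the source, streaming store to dst. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Make the streaming stores visible before any later stores. */
    _mm_sfence();

    /* Remaining tail of fewer than 16 bytes. */
    memcpy(d, s, n);
    return dst;
#else
    /* Assumption: no SSE2 on this target, so just use memcpy. */
    return memcpy(dst, src, n);
#endif
}
```

Because the stores skip the cache, this only makes sense for large, write-once destinations such as a vertex buffer; for small blocks that are read back soon after, it is likely to be slower than memcpy.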

Re: SIMD/SSE Instructions

Posted: Sat Aug 24, 2013 4:08 pm
by Spike
/me blinks...
mh?
HE'S ALIVE! :D
Guys! He's alive!
guys?
hey! guys!
where did you all go?...
just us then eh, mh?

yeah, I'm a little bored right now. blurgh. still, nice to see you around again.

Re: SIMD/SSE Instructions

Posted: Sat Aug 24, 2013 6:25 pm
by revelator
hey m8 welcome back :)