SIMD/SSE Instructions

Re: SIMD/SSE Instructions

Postby jitspoe » Thu Jun 06, 2013 6:55 pm

Upon further testing, the OpenGL lighting is, indeed, faster (especially when running in debug mode). I'm curious whether the same holds true on my laptop with an integrated Intel card. I'll have to test that later.

Came across a little article about why it's difficult to optimize with SIMD, and how compiler-friendly idioms are often the better way to go.
http://www.altdevblogaday.com/2011/12/2 ... ode-idiom/
jitspoe
 
Posts: 217
Joined: Mon Jan 17, 2005 5:27 am

Re: SIMD/SSE Instructions

Postby revelator » Fri Jun 07, 2013 10:00 am

Interesting read, thanks for sharing :)
Productivity is a state of mind.
revelator
 
Posts: 2528
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Postby jitspoe » Mon Jun 10, 2013 6:50 am

reckless wrote: Some code from MH I use with my own fork of vanilla Doom3.

Replaces memcpy with an asm-optimized version, and it's way faster than anything I have tried so far :)


I don't think these are actually SIMD/SSE instructions.

In any case, I'm not getting good results with this. Perhaps it only works well for large and/or aligned copies? I tried a mass replace of memcpy with a #define in q_shared.h, and it was noticeably slower.

RB_MemCpy
219-220fps

memcpy
232-233fps

I notice you had the function declared as "static" (which I had to remove), so it seems like it would only be used in a specific file.
jitspoe
 
Posts: 217
Joined: Mon Jan 17, 2005 5:27 am

Re: SIMD/SSE Instructions

Postby revelator » Mon Jun 10, 2013 11:01 pm

Aye, I use it locally in a GLSL renderer, where it beats standard memcpy by miles.
It's not SIMD/SSE, just pure assembler. Doom3 already has an SSE version, but it's a lot slower than this one for the function I'm using it in.

Lots of data going through it, so yeah, it might be that it shines in those situations :) At the moment I'm using it to copy GLSL matrix calls, and it nets me a 10 fps increase.
revelator
 
Posts: 2528
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Postby qbism » Mon Jun 10, 2013 11:02 pm

This topic is over my head, but I've found it useful to compare actual compiler output to the hand-tuned function. For one thing, the optimizer may be doing a better or at least equal job. Statics make a difference when a loop can park some variables in registers rather than looking them up every pass.
qbism
 
Posts: 1235
Joined: Thu Nov 04, 2004 5:51 am

Re: SIMD/SSE Instructions

Postby Spike » Tue Jun 11, 2013 12:13 am

If you're using memcpy at all then you've already lost. :P

RB_MemCpy depends on MMX, and will corrupt any/all x87 registers (the MMX registers alias the x87 register stack). You don't want to use it on small blocks of memory, because resetting the FPU state afterwards (emms) takes time.
memcpy is probably implemented as an intrinsic in most compilers (certainly gcc), at least if your size is a constant. For small copies, it'll just do the copy directly and bypass all function calls. It'll also be smart enough to notice when you write over the dest in the following instructions and skip the extra reads, etc.
Never underestimate the performance of the 'rep' prefix. :P
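
To make the constant-size point concrete, here is a minimal sketch. The names vec3_example, copy_vec3 and copy_bytes are made up for illustration and are not from Quake2 or the code discussed in the thread. Whether the compiler really inlines the first copy depends on the compiler and optimization flags, so it is worth checking the generated assembly rather than taking it on faith.

```c
#include <string.h>

/* Hypothetical 3-float vector, only for illustration. */
typedef struct { float x, y, z; } vec3_example;

/* The size is a compile-time constant (sizeof *dst), so an optimizing
   compiler is free to expand this copy inline as a few moves instead of
   emitting a call to memcpy. */
static void copy_vec3(vec3_example *dst, const vec3_example *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* The size is only known at run time, so this is more likely to end up
   as a real call into the C library's memcpy. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Comparing the output of something like gcc -O2 -S for the two functions should show the difference, if there is one on your toolchain.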
Spike
 
Posts: 2881
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: SIMD/SSE Instructions

Postby jitspoe » Tue Jun 11, 2013 4:03 am

I didn't think memcpy was used that often in quake2. I was surprised it made an overall difference in performance, even if the performance difference was significant between the two functions. It looks like it's used for a handful of misc little things. I was actually not using the intrinsic memcpy, because I had one function that could toggle back and forth between the old and new memcpy (so I didn't have to rebuild everything to test the change). Using memcpy directly would perform better (at worst, 1 conditional and 1 function call less overhead, at best, optimized intrinsics).
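
A rough sketch of the kind of toggle wrapper described above, to show where the extra conditional and function call come from. All names here (custom_memcpy, use_custom_memcpy, test_memcpy) are hypothetical, and custom_memcpy is just a byte loop standing in for a hand-tuned routine such as RB_MemCpy.

```c
#include <string.h>

/* Hypothetical stand-in for a hand-tuned copy routine. */
static void *custom_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Run-time switch, so both paths can be timed without a rebuild. */
static int use_custom_memcpy = 0;

/* Every copy routed through here pays for one branch and one extra call,
   and the size is no longer a constant at the point where memcpy is
   actually invoked (unless the compiler inlines this wrapper). */
static void *test_memcpy(void *dst, const void *src, size_t n)
{
    if (use_custom_memcpy)
        return custom_memcpy(dst, src, n);
    return memcpy(dst, src, n);
}
```

That overhead applies to both sides of the comparison, so the relative numbers are still meaningful, but neither result reflects what a direct memcpy call would do.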
jitspoe
 
Posts: 217
Joined: Mon Jan 17, 2005 5:27 am

Re: SIMD/SSE Instructions

Postby revelator » Wed Jun 12, 2013 1:24 am

Came with the code ;) but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code; I need to copy a lot of registers :lol:
revelator
 
Posts: 2528
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger

Re: SIMD/SSE Instructions

Postby mh » Sat Aug 24, 2013 2:47 pm

reckless wrote: Came with the code ;) but I'll try a direct copy operation instead. Best guess is that mh did it to simplify the code; I need to copy a lot of registers :lol:


I did it to avoid cache pollution when copying from system memory to a vertex buffer (where the copied values just need to go straight from source to destination without going into the CPU cache as well). Notice the movntq/etc instructions: http://www.rz.uni-karlsruhe.de/rz/docs/ ... /vc198.htm
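
For anyone who wants to try the same idea without inline asm, here is a minimal sketch using SSE2 intrinsics. This is not the RB_MemCpy code from the thread (that one uses MMX and movntq); stream_copy16 is a made-up name, and it uses _mm_stream_si128, the 16-byte SSE2 non-temporal store, instead. It assumes an x86 target with SSE2 enabled (the default on x86-64; 32-bit builds may need something like -msse2), a 16-byte aligned destination, and a size that is a multiple of 16.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the routine from the thread.
   Assumes dst is 16-byte aligned and n is a multiple of 16.
   Returns 0 on success, -1 if those assumptions are not met. */
static int stream_copy16(void *dst, const void *src, size_t n)
{
    if (((uintptr_t)dst & 15u) != 0 || (n & 15u) != 0)
        return -1;

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < n / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary load, any alignment */
        _mm_stream_si128(d + i, v);         /* non-temporal store to dst */
    }

    _mm_sfence(); /* order the streamed stores before later stores */
    return 0;
}
```

Non-temporal stores tend to pay off only when the destination will not be read back by the CPU soon (a vertex buffer being the typical case), so for small or frequently reused blocks a plain memcpy is probably the safer choice.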
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
mh
 
Posts: 2287
Joined: Sat Jan 12, 2008 1:38 am

Re: SIMD/SSE Instructions

Postby Spike » Sat Aug 24, 2013 4:08 pm

/me blinks...
mh?
HE'S ALIVE! :D
Guys! He's alive!
guys?
hey! guys!
where did you all go?...
just us then eh, mh?

yeah, I'm a little bored right now. blurgh. still, nice to see you around again.
Spike
 
Posts: 2881
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: SIMD/SSE Instructions

Postby revelator » Sat Aug 24, 2013 6:25 pm

hey m8 welcome back :)
revelator
 
Posts: 2528
Joined: Thu Jan 24, 2008 12:04 pm
Location: inside tha debugger
