(Multi-) Threading

Baker · Post by **Baker** » Thu Apr 30, 2015 7:33 am

I've messed with threading in C a little in the past, but couldn't think of anything good to use it for. That's probably changing now ...

Ok, Objective-C multi-threads and hides everything away. C won't be kind like that.

Rules as far as I know:
1. Writing to a type is atomic.
2. In C, I probably can't atomically write an int or "string" without tons of overhead.
3. Windows uses its own API; on non-Windows, POSIX threads (<pthread.h>) is the deal.
4. I need to create mutexes for locking and unlocking, and avoid "race conditions" and anything that can cause a perma-lock (deadlock).
5. Spike said grabbing stuff from FTE was a good starting point.

Spike · Post by **Spike** » Thu Apr 30, 2015 8:34 am

1. atomicity depends on the cpu and compiler.
x86 guarantees that even byte accesses are atomic (but not long longs, as these are emulated on 32bit chips like x86).
other cpus only have word (ie: 32bit) accesses, and need to read the whole word first and write it back to avoid changing the other parts of the data.
atomic operations are available on most cpus (either full bus locks on x86, or write-if-still-equal on arm).

compilers can steamroller over everything by rearranging everything. Calling an opaque libc/system function like a mutex lock should ensure no memory accesses get rearranged over the call.
you can also declare types as volatile if you're doing some lockless non-atomic syncs - just make SURE that your data type is a machine word, and that you're not running on some really cheap multi-processor system that has no cache coherency. or in other words, don't depend on volatile for anything whatsoever at the hardware level.

2. https://gcc.gnu.org/onlinedocs/gcc-4.1. ... ltins.html
for microsoft compilers, you need to depend upon the windows API instead - InterlockedAdd for instance. naturally, this sucks in portable code (but then so does using gcc-specifics)
C11 has threads.h and stdatomic.h but it'll be a long time before those will actually be usable... msvc users are still stuck with C89...
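As a rough illustration of point 2, here is a minimal sketch of an atomic counter shared by several threads. It assumes a compiler that ships C11 <stdatomic.h> plus POSIX threads; the names `safe_counter`, `count_up` and `run_atomic_demo` are made up for this example. With the older gcc builtins the same role would be played by `__sync_fetch_and_add`, and on Windows by the Interlocked* family.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* C11 atomic: concurrent increments are never lost. */
static atomic_int safe_counter;

/* Thread entry point: bump the shared counter INCREMENTS times. */
static void *count_up(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++)
        atomic_fetch_add(&safe_counter, 1);
    return NULL;
}

/* Starts NUM_THREADS threads, waits for them, returns the final count. */
int run_atomic_demo(void)
{
    pthread_t threads[NUM_THREADS];

    atomic_store(&safe_counter, 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        int err = pthread_create(&threads[i], NULL, count_up, NULL);
        assert(err == 0);
        (void)err;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&safe_counter);
}
```

If `safe_counter` were a plain int incremented with `++`, the total could come out short, because each increment is a separate read, add and write that another thread can interleave with.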

3. Yup, you'll need to abstract that with two sets of functions. See 5.
note that in windows, a 'mutex' is an inter-process thing. you'll likely want to use 'critical sections', which is what windows calls intra-process mutexes.
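A minimal sketch of the "two sets of functions" idea from point 3: one small mutex wrapper that maps to critical sections on Windows and to pthread mutexes elsewhere. The `Sys_Mutex*` names and `sys_mutex_t` are hypothetical, picked for this example, and are not FTE's actual helpers.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
/* Windows: a critical section is the intra-process lock. */
typedef CRITICAL_SECTION sys_mutex_t;
void Sys_MutexInit(sys_mutex_t *m)    { InitializeCriticalSection(m); }
void Sys_MutexLock(sys_mutex_t *m)    { EnterCriticalSection(m); }
void Sys_MutexUnlock(sys_mutex_t *m)  { LeaveCriticalSection(m); }
void Sys_MutexDestroy(sys_mutex_t *m) { DeleteCriticalSection(m); }
#else
#include <pthread.h>
/* Everything else: a plain pthread mutex. */
typedef pthread_mutex_t sys_mutex_t;
void Sys_MutexInit(sys_mutex_t *m)
{
    int err = pthread_mutex_init(m, NULL);
    assert(err == 0);
    (void)err;
}
void Sys_MutexLock(sys_mutex_t *m)    { pthread_mutex_lock(m); }
void Sys_MutexUnlock(sys_mutex_t *m)  { pthread_mutex_unlock(m); }
void Sys_MutexDestroy(sys_mutex_t *m) { pthread_mutex_destroy(m); }
#endif

/* Example use: the rest of the engine only ever sees Sys_Mutex*. */
static sys_mutex_t stats_lock;
static int maps_scanned;

int mutex_demo(void)
{
    Sys_MutexInit(&stats_lock);

    Sys_MutexLock(&stats_lock);
    maps_scanned += 17;          /* protected region */
    Sys_MutexUnlock(&stats_lock);

    Sys_MutexDestroy(&stats_lock);
    return maps_scanned;
}
```

Keeping the #ifdef in one file means the callers never include <windows.h> or <pthread.h> directly.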

4. Yup, good luck with that when it comes to pretty much every single windows function potentially calling SendMessage. If you use mutexes inside your window handler, you are pretty much screwed.

5. FTE has had thread helpers for ages now, but didn't really do too much with them.
It now uses them for directsound mixing (yay, lower mixahead with rtlights), independent player physics (because hitting exactly 77fps with rtlights is hard), map relighting (who needs rtlights anyway), and of course a couple of worker threads to load textures (there are a LOT of textures when your rtlights use specular+bumps+luma+etc etc...) and stuff with.

The worker thread stuff allows me to queue work on a specific thread (either the main thread or the worker thread) and post things from one side to the other. The worker does its thing and posts the 'finished' product back to the main thread which then links it in accordingly (uploading it to gl in the case of textures). because they don't touch each other's stuff, there's no sync conditions between threads (although if your main thread expects a result too soon, you can still have race conditions, but you can test for these by bogging down the worker with some Sleep calls).
you can get away with the worker using cvar values, but don't use cvar strings as the pointer can become invalid at inopportune times.
Stuff with no return value like Con_Printf can include a little bit of code to check if it's the main thread, and if not, post and return.
You'll need to very carefully read through the functions your thread calls to make sure it's all thread safe, you WILL get hard-to-track bizarre errors if you fail to be truly thorough.
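The Con_Printf trick described above might look roughly like this. It is only a sketch assuming POSIX threads: `Con_InitThreading`, `Con_PrintDirect`, `Con_FlushPending` and the fixed-size pending buffer are all invented for the example, and a real engine would route the message through whatever posting mechanism it already has.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 64
#define MAX_MSG     256

static pthread_t       main_thread;
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static char            pending[MAX_PENDING][MAX_MSG];
static int             num_pending;

/* Call once from the main thread at startup. */
void Con_InitThreading(void) { main_thread = pthread_self(); }

/* Stand-in for the real console code, which is not thread safe. */
static void Con_PrintDirect(const char *msg) { fputs(msg, stdout); }

void Con_Printf(const char *fmt, ...)
{
    char    msg[MAX_MSG];
    va_list args;

    va_start(args, fmt);
    vsnprintf(msg, sizeof(msg), fmt, args);
    va_end(args);

    if (pthread_equal(pthread_self(), main_thread)) {
        Con_PrintDirect(msg);
        return;
    }
    /* Not the main thread: post a copy and return straight away. */
    pthread_mutex_lock(&pending_lock);
    if (num_pending < MAX_PENDING)   /* sketch: drop when the buffer is full */
        strcpy(pending[num_pending++], msg);
    pthread_mutex_unlock(&pending_lock);
}

/* Main thread calls this once per frame; returns how many it printed. */
int Con_FlushPending(void)
{
    char local[MAX_PENDING][MAX_MSG];
    int  count;

    assert(pthread_equal(pthread_self(), main_thread));
    pthread_mutex_lock(&pending_lock);
    count = num_pending;
    memcpy(local, pending, (size_t)count * MAX_MSG);
    num_pending = 0;
    pthread_mutex_unlock(&pending_lock);

    for (int i = 0; i < count; i++)   /* print outside the lock */
        Con_PrintDirect(local[i]);
    return count;
}

static void *worker_prints(void *unused)
{
    (void)unused;
    Con_Printf("worker: loaded %d textures\n", 3);
    return NULL;
}

/* One print from each side; returns the number of posted messages. */
int con_demo(void)
{
    pthread_t worker;
    int       err;

    Con_InitThreading();
    Con_Printf("main: starting worker\n");
    err = pthread_create(&worker, NULL, worker_prints, NULL);
    assert(err == 0);
    (void)err;
    pthread_join(worker, NULL);
    return Con_FlushPending();
}
```

The message is formatted and copied on the calling thread, so the worker never hands the main thread a pointer that might go stale.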

but yeah, the easy way to do it is to post function+data pointers and wait for a response (while not waiting, of course).
cunning use of 'condition variables' can allow your worker to wake up once something is posted to it, without any busy waits.
I guess in windows you could just use PostMessage and have your worker just block in a GetMessage message loop, and then post the result back to the main window (and use SendMessage when you want to sync). But yeah, windows-specific solutions suck.
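For the non-Windows route, posting function+data pointers to a worker that sleeps on a condition variable could be sketched like this. It assumes POSIX threads; `work_queue_t`, `Queue_Post`, `Worker_Main` and friends are hypothetical names, and only the main-to-worker direction is shown. The return path would be a second queue of the same shape that the main thread drains each frame.

```c
#include <assert.h>
#include <pthread.h>

typedef void (*work_func_t)(void *data);

typedef struct {
    work_func_t func;
    void       *data;
} work_item_t;

#define QUEUE_SIZE 32

typedef struct {
    work_item_t     items[QUEUE_SIZE];   /* ring buffer */
    int             head, count;
    int             quit;
    pthread_mutex_t lock;
    pthread_cond_t  wake;
} work_queue_t;

void Queue_Init(work_queue_t *q)
{
    q->head = q->count = q->quit = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->wake, NULL);
}

/* Returns 1 if the job was queued, 0 if the queue was full. */
int Queue_Post(work_queue_t *q, work_func_t func, void *data)
{
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SIZE) {
        int tail = (q->head + q->count) % QUEUE_SIZE;
        q->items[tail].func = func;
        q->items[tail].data = data;
        q->count++;
        ok = 1;
        pthread_cond_signal(&q->wake);   /* wake the sleeping worker */
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Ask the worker to exit once the queue is empty. */
void Queue_Shutdown(work_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    q->quit = 1;
    pthread_cond_broadcast(&q->wake);
    pthread_mutex_unlock(&q->lock);
}

/* Worker thread entry point: sleeps until something is posted. */
void *Worker_Main(void *arg)
{
    work_queue_t *q = arg;
    for (;;) {
        work_item_t item;
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->quit)
            pthread_cond_wait(&q->wake, &q->lock);   /* no busy wait */
        if (q->count == 0) {             /* quit requested, queue drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_SIZE;
        q->count--;
        pthread_mutex_unlock(&q->lock);
        item.func(item.data);            /* run the job outside the lock */
    }
}

static void square_job(void *data) { int *v = data; *v = *v * *v; }

int worker_demo(void)
{
    static work_queue_t q;
    pthread_t worker;
    int value = 7;
    int posted;

    Queue_Init(&q);
    pthread_create(&worker, NULL, Worker_Main, &q);
    posted = Queue_Post(&q, square_job, &value);
    assert(posted);
    (void)posted;
    Queue_Shutdown(&q);
    pthread_join(worker, NULL);          /* join makes the result visible */
    return value;
}
```

The `while` around `pthread_cond_wait` matters: condition variables can wake spuriously, so the worker rechecks the queue every time it wakes.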

Baker · Post by **Baker** » Thu Apr 30, 2015 9:09 am

Thanks for your knowledge on this topic.

Things I may end up using threads for:

1. The maps and demos menu. The code I have currently only updates 17 maps per frame. No problem, right?
---- Well, if Windows isn't expecting me to access those files, even a mere 17 quick fopens is slow as hell. With threads, maybe I can touch those files in another thread to wake up the Windows filesystem or whatever is going on with that.

2. I'm probably going to take a stab at the server offering travail.zip or whatever game is in play (especially for LAN) for installation. A client downloading via FTP or whatever I do can sit and wait, I don't care. But the server shouldn't be impaired if 3 clients are downloading a big zip file and 1 client isn't. I'm looking to avoid the problem that, say, Quakeworld tries to address with download rate limiting by not rate limiting. Rate limiting with an 80 MB single player zip on a LAN isn't feasible and having the clients download from, say, Quaddicted is silly if the server has the zip. Server on a LAN can transfer to client WAY faster.

Spike · Post by **Spike** » Thu Apr 30, 2015 10:35 am

Baker wrote: 1. The maps and demos menu. The code I have currently only updates 17 maps per frame. No problem, right?
---- Well, if Windows isn't expecting me to access those files, even a mere 17 quick fopens is slow as hell. With threads, maybe I can touch those files in another thread to wake up the Windows filesystem or whatever is going on with that.

the advantage is that you go at the speed the system goes at.
throw a few threads at it and you won't be waiting for the disk quite so much.

Baker wrote: 2. I'm probably going to take a stab at the server offering travail.zip or whatever game is in play (especially for LAN) for installation. A client downloading via FTP or whatever I do can sit and wait, I don't care. But the server shouldn't be impaired if 3 clients are downloading a big zip file and 1 client isn't. I'm looking to avoid the problem that, say, Quakeworld tries to address with download rate limiting by not rate limiting. Rate limiting with an 80 MB single player zip on a LAN isn't feasible and having the clients download from, say, Quaddicted is silly if the server has the zip. Server on a LAN can transfer to client WAY faster.

downloads are a good use of threads. you still want to throttle though - at the end of the day, game traffic should have a higher priority than download traffic, and you might not be the only server process on that machine.
ultimately, not throttling is simply not an option, because someone will set their cvars to aggressively flood packets from one server, and then get kicked off the net by the next server because it can actually cope with the requested bandwidth (ssd!). tcp has throttling built in due to its window sizes.
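One common way to throttle a download thread is a token bucket. This is a swapped-in general technique, not something taken from FTE or Quakeworld; `throttle_t`, `Throttle_Init` and `Throttle_Take` are hypothetical names. Time is passed in as a parameter (in seconds) so the sketch does not depend on any particular clock API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double rate;       /* bytes per second added back to the bucket */
    double burst;      /* most bytes the bucket can hold */
    double tokens;     /* bytes that may be sent right now */
    double last_time;  /* time of the last refill, in seconds */
} throttle_t;

void Throttle_Init(throttle_t *t, double rate, double burst, double now)
{
    assert(rate > 0.0 && burst > 0.0);
    t->rate = rate;
    t->burst = burst;
    t->tokens = burst;     /* start full so the first chunk goes out at once */
    t->last_time = now;
}

/* Returns how many of the 'wanted' bytes may be sent at time 'now'. */
size_t Throttle_Take(throttle_t *t, size_t wanted, double now)
{
    double elapsed = now - t->last_time;
    size_t allowed;

    if (elapsed > 0.0) {
        t->tokens += elapsed * t->rate;   /* refill for the time that passed */
        if (t->tokens > t->burst)
            t->tokens = t->burst;
        t->last_time = now;
    }
    allowed = (size_t)t->tokens;
    if (allowed > wanted)
        allowed = wanted;
    t->tokens -= (double)allowed;
    return allowed;
}

/* 1000 bytes/sec with a 500 byte burst, asked for 800 bytes three times. */
int throttle_demo(void)
{
    throttle_t t;
    size_t first, second, third;

    Throttle_Init(&t, 1000.0, 500.0, 0.0);
    first  = Throttle_Take(&t, 800, 0.0);    /* the whole burst */
    second = Throttle_Take(&t, 800, 0.0);    /* bucket is empty */
    third  = Throttle_Take(&t, 800, 0.25);   /* a quarter second of refill */
    assert(first == 500 && second == 0);
    (void)first;
    (void)second;
    return (int)third;
}
```

A download thread would call `Throttle_Take` before each send and sleep briefly when it returns 0, leaving the remaining bandwidth for game traffic.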

frag.machine · Post by **frag.machine** » Thu Apr 30, 2015 11:08 am

I suppose that BSP traversal for collision detection may benefit from the use of multithreading, too (don't ask me how though, it's way beyond my skills). From previous benchmarks I concluded that traceline was the most CPU intensive task in singleplayer (not a big surprise here, but it is always good to have numbers supporting empirical knowledge).

revelator · Post by **revelator** » Fri May 01, 2015 9:34 am

could use Intel's Threading Building Blocks, which are open source.

works for both gcc and msvc.

mh · Post by **mh** » Fri May 01, 2015 10:42 am

If you're using traceline for collision detection you're probably doing something crazy in QC.

Quake by default uses an "areanodes" system instead, which is not the BSP tree but a separate hierarchical tree, which subdivides the world into areas and places entities into those areas, so that only entities in the same area need to be tested for collision with each other. It's still an O(n-squared) operation but for a significantly smaller value of n.

This fails with bigger maps because it uses a static number of areas (32) with a static area depth (4) so bigger maps just have bigger areas with more entities in them, and the cost of collision detection (value of n) goes up.

A more useful optimization is to just create more areas for bigger maps. Multithreading it is brute-forcing the problem in other words; an algorithmic optimization makes more sense in that case.
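A rough sketch of what "more areas for bigger maps" could mean in code: pick the tree depth from the world size instead of hard-coding it. This is a guess at one possible heuristic, not the engine's code. `TARGET_LEAF_SIZE`, `MAX_AREA_DEPTH` and both function names are invented, and the sketch assumes each split halves the longer of the two horizontal axes.

```c
#include <assert.h>

#define MIN_AREA_DEPTH   4        /* the static depth mentioned above */
#define MAX_AREA_DEPTH   8        /* hypothetical cap */
#define TARGET_LEAF_SIZE 512.0f   /* hypothetical: wanted leaf width in world units */

/* Number of splits needed so leaf areas are at most TARGET_LEAF_SIZE wide. */
int Area_DepthForWorld(float size_x, float size_y)
{
    int depth = 0;

    assert(size_x > 0.0f && size_y > 0.0f);
    while (depth < MAX_AREA_DEPTH &&
           (size_x > TARGET_LEAF_SIZE || size_y > TARGET_LEAF_SIZE)) {
        if (size_x > size_y)
            size_x *= 0.5f;       /* split along the longer axis */
        else
            size_y *= 0.5f;
        depth++;
    }
    return depth < MIN_AREA_DEPTH ? MIN_AREA_DEPTH : depth;
}

/* Nodes in a full binary tree that has been split 'depth' times. */
int Area_NodeCount(int depth)
{
    return (1 << (depth + 1)) - 1;
}
```

With these made-up constants a 4096x4096 world gets depth 6 (127 nodes), while small maps stay at depth 4 (31 nodes, which fits the static 32), so the node array would have to be sized from `Area_NodeCount` rather than a fixed constant.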

Baker · Post by **Baker** » Fri May 01, 2015 11:27 am

revelator wrote: could use Intel's Threading Building Blocks, which are open source. works for both gcc and msvc.

C++ library only for desktop computers. What could go wrong?

revelator · Post by **revelator** » Fri May 01, 2015 9:41 pm

Baker wrote: C++ library only for desktop computers. What could go wrong?

Probably not much, but then again multithreading can be a bitch to get to work correctly, and this should at least help quite a bit.

Or use multithreading to run different parts of the engine in their own threads, like Doom3 (2 threads) or BFG, which takes it even further.
Not sure how much gain it would give an engine like quake, but on something like darkplaces with all the bells and whistles it should surely be noticeable, at least with rt lights etc.

Would be interesting to see the outcome

Baker · Post by **Baker** » Sat May 02, 2015 12:33 pm

revelator wrote:
Baker wrote: C++ library only for desktop computers. What could go wrong?
Probably not much but then again multithreading can be a bitch to get to work correctly, this should at least help quite a bit.

Your work with the Doom 3 engine is far more complex than what I work on

And DarkPlaces is more complex than what I work on.

Setting that aside, what I meant is that if I write code that is almost entirely portable to mobile platforms, why would I want to give that up? And why would I want to switch to C++ for threading?

I have no doubt that the library is outstanding for desktop operating systems and C++. But it's 2015 and there's more than desktops out there.

With a case of Red Bull, a few 5 Hour Energy and 4 days --- I could convert my current engine works to Android and iOS.

Spike · Post by **Spike** » Sat May 02, 2015 1:14 pm

Baker wrote:With a case of Red Bull, a few 5 Hour Energy and 4 days --- I could convert my current engine works to Android and iOS.

Go on then.
You will have to keep it running on windows+linux at the same time, of course. Good luck with that.

By the way, browsers don't support threads in the conventional sense, so if you're aiming at an emscripten port, you'll need to retain a threadless mode (at least at compile time).
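A compile-time threadless mode could be as small as this sketch. `NO_THREADS`, `job_t`, `Job_Start` and `Job_Wait` are hypothetical names; `__EMSCRIPTEN__` is the macro the emscripten compiler defines. In the threadless build the job just runs inline on the caller's thread, so the calling code does not change.

```c
#include <assert.h>

/* Assumption for this sketch: browser builds fall back to no threads. */
#if defined(__EMSCRIPTEN__) && !defined(NO_THREADS)
#define NO_THREADS 1
#endif

#ifndef NO_THREADS
#include <pthread.h>
#endif

typedef void (*job_func_t)(void *data);

typedef struct {
    job_func_t func;
    void      *data;
#ifndef NO_THREADS
    pthread_t  thread;
#endif
} job_t;

#ifndef NO_THREADS
static void *job_trampoline(void *arg)
{
    job_t *job = arg;
    job->func(job->data);
    return NULL;
}
#endif

/* Starts the job. 'job' must stay alive until Job_Wait returns. */
void Job_Start(job_t *job, job_func_t func, void *data)
{
    job->func = func;
    job->data = data;
#ifdef NO_THREADS
    func(data);                       /* threadless: do the work right now */
#else
    {
        int err = pthread_create(&job->thread, NULL, job_trampoline, job);
        assert(err == 0);
        (void)err;
    }
#endif
}

/* Blocks until the job has finished (a no-op in the threadless build). */
void Job_Wait(job_t *job)
{
#ifdef NO_THREADS
    (void)job;
#else
    pthread_join(job->thread, NULL);
#endif
}

static void count_maps(void *data) { *(int *)data = 17; }

int job_demo(void)
{
    job_t job;
    int   found = 0;

    Job_Start(&job, count_maps, &found);
    Job_Wait(&job);
    return found;
}
```

Building with `-DNO_THREADS` gives the same result from `job_demo`, just without ever creating a thread.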

Baker · Post by **Baker** » Sat May 02, 2015 1:38 pm

Spike wrote:
Baker wrote:With a case of Red Bull, a few 5 Hour Energy and 4 days --- I could convert my current engine works to Android and iOS.
browsers don't support threads in the conventional sense

???

Somewhere a divide by 0 occurred. I know not where. Maybe because I used the word convert or something?

Spike · Post by **Spike** » Sat May 02, 2015 1:54 pm

Spike wrote:so if you're aiming at an emscripten port

javascript. the ultimate in portability and suffering.
you can't claim you want to avoid desktop-only stuff because you want other ports, and then completely rule out emscripten, which would give you a version that'll run in browsers on any device (well, any device that supports webgl etc, anyway).
Hmm, you DO know what emscripten is, right? Just in case, it's a C(llvm) -> javascript compiler...

meh, I guess that's what I get for trying to return to the whole multithreading topic thing.

Baker · Post by **Baker** » Sat May 02, 2015 1:58 pm

Spike wrote:
Spike wrote:so if you're aiming at an emscripten port

Ah, ok your FTEQW that runs in the browser.

I just meant that Intel's threading library doesn't work on ARM.

Spike wrote: Hmm, you DO know what emscripten is, right? Just in case, it's a C(llvm) -> javascript compiler...

Nope. Had no idea. Had to Google it. Considering it has the word "script" in it, Google is lucky I did even that!

Spike wrote: meh, I guess that's what I get for trying to return to the whole multithreading topic thing.

Probably Tuesday or Wednesday, this thread should make a code-oriented turn back on the main theme.

Until then, I'm going to enjoy beer! The weekends and all that ...

revelator · Post by **revelator** » Sun May 03, 2015 12:58 am

Ah ok, had no idea you were developing it for ARM, but tbb mostly works on ARM now too

and there's ongoing work on it so it's worth considering.
ARM's opencv port already uses tbb.

InsideQC Forums
