Minimum CPU while hitting target FPS

Baker · Post by **Baker** » Tue Nov 11, 2014 7:59 pm

If I understand vsync right, it makes SwapBuffers a blocking call.

Any ideas on how to minimize CPU usage using Sleep (1) or usleep, but always --- or almost always --- hitting the frames per second target?

1) My understanding of blocking SwapBuffers is that it is wasteful and essentially a while loop.
2) I also am under the impression that hitting the vsync timing is touchy business.
3) And Sleep(1) on Windows --- even though it should be "milliseconds", consumes an unknown amount of time.

Or is my best bet to estimate sleep intervals and implement an extra timer or two to pull this off by measuring and averaging?
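A minimal sketch of the "measuring and averaging" idea, assuming the engine times each Sleep (1) call itself and feeds the result in. The names `sleep_estimator_t`, `sleep_estimator_update` and `sleep_estimator_can_sleep`, the smoothing factor and the safety margin are all hypothetical; none of them come from an existing engine.

```c
#include <assert.h>

/* Running estimate of how long a nominal "1 ms" sleep really takes. */
typedef struct {
    double avg_sleep_ms;
} sleep_estimator_t;

void sleep_estimator_init(sleep_estimator_t *e, double initial_ms)
{
    assert(initial_ms >= 0.0);
    e->avg_sleep_ms = initial_ms;
}

/* Fold one measured sleep duration into an exponential moving average.
   alpha = 0.1 is an arbitrary illustrative choice. */
void sleep_estimator_update(sleep_estimator_t *e, double measured_ms)
{
    const double alpha = 0.1;
    assert(measured_ms >= 0.0);
    e->avg_sleep_ms += alpha * (measured_ms - e->avg_sleep_ms);
}

/* Sleep only if the time left before the next frame exceeds the
   estimated real sleep duration plus a safety margin. */
int sleep_estimator_can_sleep(const sleep_estimator_t *e,
                              double ms_until_frame, double margin_ms)
{
    return ms_until_frame > e->avg_sleep_ms + margin_ms;
}
```

The caller would time each sleep with whatever high-resolution clock the engine already uses, call `sleep_estimator_update` afterwards, and skip the sleep whenever `sleep_estimator_can_sleep` returns 0.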

ericw · Post by **ericw** » Tue Nov 11, 2014 8:56 pm

For 1) ("SwapBuffers is wasteful and essentially a while loop"): it's not a spinloop on OS X. I just tried quakespasm with vsync on, in e1m1; the app shows up in Activity Monitor as using 6% CPU, and Instruments shows most of the execution time is spent in the SDL SwapBuffers call. I'd be surprised if SwapBuffers with vsync was implemented as a spinloop, since this would pretty much force your app to take 100% CPU. Are you seeing increased CPU usage with vsync?

My impression is, as long as your frame runs in less than (1/60) seconds (assuming 60Hz monitor), it will "just work" - your SwapBuffers call will happen before the vsync, the driver waits the needed amount of time, then hands control back to your app. If you take longer than (1/60)s for a frame, you miss the vsync you were aiming for, but you'll get the next one - so the last frame lasts for (2/60) seconds, i.e. the framerate drops to 30fps briefly.
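The arithmetic in that description can be sketched as follows. This is an idealized model only (plain double-buffered vsync, no driver queuing or triple buffering), and the function names `vsync_intervals` and `effective_fps` are made up for illustration.

```c
#include <assert.h>

/* Number of whole refresh intervals a frame occupies under vsync,
   given how long the frame took to produce (idealized model). */
int vsync_intervals(double frame_ms, double refresh_hz)
{
    double interval_ms;
    int n;

    assert(frame_ms > 0.0 && refresh_hz > 0.0);
    interval_ms = 1000.0 / refresh_hz;

    /* Round up: a frame that misses one vsync waits for the next. */
    n = (int)(frame_ms / interval_ms);
    if (n * interval_ms < frame_ms)
        n++;
    return n;
}

/* Displayed frame rate if every frame took frame_ms to produce. */
double effective_fps(double frame_ms, double refresh_hz)
{
    return refresh_hz / vsync_intervals(frame_ms, refresh_hz);
}
```

Under this model, a 10 ms frame on a 60Hz monitor occupies one interval (60fps), while a 17 ms frame occupies two, which is the drop to 30fps described above.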

I don't know of a good source that explains vsync, so the above is kind of guesswork. Maybe someone will have a pointer.

Spike · Post by **Spike** » Tue Nov 11, 2014 9:12 pm

I wouldn't worry about it. If you try doing other stuff before waking up, stuff that depends on user input, all that happens is that you increase latency.
If you have the appropriate settings configured, it'll be a true sleep instead of a busyloop. It's the driver's choice, and power/performance settings can affect things here.
You can use a worker thread or whatever, then you'll be able to do stuff during the wait, but I'm not sure what stuff there will actually be. Like I said, to minimise latency and maximise smoothness, you'll want to poll inputs as soon as the vsync wait is over and then update the screen with those inputs ready for the following vsync. Basically, don't worry too much about it.

If you're struggling to achieve 60fps, firstly reconsider how much load you're putting on the cpu and consider shifting some more to the gpu.
Secondly, consider using https://www.opengl.org/registry/specs/E ... l_tear.txt as this will avoid suddenly hitting 30fps quite so much when you're suddenly only pulling 59 fps instead of 60.

mh · Post by **mh** » Wed Nov 12, 2014 2:10 am

Most Quake engines are CPU bound.

By "most" I mean those that restrict themselves to a similar OpenGL level that the original GLQuake did, of course.

The reality is that the OpenGL level that GLQuake used is horribly inefficient with modern hardware. And I don't just mean GPUs, I mean the entire package. If you're going to restrict yourself to that level, and if you're going to use the same set of OpenGL calls, then by definition you're going to be CPU bound and by definition you're going to have high CPU usage.

You need to get out of that mindset and start accepting that you're targeting hardware that no longer exists. Trying to micro-optimize elsewhere is a mug's game. In an ID1 timedemo the bottleneck is the renderer.

You need to aim to get 100% (or as close as possible) GPU usage when running flat-out. I don't mean in the general case, I mean that when running a timedemo with vsync off your GPU should run at 100%. Unless and until you get to that level you shouldn't even be thinking about "how do I manage CPU usage" because your CPU usage is already inefficient to begin with.

On any kind of reasonably decent modern hardware Quake should start and finish (including SwapBuffers, no vsync) drawing a typical ID1 scene in well under 1ms. If you're not hitting that kind of performance then you've too much loaded on the CPU. So get your render under 1ms first, then start worrying.

In other words - get yourself to the stage where the overwhelming majority of the time your engine is running, it's actually doing nothing. As soon as you know you're doing nothing most of the time, you can start thinking about doing something useful - like yielding CPU - during that time. But until then you've got a 13.something millisecond time window to get stuff done in, so even individual milliseconds are important; don't waste them with ancient crap.

Once you've sorted out the bottleneck, then you can start looking at methods to reduce CPU usage in the general case. Something as simple as a Sleep (1) when Host_FilterTime returns false will work well then. You can tune this so that if there's less than, say, 2ms until the next frame should run (based on maxfps and vsync settings) you just run flat-out until then, and still get substantial savings, because you already know that your frames run so fast.
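A rough sketch of that tuning, assuming times are kept in seconds as doubles. The enum and the function `choose_wait` are hypothetical names, and the 2ms threshold is just the figure from the paragraph above; the real hook would be wherever Host_FilterTime reports that it is too early to run a frame.

```c
#include <assert.h>

typedef enum {
    WAIT_RUN_FRAME, /* the frame is due: run it now */
    WAIT_SPIN,      /* too close to the deadline to risk a sleep */
    WAIT_SLEEP      /* plenty of time left: yield the CPU, e.g. Sleep (1) */
} wait_action_t;

/* Decide what to do while waiting for the next frame.
   now and next_frame_time are in seconds; spin_threshold is how close
   to the deadline we stop sleeping and just run flat-out. */
wait_action_t choose_wait(double now, double next_frame_time,
                          double spin_threshold)
{
    double remaining = next_frame_time - now;

    assert(spin_threshold >= 0.0);

    if (remaining <= 0.0)
        return WAIT_RUN_FRAME;
    if (remaining < spin_threshold)
        return WAIT_SPIN;
    return WAIT_SLEEP;
}
```

With a threshold of 0.002, the main loop would sleep while more than 2ms remain and busy-wait only for the final stretch, so an oversleeping Sleep (1) is less likely to push the frame past its deadline.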

The moral of the story is: don't start thinking about reducing CPU usage with Sleep/etc yet; you've got inefficient CPU usage by definition to begin with, so resolve those inefficiencies first. Then start saving time.

Baker · Post by **Baker** » Wed Nov 12, 2014 11:00 am

ericw wrote: For 1) ("SwapBuffers is wasteful and essentially a while loop"): it's not a spinloop on OS X. I just tried quakespasm with vsync on, in e1m1; the app shows up in Activity Monitor as using 6% CPU, and Instruments shows most of the execution time is spent in the SDL SwapBuffers call. I'd be surprised if SwapBuffers with vsync was implemented as a spinloop, since this would pretty much force your app to take 100% CPU. Are you seeing increased CPU usage with vsync?

I did some groundwork Googling to try to determine how SwapBuffers works on Windows. I'm summarizing what I interpreted as the explanation, right or wrong.

Some of the information was rather old, so I didn't fully trust it, and I couldn't find anything current. That's why I posted here.

[Still reading ...]
