Re: Makaqu

Posted: Sat Feb 02, 2013 6:18 am
by mankrip
Spike wrote:[...] you can grab my qtv project ( http://fteqw.svn.sourceforge.net/viewvc ... nk/fteqtv/ ), compile it as a library, and follow the instructions given at the top of its nq_api.c file.
You'll get an additional nq 'network driver' that instead routes packets into the libqtv for translation. libqtv will do the rest.
You won't get prediction. It will only resolve certain hostnames so that regular NQ connectivity will be unaffected.
It will only work as a client, not a server.
You can play mvds with it too, but you'll likely want to knock up some extra ui/commands for that.
Thanks. Right now I'm trying to get this engine completely stable before adding any more features, but when the time comes I'll examine it.
Spike wrote:I've never run it/anything on a dreamcast, so I've no idea what changes you'd need for that.
Network multiplayer on the Dreamcast doesn't matter. There's no TCP/IP stack for it to work over dial-up, broadband adapters for the Dreamcast are rare and expensive, and playing a fast-paced game such as Quake over dial-up is unbearable. I did play both Q3A and UT through dial-up on the DC back in the day, and UT was only good to play because it had lots of prediction - which also resulted in some funny situations, like me instantly appearing in the middle of lava pits seemingly for no reason.
leileilol wrote:I had a E1M1 crash when I gibbed a soldier and a dog and one of their gibs went into my face. I did turn on lit sprites and particles and I was playing at 1920x1080.
Maybe a problem with the MDL model renderer, or with entities using it.

One thing I know for sure is that there must be buffer/pointer overflows corrupting the function call stack somewhere in the code. There's evidence of it in the funny "bugfixes" thread, and I've found other seemingly similar situations in the last few months.

Also, the second demo of the hipnotic mission pack only crashes on the second loop of the startdemos, so whatever causes it is something that gets worse over time.
Dr. Shadowborg wrote:you can actually crash winquake (and presumably dosquake) by noclipping into shub and hitting the right view angle. Winquake reports it as a "Double Quake Error - R_RenderView: called without enough stack"
IIRC this was the bug caused by the movetype of the spiked teleporter ball. Makaqu has had a fix for it since its early versions, and IIRC it's a fix that came from the QIP sources.

Re: Makaqu

Posted: Sat Feb 02, 2013 7:03 am
by qbism
All over the code are little temporary fixed-size buffers with lines like "byte stackbuf[1024]" that can overrun when engine limits are increased. MH found that olddata in CL_KeepaliveMessage should increase, for example. Maybe one of these is lurking somewhere.
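
To illustrate the kind of guard that stops these overruns, here is a minimal sketch. CopyToStackBuf and STACKBUF_SIZE are hypothetical names made up for this example, not anything from the Quake or Makaqu sources; the idea is only that the copy is refused when the data is larger than the fixed buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit, mirroring lines like "byte stackbuf[1024]". */
#define STACKBUF_SIZE 1024

typedef unsigned char byte;

/* Copies len bytes from src into a fixed-size buffer of dstsize bytes.
   Returns 1 on success, or 0 if the data would not fit (nothing is copied). */
static int CopyToStackBuf(byte *dst, size_t dstsize, const byte *src, size_t len)
{
    if (len > dstsize)
        return 0;   /* would overrun: refuse instead of corrupting the stack */
    memcpy(dst, src, len);
    return 1;
}
```

When an engine limit is raised, a check like this turns silent stack corruption into a failure the caller can report.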

Re: Makaqu

Posted: Sat Feb 02, 2013 11:29 am
by revelator
Hmm, dynamically allocated buffers. I did some work on those in my Realm engine to allow it to run Marcher without setting heapsize, but there are a lot of those, as you noticed.

Re: Makaqu

Posted: Wed Feb 13, 2013 4:20 pm
by leileilol
Can you take out the mouse parameter setting by default? I don't like turning off mouse acceleration in the mouse control panel after makaqu crashes.

Re: Makaqu

Posted: Fri Mar 22, 2013 7:31 am
by mankrip
Good idea, yes. I'll turn the mouse-related commandline switches into OS-specific cvars & menu options.

By the way, there have already been a few changes and experiments made since the last release. I'm still deciding on what to do for the next version.

Re: Makaqu

Posted: Fri Mar 22, 2013 4:15 pm
by leileilol
Well, having a workaround like omitting the drawing when an alias model or sprite is too close to the screen would help.

Re: Makaqu

Posted: Fri Mar 22, 2013 5:49 pm
by mankrip
I've never had any problems or crashes with sprites. As for MDL models, that could be useful.

Re: Makaqu

Posted: Fri Mar 22, 2013 5:59 pm
by leileilol
Even the original Quake had a sprite drawing crash when it got too close to the screen! The NEAR Z CLIP code doesn't seem to help, but putting a return statement with a slightly farther distance before it did.
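
A rough sketch of that kind of early-out, under assumptions: R_SpriteTooClose and SPRITE_MIN_DIST are hypothetical names, the threshold value is arbitrary, and view_z is assumed to be the sprite origin's distance along the view direction. The only point is that the cutoff sits slightly farther out than the near clip distance and is tested before any drawing happens.

```c
/* Near clip distance; assumed here for illustration. */
#define NEAR_CLIP        0.01f
/* Hypothetical cutoff, deliberately a bit farther than NEAR_CLIP. */
#define SPRITE_MIN_DIST  1.0f

/* Returns 1 if the sprite is so close to the view plane that drawing
   should be skipped entirely, 0 otherwise. */
static int R_SpriteTooClose(float view_z)
{
    return view_z < SPRITE_MIN_DIST;
}
```

The sprite drawing function would call this first and simply return when it reports 1.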

Re: Makaqu

Posted: Wed Apr 10, 2013 8:49 am
by mankrip
I'm completely puzzled. How come this line of code is massively slower

Code:

#define TURBULENCE				r_turb_pbase[ ( ( ( (r_turb_t + r_turb_turb[r_turb_s & 0x7FFFFF]       ) >> 16) & 63) << 6) + ( ( (r_turb_s + r_turb_turb[r_turb_t & 0x7FFFFF]       ) >> 16) & 63)]
... than this one?

Code:

#define TURBULENCE				r_turb_pbase[ ( ( ( (r_turb_t + r_turb_turb[ (r_turb_s >> 16) & 0x7F]       ) >> 16) & 63) << 6) + ( ( (r_turb_s + r_turb_turb[ (r_turb_t >> 16) & 0x7F]       ) >> 16) & 63)]
I'm experiencing a HUGE drop in the framerate when using the first line, which has fewer instructions. The framerate drops from about 21 to about 6 fps. However, it should have been faster.

Re: Makaqu

Posted: Wed Apr 10, 2013 11:20 am
by Spike
fewer C operators, perhaps.
in reality though, if r_turb_s/t is a memory operand, the shift just means that it reads a short instead of a long.
while if it's a register operand then the added instruction for the shift will only cost 1 cycle, and the extra memory for the added instruction will at least partially fit inside the extra 3 bytes required for the 32bit mask value.


your real issue is that 0x7FFFFF is in the region of 8 million (more if r_turb_turb is an array of ints instead of chars/bytes). At this point you should be asking yourself how much L1 cache your cpu has. I'll help you out: not nearly that much.
'The original Pentium 4 had a 4-way set associative L1 data cache of size 8 KB'
'The original Pentium 4 also had an 8-way set associative L2 integrated cache of size 256 KB'
It also depends what else you have in memory too, like the instructions you're executing and things (so that's 2kb of your l1 gone for each separate region of memory).
We might have some awesome clock speeds nowadays, but that just means performance is more dependent upon cache speed+size than ever.
Your instructions will remain in cache the whole time. Your turb_s lookup will require 1 of your 4-way blocks, turb_t will require another, and your write to r_turb_pbase will consume the fourth. Any accesses outside of the 2k cache block will result in a cache miss. If your r_turb_t value is changing by more than 1<<11 with each iteration then you're guaranteed 2 cache misses each loop. Least-Recently-Used allocation schemes will probably result in your memory write region getting flushed at the same time (but the cpu should be smart enough to at least not flush the cache around eip).

From memory, a cache miss is about 32 cycles, and will replace part of your cpu cache resulting in more cache misses elsewhere.
A shift instruction (with register source+dest) is 1 clock.
Long story short, you've traded 2 clocks for 0-96 clocks in enough iterations of your loop, and your loop is short enough that the extra clocks are *really* noticeable.
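
To put numbers on the two masks, here is a small sketch of the arithmetic. TableFootprint is a made-up helper for illustration, and the 4-byte element size is an assumption (an array of 32-bit ints).

```c
#include <stddef.h>

/* Number of bytes an index masked with `mask` can reach in a table
   whose entries are elem_size bytes each. */
static size_t TableFootprint(unsigned mask, size_t elem_size)
{
    return ((size_t)mask + 1) * elem_size;
}
```

With 4-byte entries, TableFootprint(0x7F, 4) is 512 bytes, which fits easily in an 8 KB L1 data cache, while TableFootprint(0x7FFFFF, 4) is 33554432 bytes (32 MB), far beyond even the 256 KB L2 quoted above.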

Re: Makaqu

Posted: Wed Apr 10, 2013 9:38 pm
by mankrip
Spike wrote:your real issue is that 0x7FFFFF is in the region of 8 million (more if r_turb_turb is an array of ints instead of chars/bytes). [...] If your r_turb_t value is changing by more than 1<<11 with each iteration then you're guaranteed 2 cache misses each loop.
1<<11 == 0x800, so the matching mask would be 0x7FF. However, as you said, it's an array of integers, so the original value of 0x7F already sounds like the highest possible. I don't really understand how to calculate the maximum value that would be possible, and after each call of this macro there's also r_turb_s += r_turb_sstep; r_turb_t += r_turb_tstep;, so for each pixel there are at least a couple more cycles being spent already. Kernel-based dithering also includes a couple of other sums, and the translucent versions also include a Z-buffer check and iteration:

Code:

#define DITHERED_TURBULENCE_A	r_turb_pbase[ ( ( ( (r_turb_t + r_turb_turb[ (r_turb_s >> 16) & (CYCLE - 1)] + XY0a) >> 16) & 63) << 6) + ( ( (r_turb_s + r_turb_turb[ (r_turb_t >> 16) & (CYCLE - 1)] + XY1a) >> 16) & 63)]

#define DITHERED_BLEND_A(i)				if (pz[i] <= (izi >> 16)) r_turb_pdest[i] = colorblendingmap[ (DITHERED_TURBULENCE_A << 8) +  r_turb_pdest[i]      ];

Code:

DITHERED_BLEND_A(-16); izi += izistep; r_turb_s += r_turb_sstep; r_turb_t += r_turb_tstep;
My purpose was to make the texture turbulence smoother, at no extra processing cost. I've managed to adjust the rest of the code to use the 0x7FFFFF value and the results were perfect, but since there's no way to do that without slowing down the code, I'll scrap that idea. I'll keep this experimental code disabled with #if 0 for anyone who wants to take a look, though.

Re: Makaqu

Posted: Thu Apr 11, 2013 5:00 am
by Spike
there are other ways to do a sine table. you may find that calculating the value will be faster.
Or you can compact the sine table to 1/4th and utilise the fact that it repeats upon itself with different sign/direction within each 1/4th period.
Or you can just interpolate. It's got to be better than the rampant cache misses.
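
As a sketch of the quarter-table idea: only the first quarter of the period is stored, and the other three quarters are read back mirrored and/or negated. The tiny 16-step period and the names quarter_sin and TableSin are assumptions chosen to keep the example readable; a real table would have more entries and the engine's own scaling.

```c
#define QUARTER 4              /* entries per quarter period (assumed, tiny on purpose) */
#define FULL    (QUARTER * 4)  /* steps per full period; must be a power of two */

/* sin() at 0, 22.5, 45, 67.5 and 90 degrees. */
static const float quarter_sin[QUARTER + 1] = {
    0.0f, 0.382683f, 0.707107f, 0.923880f, 1.0f
};

/* Returns the sine for step `index` of a FULL-step period,
   using only the quarter-period table above. */
static float TableSin(int index)
{
    int i = index & (FULL - 1);   /* wrap into one period */
    int quadrant = i / QUARTER;   /* which quarter of the period: 0..3 */
    int offset = i % QUARTER;     /* position inside that quarter */

    switch (quadrant) {
    case 0:  return  quarter_sin[offset];            /* rising, positive  */
    case 1:  return  quarter_sin[QUARTER - offset];  /* falling, positive */
    case 2:  return -quarter_sin[offset];            /* falling, negative */
    default: return -quarter_sin[QUARTER - offset];  /* rising, negative  */
    }
}
```

The extra branch costs a few cycles per lookup, but the table is a quarter of the size, so more of it stays in cache.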

Re: Makaqu

Posted: Thu Apr 11, 2013 7:30 pm
by mankrip
Interpolation should be the easiest for me to implement. But it would slow down the rendering a bit, so it isn't a priority for me. The dithering smooths the liquids out a lot already.
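
For anyone curious, a minimal sketch of what interpolating the table lookup could look like, assuming 16.16 fixed-point coordinates and a 128-entry int table (CYCLE matching the 0x7F mask). LerpTable is a hypothetical helper, not code from Makaqu.

```c
#define CYCLE 128  /* table length, matching the & 0x7F mask */

/* Linearly interpolates between two adjacent entries of an int table.
   pos is 16.16 fixed point: the high bits select the entry and the
   low 16 bits are the blend fraction toward the next entry. */
static int LerpTable(const int *table, int pos)
{
    int i0 = (int)(((unsigned)pos >> 16) & (CYCLE - 1));
    int i1 = (i0 + 1) & (CYCLE - 1);   /* wraps at the end of the table */
    int frac = pos & 0xFFFF;
    long long delta = (long long)(table[i1] - table[i0]) * frac;

    return table[i0] + (int)(delta / 65536);
}
```

That's one extra table read, a multiply and a divide (or shift) per lookup, which is where the slowdown would come from.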

By the way, the sky spans suffer from a similar inaccuracy. Their distortion is only recalculated every 32 pixels, so there's some noticeable tearing when zooming in. That could use some interpolation, but fixing that isn't too important either.

Re: Makaqu

Posted: Fri May 03, 2013 7:01 pm
by mankrip
Now I'm implementing Ogg Vorbis support, and this is how it should turn out:

- Both the Ogg library and the Vorbis library are being compiled as part of the engine, directly from their sources. No external libraries, and no static libraries either.
- OGG sound effects should be fully uncompressed into RAM upon loading, to prevent negative impacts on the framerate.
- OGG sound effects may not loop.
- OGG music will be processed through Quake's audio mixer, to make it easily portable.
- Due to the OGG music being processed through Quake's audio mixer, the default samplerate will be changed to 44100 Hz.

Re: Makaqu

Posted: Sun Jul 21, 2013 7:30 pm
by mankrip
Got quickly bored with the OGG code and stopped development a couple months ago.

Now I'm simplifying the MDL renderer: Removed its x86 ASM codepath (because it's incompatible with some planned changes), and removed the old non-interpolated codepath (I'm going to make "r_interpolation 0" set the interpolation's interval to zero instead). These removals were also performed to make the code maintenance easier.

I've also implemented some code to draw a box around each MDL model's drawing area, and found out that sometimes, a model's drawing area is being shared with another model. It seems that their triangles' indexes are leaking. I already suspected that there is a leak somewhere in the MDL model renderer, and hopefully it's this one. Now it should be easier to figure out how to fix.