PF_VarString crash bug (and fix)

Post tutorials on how to do certain tasks within game or engine code here.
Post Reply
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

PF_VarString crash bug (and fix)

Post by mh »

There's a bug in PF_VarString that occasionally occurs when exiting a map, and which can cause Quake to crash. This is caused by a float in the globals table being set to a NaN, and can be triggered by compiler settings or by removing the ASM code.

So let's fix it. ;)

As it really only happens when exiting a map we'll just swallow it silently rather than implementing a more rigourous fix - why do more work than you have to?

Before:

Code: Select all

char *PF_VarString (int	first)
{
	int		i;
	static char out[256];
	
	out[0] = 0;
	for (i=first ; i<pr_argc ; i++)
	{
		strcat (out, G_STRING((OFS_PARM0+i*3)));
	}
	return out;
}
After:

Code: Select all

char *PF_VarString (int	first)
{
	int		i;
	static char out[256];
	
	out[0] = 0;
	for (i=first ; i<pr_argc ; i++)
	{
		float fvarstring = pr_globals[(OFS_PARM0+i*3)];

		// check for a NaN
		if (fvarstring != fvarstring)
		{
			out[0] = 0;
			return out;
		}

		strcat (out, G_STRING((OFS_PARM0+i*3)));
	}
	return out;
}
Only took me 4 years to track that one down! :lol:
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Post by Spike »

hrm?
while I agree that a nan passed to a function that expects a string will cause an invalidly read string resulting in a crash, pr_argc is set by the op_call instruction. if pr_argc is wrong, then this can have only happened by using a buggy qcc, or buggy builtin prototypes. In terms of qcc bugs, nested function calls are the most likely way of getting a float where there should be a string parameter, nan or otherwise.

The solution presented above is a workaround for a specific instance, rather than a true fix. You can still have large floats (large exponants) that trigger the same error.
I don't know which line of QC is responsible for the error you report, but if it were recompiled with frikqcc, fteqcc, or qfcc, the issue would likely be fixed.

Perhaps worse is that a NaN is represented by the float containing the bits 0x7f800000 all set, as well as random other bits. This means that a negative NaN will be detected in the case where the string in question is within 8mb beneath the progs' string table on the heap. This includes classnames read from the bsp and stored in the zone memory within the heap.
At least I think that's how the maths works out.
Basically you can no longer print classnames if the engine its ported to still uses an 8mb heap. There's also another block of memory that cannot be poked, but that is a region of memory that would never normally be usable to store such a string in windows or linux.
A more correct test would perhaps be to fix the engine to store qc strings within only the heap (player netnames come to mind), and to then ensure the resulting string pointer is within that range. This would also have the benefit of fixing the qcvm on 64bit archetectures, assuming the heap is no larger than 2gb (depending on where the progs stringtable is located within the heap)...

A bigger concern to me is mods that send aprox 1024-char centerprints like teamfortress. Not only will that crash an unmodified client due to stack overflows/corruption, but it will also overflow the server's data section. 256 chars is just too small for too many centerprint-abusing mods.

Imho.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

No issues with any of that.

It's actually the original ID1 progs.dat, and I've observed it crashing on straight ports of the original code (both Win and GL) to MSVC 2008, as well as on ports of modded engines. Odd, huh?

Obviously a deeper VM bug is at work here, but like I said, it's not a rigourous fix. Just a "swallow the error and let the engine keep running" thing. ;)

The problem is that pr_globals (which is float *) gets a NaN in it, which then messes with G_STRING (pr_strings + *(string_t *)&pr_globals[o]), causing havoc in PF_VarString by translating to a negative array index for pr_strings[]. I haven't bothered investigating further for the simple reason that my fix is "good enough for now".
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Post by Spike »

/me experiments
grr silly glextensions limit bug
eep what was that paletted texture disable thing again
GAH MOUSE ACCELERATION, WTF
ffs alt tab made it go black

okay... pr_temp_string has a value of 0xfd10xxxx or so.
its randomized slightly as msvc's default settings is to randomize load addresses as a security feature.
traditionally the malloc heap is allocated in virtual address space after the exe.

Seems the progs is the first thing on the heap after the reset mark, so your classnames are safe.
However, the svs.clients array is allocated at the start of the heap, before the level reset mark:
*(float*)&host_client->edict->v.netname -1.#QNAN00 float
ouch.

So yeah, the heap is malloced. A negative nan can come from any memory within the 8mb beneath the progs' string table. This includes from malloc calls which were later freed. Its possible that the string in question is from a malloced buffer which is changed before its printed, but I have no idea how that could occur. The only strings from outside the string table that I can see are ent/bsp-file strings(later heap=safe), netnames(earlier heap=nans), pr_string_temp(data section=almost nans) and sv.name(data section=almost nans).

So yeah, I have absolutely no idea how you're getting a nan that would actually crash it, but I can see how you'd get a nan when the string is still valid.
You must have either a corrupt string within the QC code, corrupt function calls resulting in an argument from the wrong invocation, or a string that was once valid but has since been free()d (Z_Free doesn't count).

Remind me sometime to make a version of fteqcc that warns whenever two function calls happen on the same line of code (except when used as the first argument) as a buggyqcc compat test.
r00k
Posts: 1111
Joined: Sat Nov 13, 2004 10:39 pm

Post by r00k »

Hey Spike,
can you "make a version of fteqcc that warns whenever two function calls happen on the same line of code (except when used as the first argument) as a buggyqcc compat test?"

;)
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Post by mh »

Here's an ID Quake build that reproduces it: WinQuake NaN Build

It's fairly minimally modified; just enough to stop it going completely berserk on Intel hardware. Every single map transition triggers a NaN in PF_VarString and if it happens it's printed to the console, so just hit Single Player | New Game when the Necropolis demo starts, bring down the console and have a look. This is with stock ID1 progs.dat and the code was ported to MSVC 2008, confirmed happening on 2 different PCs so far.
_______________________

OK, more research. The criminal here is OP_LOAD_S; it's writing a value into c->_int that the eval_t union translates to a NaN in c->_float, and it happens during PR_ExecuteProgram (pr_global_struct->ClientConnect);
_______________________

Update 2:

Code: Select all

ent->v.netname = host_client->name - pr_strings;
*((float *)&ent->v.netname) is the NaN.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Spike
Posts: 2914
Joined: Fri Nov 05, 2004 3:12 am
Location: UK
Contact:

Post by Spike »

as I said earlier:
*(float*)&host_client->edict->v.netname -1.#QNAN00 float

player names are stored within the heap, allocated at init time, the pointer used to reference them is svs.clients which persists through all map changes.
The progs is always loaded after the initial heap, in an area which is always cleared out. There is no map info between the clients list and the progs string table.
This means that (svs.clients[foo].name - pr_strings) will always come out as a negative value, and one which is fairly small, typically less than -8mb. The bit pattern of a negative number has the highest bits all set to 1, which is the same pattern as a negative nan (highest 9 bits are 1).
On every map change, you get a bprint(self.netname);bprint(" exited the level\n"); and self.netname is a value between -8m and -1 and is thus always going to be detected as a negative nan.
But it is required behaviour - that particular nan is perfectly safe.
mankrip
Posts: 924
Joined: Fri Jul 04, 2008 3:02 am

Re: PF_VarString crash bug (and fix)

Post by mankrip »

I've implemented mh's fix from the first post, but as Spike said, the obituary messages stopped displaying the player name, so I disabled the fix.
Ph'nglui mglw'nafh mankrip Hell's end wgah'nagl fhtagn.
==-=-=-=-=-=-=-=-=-=-==
Dev blog / Twitter / YouTube
Baker
Posts: 3666
Joined: Tue Mar 14, 2006 5:15 am

Re: PF_VarString crash bug (and fix)

Post by Baker »

mankrip wrote:I've implemented mh's fix from the first post, but as Spike said, the obituary messages stopped displaying the player name, so I disabled the fix.
Thanks for the feedback on this letting us know that in addition to the discussion above that it has an issue.

re: Centerprint conversation ...

Interesting stuff. I know the X-Men mod in single player does centerprint way too long. In fact, I think I only discovered this after making some ProQuake changes to protect against buffer overrun of strings and then someone complaining about the message not showing the whole thing after the fix.
The night is young. How else can I annoy the world before sunsrise? 8) Inquisitive minds want to know ! And if they don't -- well like that ever has stopped me before ..
andrewj
Posts: 133
Joined: Mon Aug 30, 2010 3:29 pm
Location: Australia

Re: PF_VarString crash bug (and fix)

Post by andrewj »

The "fix" in this tutorial has been shown to be bogus, and hence this thread should be renamed and moved out of the tutorials section.
mh
Posts: 2292
Joined: Sat Jan 12, 2008 1:38 am

Re: PF_VarString crash bug (and fix)

Post by mh »

I think "bogus" is too strong a word to use here.

It's still possible to reproduce the bug with certain build configurations, and from unmodified source-as-released, so something is wrong somewhere. My initial analysis of cause and solution has definitely been shown to be premature and flawed, no contest there, so retention of the thread with the addition of a "don't actually do this" warning (which purpose I guess this post serves) is probably a good idea - at the very least if further analysis throws up the real cause and solution it can be added at a future date and be of benefit to the community.
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
Post Reply