Mundane C tricks ...

taniwha · Post by **taniwha** » Sat Dec 22, 2012 12:40 pm

I myself am not sure how exactly to describe protocols, but if you attach protocols to a class when declaring it, you tell the system that this class will accept the messages that form the protocol. Even without then declaring those methods in the class's interface, the compiler will give a warning if you forget to implement them. Code can query the system about whether an object conforms to/implements a protocol and the system needs to check only the protocol list, not the methods.

I guess one way of describing protocol methods is as pure virtual functions in an abstract class that does not exist. This is important because Objective-C does not have multiple inheritance, but it does let you attach multiple protocols to a class, thus effectively allowing your class to "inherit" multiple abstract classes. For real inheritance, the derived class automatically conforms to any protocols implemented in the base class.

Categories are really funky: they allow you to extend a class without creating a new class: all the methods and protocols declared in the category become available to the rest of the system via the original class, even without access to that class's source code. One of the examples I've seen is to add the ability for a class to print itself. The original class has no concept of printing and if you pass it to the printing system, you get either a no-op or an error message. Using a category to add printing, you can then pass the exact same class to the printing system and the class will be printed.

One of the reasons for @ is to keep the name-space clean. ie, implementation, end, class etc remain available for use as identifiers. @"somestring" is short-hand for creating a string object from a string constant. ie @"string" is the same as [NSString fromString:"string"] (or something like that, I don't know the details off-hand). Objective-C is funny in one way: any word is usable in selector names (keywords, typedefs, other class names...), eg -acceptMessage:(SEL) msg for:(id) obj; (though forObject would be more informative, of course).

The reason I added Objective-C to qfcc was that rhamph was complaining about the utter mess of overloaded entity fields in CustomTF (prozac flavor, iirc), and I first added structs and unions. After a lot of discussion in #quakeforge, it was decided that objects would be nice, but I dreaded the idea of trying to add C++ extensions to QC. Deek suggested Objective-C, I got hold of a good pdf specification of the language (possibly through Deek), and decided it looked easy enough. I committed the first stab at the parser mods on the 7th of May, 2002, and had basic test code running on the 31st of the same month. Parser, code generation, structure writing for the runtime information, the runtime system itself, and a zillion little changes in the engine and compiler to support it all. However, I'm still fixing bugs in it (oversights, not yet implemented features, mis-interpretations, etc), but overall, it's been more than good enough.

One cool feature of QF's runtime is that the server checks for a field named ".this" (@this in qc source: just use it, auto-declaration equivalent to ".id .this") and if present, passes the value stored in the entity's .this field as the first parameter, nil as the second and other (if relevant) as the third to the appropriate think/touch/blocked function. One would then assign the object's pointer to self.@this (or better, @self.@this (so Objective-C's self doesn't clash with QC's self)) at spawn time, and then set @self.think to the IMP of the appropriate message.

server code:

Code:

static inline void
sv_pr_think (edict_t *self)
{
    pr_int_t    this;

    *sv_globals.self = EDICT_TO_PROG (&sv_pr_state, self);
    *sv_globals.other = 0;
    if ((this = sv_pr_state.fields.this) != -1) {
        PR_RESET_PARAMS (&sv_pr_state);
        P_INT (&sv_pr_state, 0) = E_POINTER (self, this);
        P_INT (&sv_pr_state, 1) = 0;
        P_INT (&sv_pr_state, 2) = 0;
    }
    PR_ExecuteProgram (&sv_pr_state, SVfunc (self, think));
}

example from QF's version of fbxa:

Code:

        local IMP imp = [self methodForSelector: @selector (waypointThink)];
        waypoint_thinker.think = (void ()()) imp; //if this was C, that would be void (*)() yay for qc function variables
        waypoint_thinker.nextthink = time;
        waypoint_thinker.@this = self; //NOTE this is the method's self, not "entity self" which would be "@self"

(comments added for this post)

Code:

    @self = spawn ();
    @self.origin = origin;

Using @self creates "entity .self" and qfcc's linker will generate a warning if it sees both .self and self (def_warning (d, "@self and self used together")).

Heh, this post really should be in another thread: we're way beyond "mundane C tricks" and into "esoteric QC and engine hacks"

Baker · Post by **Baker** » Sat Dec 22, 2012 8:53 pm

taniwha wrote:Heh, this post really should be in another thread: we're way beyond "mundane C tricks" and into "esoteric QC and engine hacks"

Ok ... back to the basics ... is there a way I can construct a union to allow referencing XYZ both as:

1. vec3_t ... aka float[3]
2. AND as .x .y .z

I like using vec3_t about 80% of the time. The other 20% of the time doing something like point.x = something, point.y = something, point.z = something can be advantageous.

With 2D, this probably ends up hitting the reverse, 80% of time .x and .y is best representation and 20% a vec2_t (aka float[2]) is preferable.

Baker · Post by **Baker** » Sat Dec 22, 2012 9:14 pm

Seems this isn't hard ...

Code:

#include <stdio.h>

// Anonymous unions/structs are standard as of C11; older compilers offer them as an extension.
typedef struct
{
	union
	{
		float vec3[3];
		struct
		{
			float x;
			float y;
			float z;
		};
	};
} vector3;

int
main (void)
{
	vector3 test = {{{5, 6, 7}}};

	test.x = 4;
	fprintf (stdout, ".xyz %4.2f x %4.2f y %4.2f z\n", test.x, test.y, test.z);
	// .xyz 4.00 x 6.00 y 7.00 z

	fprintf (stdout, ".vec3 %4.2f x %4.2f y %4.2f z\n", test.vec3[0], test.vec3[1], test.vec3[2]);
	// .vec3 4.00 x 6.00 y 7.00 z
	return 0;
}

Bonus: sizeof(vector3) = 12 (3 floats x 4 bytes per float = 12)... which tells me --- and I have rarely used unions and not written one until now --- that isn't increasing the size of struct in any way. Not that sizeof had to return 12, I wouldn't have been entirely shocked if it returned 16 because you never know when padding of structs kicks in.

Baker · Post by **Baker** » Sat Dec 22, 2012 10:17 pm

Code:

typedef struct
{
	union
	{
		vec2_t vec2;
		struct
		{
			float x;
			float y;
		};
	};
} Point2D;

Code:

typedef struct
{
	union
	{
		struct
		{
			float Left;
			float Top;
			float Right;
			float Bottom;
		};
		struct
		{
			Point2D topLeft;
			Point2D bottomRight;
		};
	};
} Rect2D;

I'm almost surprised this worked:

static Rect2D somerect;
somerect.Left = 1;
somerect.Top = 2;
somerect.Right = 3;
somerect.Bottom = 4;
fprintf (stdout, "Rect2D: left = %4.2f top = %4.2f right = %4.2f bottom = %4.2f \n", somerect.Left, somerect.Top, somerect.Right, somerect.Bottom);
fprintf (stdout, "Rect2D: topleft.x = %4.2f topleft.y = %4.2f bottomRight.x = %4.2f bottomRight.y = %4.2f\n ", somerect.topLeft.x, somerect.topLeft.y, somerect.bottomRight.x, somerect.bottomRight.y);

And I'm not 100% sure that this is compiler-independent ... weird stuff like packing order or alignment?

taniwha · Post by **taniwha** » Sun Dec 23, 2012 3:03 am

Structure order (both union and struct) is guaranteed to be the same in any compiler. For union, the offset is always 0 and the size of the union is the size of its largest field. For struct the offset is guaranteed to increase with member offsets being in the declared order and with no overlap. I believe even with alignment, there will be no gaps between fields of the same type. The issue comes when mixing types in the struct. Also, endianness makes a mess of bit fields. Otherwise, struct is very reliable, and union is a non-issue.

Your design for your vector structs is spot-on.
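For anyone who would rather have the compiler confirm the layout than take it on trust, here is a minimal sketch using C11 static assertions. It assumes a C11 compiler (anonymous structs/unions were only an extension before that), and the function name vector3_views_agree is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
	union
	{
		float vec3[3];
		struct
		{
			float x;
			float y;
			float z;
		};
	};
} vector3;

/* Compile-time checks: the build fails if the two views do not line up. */
static_assert (sizeof (vector3) == 3 * sizeof (float), "unexpected padding in vector3");
static_assert (offsetof (vector3, x) == 0 * sizeof (float), "x is not vec3[0]");
static_assert (offsetof (vector3, y) == 1 * sizeof (float), "y is not vec3[1]");
static_assert (offsetof (vector3, z) == 2 * sizeof (float), "z is not vec3[2]");

/* Run-time version of the same idea: write through one view, read through the other. */
static int
vector3_views_agree (void)
{
	vector3 v = { { { 1.0f, 2.0f, 3.0f } } };

	v.x = 4.0f;
	return v.vec3[0] == 4.0f && v.vec3[1] == 2.0f && v.vec3[2] == 3.0f;
}
```

If one of the assertions ever fires on some odd compiler, that is the signal to fall back to plain float[3] plus accessor macros.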

Spike · Post by **Spike** » Sun Dec 23, 2012 3:20 am

you only get padding if your data types are different sizes. structs are padded to the alignment of the largest member.

pointers and 64bit ints have undefined alignment and/or size requirements (depends upon the compiler).
floats/ints/shorts/bytes will have consistent alignment in any platform you're likely to develop for, with the exceptions of 16bit dos (use gcc and/or dos extenders) or old versions of windows ce (predating 'windows mobile' versions), both of which are obsolete.

longs are a wildcard, and should generally be avoided nowadays.

if you really care about your datatype alignments, use stdint.h though beware that it's not part of c89, and that some (really obsolete) systems simply don't have allocation units that are a multiple of 8.

There are also some systems that are pure 32bit and don't support bytes/shorts. Not sure how well stdint.h works there, but there's no reason it can't work effectively enough, only that it won't be atomic, so beware of threading issues.

Other than that, if you stick to the rules of using all-same datatype sizes, and anticipate padding on some systems if you use pointers/64bit datatypes, then there's really no issues.
at least no issues other than endian... Good luck with that one.
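A small sketch of the "all-same datatype sizes" rule using stdint.h. The struct names and the tail_padding helper are hypothetical, and float is assumed to be 32 bits as on mainstream platforms. The first struct should have no padding; adding a single byte member to the second typically costs a whole alignment unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: every member is 32 bits wide, so no padding is expected. */
typedef struct
{
	int32_t  id;
	uint32_t flags;
	float    scale;		/* assumed to be 32 bits */
} sample_header_t;

/* Same data plus a single byte: the struct usually grows by more than one byte. */
typedef struct
{
	int32_t  id;
	uint32_t flags;
	float    scale;
	uint8_t  tag;
} sample_header_tagged_t;

static_assert (sizeof (sample_header_t) == 12, "padding in sample_header_t");

/* Bytes after 'tag' that exist only to keep arrays of the struct aligned. */
static size_t
tail_padding (void)
{
	return sizeof (sample_header_tagged_t)
		- (offsetof (sample_header_tagged_t, tag) + sizeof (uint8_t));
}
```

On common desktop compilers tail_padding () would be expected to come out as 3, but that number is exactly the platform-dependent part being discussed.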

taniwha · Post by **taniwha** » Sun Dec 23, 2012 4:04 am

For the issue you were worried about (sizeof returning 12 vs 16), I think that depends on the alignment of the individual fields. Iirc, sizeof (type) returns the byte distance between elements of an array of type.

Try this out in your favorite compiler (of course, things will go funny if sizeof (int) == sizeof (short)):

Code:

#include <stdio.h>

typedef struct {
    short a;
    short b;
    short c;
} StructA;      // expect sizeof 6

typedef struct {
    int   a;
    short b;
} StructB;      // expect sizeof 6, but...

StructA arraya[2];
StructB arrayb[2];

int
main (void)
{
    /* sizeof yields size_t (%zu); a pointer difference is ptrdiff_t (%td) */
    printf ("sizeof (StructA): %zu\n", sizeof (StructA));
    printf ("sizeof (StructB): %zu\n", sizeof (StructB));
    printf ("sizeof (arraya): %zu\n", sizeof (arraya));
    printf ("(char *) &arraya[1] - (char *) &arraya[0]: %td\n",
            (char *) &arraya[1] - (char *) &arraya[0]);
    printf ("sizeof (arrayb): %zu\n", sizeof (arrayb));
    printf ("(char *) &arrayb[1] - (char *) &arrayb[0]: %td\n",
            (char *) &arrayb[1] - (char *) &arrayb[0]);
    return 0;
}

This is the output for gcc on 64 bit linux (should be the same for 32 bit linux)

Code:

sizeof (StructA): 6
sizeof (StructB): 8
sizeof (arraya): 12
(char *) &arraya[1] - (char *) &arraya[0]: 6
sizeof (arrayb): 16
(char *) &arrayb[1] - (char *) &arrayb[0]: 8
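The jump from 6 to 8 for StructB is what you would get by rounding the 6 bytes of members up to the alignment of its most strictly aligned member (int, 4 bytes here). A sketch of that arithmetic follows. It models the usual ABI behaviour rather than anything the C standard spells out, and round_up is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'raw' up to the next multiple of 'align'. */
static size_t
round_up (size_t raw, size_t align)
{
	assert (align != 0);
	return (raw + align - 1) / align * align;
}
```

With these inputs, round_up (6, 2) stays at 6 (StructA, short alignment) while round_up (6, 4) gives 8 (StructB, int alignment), matching the output above.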

Baker · Post by **Baker** » Sun Dec 23, 2012 12:21 pm

Spike wrote:longs are a wildcard, and should generally be avoided nowadays.

I think I've run into that first hand ... something like on 64-bit long is 64-bit, versus long as 32-bit on 32-bit.

And it is a bit aggravating that some rather standardized legacy code tends to use longs. It doesn't affect that code itself ... it affects interacting with the functions, since I don't want to declare a long variable in my own code just for the sake of calling a function that expects a long.

And that function *doesn't* really want a long --- it wants SInt32 because the code was written when long = 32-bit.
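One way to keep long out of your own code is a thin wrapper at the call boundary. This is only a sketch: legacy_double_it is a made-up stand-in for such an old function, and double_it32 is the hypothetical wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an old API written when long was 32 bits. */
static long
legacy_double_it (long value)
{
	return value * 2;
}

/* Wrapper: callers deal only in int32_t; the long stays at the boundary. */
static int32_t
double_it32 (int32_t value)
{
	long result = legacy_double_it ((long) value);

	/* Where long is wider than 32 bits, check the range before narrowing. */
	assert (result >= INT32_MIN && result <= INT32_MAX);
	return (int32_t) result;
}
```

Whether an assert is the right reaction to an out-of-range result depends on the program; the point is only that the conversion happens in one place.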

I've come to the probably correct conclusion that long vs. short vs. "int" are mostly 386/486 era datatypes back when the world was transitioning from 16-bit to 32-bit. Still eventually I'm sure the standard will change ... or maybe not. Only with math, clocks and memory addressing (size_t, etc.) does the use for super-huge numbers really kick in ...

taniwha wrote:For the issue you were worried about (sizeof returning 12 vs 16),

With Spike's explanation above, padding is less of a mystery now. I never knew when padding would kick in so it seemed somewhat random to me and nothing really would have surprised me ...

Spike · Post by **Spike** » Sun Dec 23, 2012 2:54 pm

'int' is the standard datatype that is native for the machine, so 16bit if you're on a 16bit machine, 32bit if you're on a 32bit machine, and 32bit if you're on a 64bit machine... wait... that doesn't work... meh.

'long' is sometimes emulated, because 16bit ints are too much of a pain.

'size_t' is generally a uintptr_t, but can also be 16bit in a segmented system like dos, where far pointers are 32bit (although with only a 20bit address space, with random weirdness mapping to the same bits of memory). It's an efficient datatype, basically.

'long double' is a fun datatype... 80 bits of actual data on x87, 64bits of actual data with sse. it's just a fun datatype.

if you're doing professional coding, get accustomed to typing uint32_t sooner rather than later.

At the end of the day, you'll either have data structures that are internal to your code (in which case you shouldn't really care about alignment/padding etc anyway), data structures that are part of an API (where extensibility is likely more important), or data structures that are transferred between programs (where you should be checking variables are within expected parameters and extra padding can be fatal, or leak information that should not be leaked).

If it's buggy on 64bit, run it through valgrind. That'll detect any reads of uninitialised padding for you nice and easy. In certain situations, memsetting everything to 0 can be bad.
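For the "transferred between programs" case, one common way to sidestep both padding and endianness is to write the fields out one byte at a time instead of copying the struct. A minimal sketch, with a hypothetical message_t and a little-endian wire order picked arbitrarily for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory record; its padding never reaches the wire. */
typedef struct
{
	uint8_t  kind;
	uint32_t value;
} message_t;

enum { MESSAGE_WIRE_SIZE = 5 };	/* 1 byte kind + 4 bytes value */

/* Write the fields one at a time, value in little-endian order.
   Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t
message_pack (const message_t *msg, uint8_t *out, size_t out_size)
{
	assert (msg && out);
	if (out_size < MESSAGE_WIRE_SIZE)
		return 0;
	out[0] = msg->kind;
	out[1] = (uint8_t) (msg->value & 0xff);
	out[2] = (uint8_t) ((msg->value >> 8) & 0xff);
	out[3] = (uint8_t) ((msg->value >> 16) & 0xff);
	out[4] = (uint8_t) ((msg->value >> 24) & 0xff);
	return MESSAGE_WIRE_SIZE;
}

/* Rebuild the record from the bytes; returns 0 if the buffer is too short. */
static int
message_unpack (message_t *msg, const uint8_t *in, size_t in_size)
{
	assert (msg && in);
	if (in_size < MESSAGE_WIRE_SIZE)
		return 0;
	msg->kind = in[0];
	msg->value = (uint32_t) in[1]
		| ((uint32_t) in[2] << 8)
		| ((uint32_t) in[3] << 16)
		| ((uint32_t) in[4] << 24);
	return 1;
}
```

Because only named fields are written, whatever sits in the struct's padding bytes stays on the sending machine, and the same bytes come out regardless of host byte order.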

Baker · Post by **Baker** » Sun Dec 23, 2012 4:07 pm

Spike wrote:In certain situations, memsetting everything to 0 can be bad.

I tried to think of a scenario where this could be true ... couldn't come up with a scenario. What's an example?

revelator · Post by **revelator** » Mon Dec 24, 2012 8:20 am

There were a few places in Doom3 that memset class members to 0, like the ase models, but doing so caused some ase models to go black because it also nulled out some class members that should not have been. mh suggested a fix and it works quite nicely.

taniwha · Post by **taniwha** » Mon Dec 24, 2012 8:29 am

For memsetting to 0: certain machines have non 0-bit representations of 0 (null pointers and floating point), but my understanding is that there'd be more trouble than that for quake on such machines.

The only guarantees for type sizes are minimum ranges (char has enough bits to represent source code and is at least 8 bits, short and int are at least 16, long is at least 32), and then long >= int >= short >= char, which means long == int == short == char is perfectly valid (on a machine with 32-bit chars, say). On the other hand, apparently you can trust float: ieee spec. Maybe double, not sure, but long double is a wildcard again.
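The ordering and the minimum widths can be checked at compile time with limits.h. A sketch assuming C11 for static_assert; long_bits is a hypothetical helper, and the sizeof comparison is the rule as stated in this post (the standard itself phrases it in terms of value ranges).

```c
#include <assert.h>
#include <limits.h>

/* Minimum ranges the C standard requires, checked at compile time. */
static_assert (CHAR_BIT >= 8, "char has fewer than 8 bits");
static_assert (SHRT_MAX >= 32767, "short is narrower than 16 bits");
static_assert (INT_MAX >= 32767, "int is narrower than 16 bits");
static_assert (LONG_MAX >= 2147483647L, "long is narrower than 32 bits");

/* The ordering rule from the post, expressed with sizeof. */
static_assert (sizeof (char) <= sizeof (short)
               && sizeof (short) <= sizeof (int)
               && sizeof (int) <= sizeof (long),
               "integer size ordering violated");

/* Storage width of long in bits on this platform (the "wildcard" value). */
static int
long_bits (void)
{
	return (int) (sizeof (long) * CHAR_BIT);
}
```

These assertions pass on any conforming mainstream compiler; what varies between platforms is what long_bits () returns.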

It seems even big-endian is going the way of the dodo, so I'm not sure it's worth being overly worried about memsetting to 0 (though good to keep in your mind's deep storage just in case you find yourself working with such a machine).

For reckless' example, that's not really memset 0 being bad, but just not complete: memset to 0, then do specific initializations. The memset is to ensure the whole struct is in a known state, the specific sets to non-zero is to correct that known state for fields that need to be non-zero.
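The "memset to 0, then do specific initializations" pattern looks roughly like this. thing_t and thing_init are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical object with one field whose sensible default is not 0. */
typedef struct
{
	int   health;
	float scale;	/* must start at 1.0, not 0.0 */
	int   flags;
} thing_t;

/* Single place that puts a thing_t into a known state. */
static void
thing_init (thing_t *thing)
{
	assert (thing);
	memset (thing, 0, sizeof (*thing));	/* known state, padding included */
	thing->scale = 1.0f;			/* then correct the non-zero defaults */
}
```

Keeping both steps in one function means a newly added field with a non-zero default only has to be handled in one spot.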

Spike · Post by **Spike** » Mon Dec 24, 2012 8:56 am

well clearing out the vtable of a class is obviously bad... But that's more an issue with the 'new' operator than anything else.

A more serious example is when you're updating your code a little. Say you have 5 places where you spawn some type of object, and you want to add an extra field to these objects, that doesn't have a 'default' value of 0 (but 0 doesn't crash anything, at least not immediately).
You can easily forget to update at least one of those places to set it to 5 or whatever, and the behaviour of those other bits of code is now not what was originally intended.
If it's not memcleared, you can run it in valgrind and have valgrind point out the bits of memory that are being poked without having been set first, thus drawing your attention to the omission.
Though it's not something you should rely upon - developing in the debugger is generally frowned upon.

While you can trust the ieee float spec, there's no guarantee that your compiler/cpu actually complies with it.
Yes, there are some machines with negative-zero integers too... And machines with 7bit or 9bit bytes, or 32bit 'bytes'. The peculiarities of the architecture are important for speed, but any semi-mainstream system will at least attempt to support most common libraries without too much rewriting.
Little endian may be backwards, but at least its a bit more extensible in that the trailing part can be masked instead of having to be offset. There's enough awareness of bigendian that it'll live on in niche systems (like MIPS, powerpc, routers, and supercomputers), and any differences there really affect only file formats (network endian is big endian). Well written programs will still work properly, they might just use a different byte order. Yay.

While 0 can logically be a valid value for a pointer, systems that don't use 0 for NULL will have huuuge problems with C++. It's just not going to happen on any non-embedded system (I don't even count phones as 'embedded' any more).

revelator · Post by **revelator** » Mon Dec 24, 2012 9:17 am

Maybe not the best example

there was one place though where the differences between C and C++ came to pass as an inherently bad way of using memset.

it was organized a bit like this

someclassmember dummy;

memset (&dummy, 0, sizeof(someclassmember));

whoops we just nulled out the entire row of class members

in C say like this

entity_t dummy;

memset(&dummy, 0, sizeof(entity_t));

which is valid though maybe not the best way to do things.

it gave every static code analyzer I tried on it hiccups and the memset was removed in a later patch

Visually it didn't seem to do much if anything but removing the memset worked just as well at least in C++ while C would spew warnings about an uninitialized function if I did not memset the dummy.

taniwha · Post by **taniwha** » Mon Dec 24, 2012 10:00 am

If you're spawning and initializing your objects in 5 different places, you're doing it wrong (*attempts to scrape more char from fingers*). I've found that even testing objects in 5 different places is doing it wrong. That's why qfcc has a slowly growing population of things like "is_void()" (really should be "type_is_void"), "statement_is_goto", "def_operand", "new_value_expr"...: I both got tired of having to look up how to init/test and all the typing involved (even though is_void is just type->type == ev_void). For qfcc speed is not an issue, so I have such functions in the relevant .c files, but in an engine, static inline functions in header files would be more appropriate.
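For the engine case, a header-style static inline predicate might look like the sketch below. The enum and struct are cut-down, hypothetical stand-ins, not qfcc's actual definitions; only the test itself (type->type == ev_void) comes from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins; not qfcc's actual definitions. */
typedef enum { ev_void, ev_float, ev_pointer } etype_t;

typedef struct type_s
{
	etype_t type;
} type_t;

/* One named test instead of 'type->type == ev_void' at every call site. */
static inline int
type_is_void (const type_t *type)
{
	assert (type != NULL);
	return type->type == ev_void;
}
```

If the representation of a type ever changes, only the body of the helper needs to follow.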

The currently supported MIPS port of QF is little-endian. In fact, only the ps3 ppc port is big-endian. (port = cross building scripts in tools/cross).

I believe the C spec guarantees that no matter what the machine encoding of NULL is, the source "encoding" shall be 0 (for assignment and testing). I believe memset is another matter. I guess the truly paranoid can memset the struct to 0, then explicitly set the pointers to 0.
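An alternative for the truly paranoid is to skip memset and assign from a zero-initialized object. C initialization works on values, so pointers become null pointers and floats become 0.0 whatever their bit patterns are. A sketch with a hypothetical node_t:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct mixing a pointer, a float and an integer. */
typedef struct node_s
{
	struct node_s *next;
	float          weight;
	int            count;
} node_t;

/* Reset by value: every member of 'blank' holds a null/zero *value*. */
static void
node_clear (node_t *node)
{
	static const node_t blank = { 0 };

	assert (node != NULL);
	*node = blank;
}
```

The trade-off is that struct assignment makes no promise about padding bytes, so memset is still the tool when the struct will be compared with memcmp or written out raw.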
