Calling Conventions/Stack

Baker · Post by **Baker** » Mon Jun 12, 2017 5:41 am

Here is a great guide to the structure of the stack and what goes on in a function call.

http://www.tenouk.com/Bufferoverflowc/B ... low2a.html (detailed function call)
http://dwarfstd.org/DownloadDwarf5.php (how debug information for gdb is stored)
https://en.wikipedia.org/wiki/X86_instruction_listings (x86 instruction listings, including the x86-64 additions)

About 2 years ago, I would read the pros and cons of the different language calling conventions (caller cleanup, callee cleanup) and dig into the stack structure, but barely put any of it to use.

Weird thing is, earlier in the year I turned the Quake menu as-is into a mouse interactive menu.

https://www.youtube.com/watch?v=pWyFltQ-wRY

But the interface was hard-coded into the engine and rubbed me the wrong way, because I felt like the interface should be script-like (text representation).

But to get from hard-coded representation to a script-like representation that allows variable declaration and function calls and complex expressions is no small amount of work.

In fact, it involves 4-5 different highly complex problems.

But why else do something? I would imagine this is why FrikaC, JP Grossman, Spike and others found satisfaction in the QuakeC compiler.

revelator · Post by **revelator** » Mon Jun 12, 2017 6:47 pm

Could use cegui

completely scriptable via lua and xml, used in at least one game im aware of that uses the crystal engine.

Pro's
Scripting via lua, xml and possibly others; i would have to look at it again, it has been some years.
Gui elements can be locked/unlocked, which is a big help when setting up ingame stuff.
Could also be used for hud and other stuff.

Con's
Library is C++ but C wrapping is present, and it can be linked statically so not really a problem.
A rather big overhead in size.

ericw · Post by **ericw** » Mon Jun 12, 2017 6:56 pm

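Have you checked out libffi, Baker? It lets you call compiled code at runtime through a function pointer when the argument types are only known at runtime; looking the function up by name is a separate step (dlsym or GetProcAddress).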
https://en.wikipedia.org/wiki/Libffi

That said, IMHO reflection should be considered harmful(TM) in most cases, usually there is a statically typed way to get the same result.

If you want to call menu functions at runtime based on a string, you could make a table of strings and function pointers.

Code:

typedef struct menu_func_s {
    char name[64];
    void (*function_pointer)(void);
} menu_func_t;

menu_func_t funcs[] = {
    { "foo1" , &Foo1 },
    { "foo2" , &Foo2 },
};

Can also replace the

Code:

{ "foo1" , &Foo1 },

bits with a macro to avoid having to type the name twice and avoid possible typos in the string.
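A minimal sketch of that macro idea, assuming two placeholder handlers (`Foo1`, `Foo2`) and hypothetical helper names (`MENU_FUNC`, `Menu_FindFunc`, `Menu_Call`) that are not from any engine. The `#` operator turns the identifier into its string, so the name is typed once. Note that the string then matches the C identifier exactly ("Foo1" rather than "foo1").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder menu handlers; real ones would live in the engine. */
static int last_called = 0;
static void Foo1(void) { last_called = 1; }
static void Foo2(void) { last_called = 2; }

typedef struct menu_func_s {
    const char *name;
    void (*function_pointer)(void);
} menu_func_t;

/* #fn stringizes the identifier, so each name is typed only once. */
#define MENU_FUNC(fn) { #fn, &fn }

static const menu_func_t funcs[] = {
    MENU_FUNC(Foo1),
    MENU_FUNC(Foo2),
};

/* Linear search; returns NULL when no entry matches. */
static const menu_func_t *Menu_FindFunc(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof(funcs) / sizeof(funcs[0]); i++) {
        if (strcmp(funcs[i].name, name) == 0)
            return &funcs[i];
    }
    return NULL;
}

/* Calls the named function; returns 1 on success, 0 if the name is unknown. */
static int Menu_Call(const char *name)
{
    const menu_func_t *f = Menu_FindFunc(name);
    if (!f)
        return 0;
    f->function_pointer();
    return 1;
}
```

A script line such as `Foo2` would then be dispatched with `Menu_Call("Foo2")`. A linear search is probably fine for a menu-sized table.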

Baker · Post by **Baker** » Mon Jun 12, 2017 7:31 pm

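ericw wrote:Have you checked out libffi, Baker? It lets you call compiled code at runtime through a function pointer when the argument types are only known at runtime; looking the function up by name is a separate step (dlsym or GetProcAddress).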
https://en.wikipedia.org/wiki/Libffi

That could be helpful. I've actually already written up a C external library loader (I have long had multi-target rendering ... DX8/DX9/Open GL, not really much extra work). But I prefer to at a minimum examine a successful existing work and find out what train wrecks they had to deal with and how they adapted.

I don't like repeating mistakes that someone else already dealt with.

Spike has plug-ins in FTE.

I can see external libraries or script capability being useful for things I don't care about, but that other people do.

Then I will say "Do it yourself."

And believe me, I am going to tell Gunter to make his own menu instead of complaining about mouse run/sidespeed.

@reckless -- yeah, I don't want to use a third party script library. That just means using someone else's limits and I wouldn't benefit from the experience of dealing with highly complex issues hands-on.

Spike · Post by **Spike** » Tue Jun 13, 2017 2:00 am

just use menuqc.
implement the core builtins from https://github.com/xonotic/darkplaces/b ... enudefs.qc while skipping all the network+font crap, while throwing in sprintf+the easier fte_strings builtins. hopefully autocvars too.
using DP's key constants can be annoying, although I suppose it's not mandatory if everyone uses stringtokeynum at startup for the keys they care about.
A third implementation would convert menuqc from a mere argument into an actual standard.
Implementing someone else's standard means you don't have to bother documenting your own thing, and there's already a few existing implementations to test against (even if they need some tweaks to avoid unimplemented stuff at the time).
I'd implement it into qss, but I'm too lazy and I don't think anyone would actually care anyway.

Frankly though, the biggest issue is cvar names being randomized in pretty much every frikkin engine out there, so that's something for Gunter to have fun with (cvar_type(n)==0 means that the cvar is not defined, and thus he should try one of the other possible known names, which is equivalent to what I do with my menusys menu mod).
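The fallback described above (try each known name until one is defined) could be sketched like this in C. Everything here is an assumption for illustration: `Cvar_Exists` stands in for the `cvar_type(n) != 0` check, and the cvar names are invented, not real engine cvars.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the cvars one particular engine defines (invented names). */
static const char *engine_cvars[] = { "example_sidespeed", "example_brightness" };

/* Hypothetical lookup playing the role of cvar_type(n) != 0. */
static int Cvar_Exists(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(engine_cvars) / sizeof(engine_cvars[0]); i++) {
        if (strcmp(engine_cvars[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Returns the first candidate name this engine defines, or NULL if none. */
static const char *Cvar_FirstKnown(const char *const *candidates, size_t count)
{
    size_t i;
    assert(candidates != NULL);
    for (i = 0; i < count; i++) {
        if (Cvar_Exists(candidates[i]))
            return candidates[i];
    }
    return NULL;
}
```

The menu code would resolve each setting once at startup and then read or write whichever name came back.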

FTE's plugins tend to implement menus via fte's 'console windows' feature, combined with embedded console links/images for mouse stuff. Yes you can make proper menus in there, but with them being so isolated from the engine it makes the whole thing inconsistent and annoying, whereas consoles are a few lines to create them and then a load of prints. Easy, and with scrolling for free.

Baker · Post by **Baker** » Tue Jun 13, 2017 5:12 am

Spike wrote:just use menuqc.
implement the core builtins from https://github.com/xonotic/darkplaces/b ... enudefs.qc while skipping all the network+font crap, while throwing in sprintf+the easier fte_strings builtins. hopefully autocvars too.

Had to do a coin flip between truth or dare. Came up heads, I had to honor the coin flip so here goes ...

1) The modders you think exist in 2017 --- they don't.

2) I'm not targeting modders. In fact, I believe Quake is dead -- I expect no one will use the feature I am writing.

3) The idea of writing a full OOP language with reflection for something as dumb as a menu is amusing to me. It makes me laugh. I guess my sense of humor is bad.

The capabilities are also far worse than I have alluded to. I suck.

But my sense of humor drives me. It is both awesome and terrifying, so I try to not think about why I would do these things. "Hey look, a nickel!" (Baker distracts changing conversation ...)

4) While Quake is dead by any meaningful metric, it has 1 outstanding feature. That one outstanding feature can lead to a "reset".

5) It is hard to say exactly what happens after a "reset".

Anyway, my 2 cents. Everyone is worse off for reading this. Caveat emptor.

/One perspective. I'm often wrong about these kinds of things. What do I know anyway, everyone has an opinion.

revelator · Post by **revelator** » Tue Jun 13, 2017 2:47 pm

Not dead but yeah it sure has lost some traction

If just for curiosity you could have a look at doom3's menu system, it's pretty standard xml albeit written in C++, but it might give you some ideas.
cegui is opensource so you can modify it to your heart's content, but yeah adding a huge library just for a menu is kinda meh.
QT could also be used if going for something akin to a launcher, then again size will probably be a bit too much for so little benefit.
If going for a launcher type with no menu at all in quake itself you could use the fast light toolkit or FLTK in short, it's probably one of the smallest applications for creating gui's out there and it spits out a single *.c when you have everything right in the editor.
menuqc is also a possibility if you want to keep things straight in the quake world.

frag.machine · Post by **frag.machine** » Wed Jun 14, 2017 11:58 am

The advantage I see in menuqc is being useful not just for menus but for custom GUI projects in general: HUDs, NPC dialogues, etc.

One idea I had in the past was a minimal markup language using notations to refer to global gameplay vars, cvars and entity fields in general. And instead of calling C functions directly, just forwarding commands to the console. Easier said than done obviously. In the end I was introducing things like loops to control animations and conditional clauses when I decided to scrap it.

Baker · Post by **Baker** » Fri Jun 30, 2017 11:38 am

revelator wrote:If just for curiosity you could have a look at doom3's menu system, its pretty standard xml albeit written in C++, but it might give you some ideas.

I've been digging down and studying the Java virtual machine, the C# one and looking at assembly.

While it is not obvious, asm actually contains a shocking amount of abstraction if you look at the machine code.

mov ebp, esp is not the same code on the byte level as mov ebp, [esp + 4]. asm also carries an explicit operand size, because each register name has a fixed size depending on which one is being accessed.

The actual opcodes written out, there are like 20 different types of add operations and such.
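To make the byte-level difference concrete, here is a small sketch with the 32-bit encodings of the two instructions and a decoder for the ModRM byte that follows the opcode. The helper names (`DecodeModRM`, `HasSIB`) are made up for this example. The register-to-register form shown uses opcode 0x89; assemblers may also emit the equivalent 0x8B 0xEC.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit x86 encodings of the two instructions discussed above. */
static const uint8_t mov_reg_reg[] = { 0x89, 0xE5 };             /* mov ebp, esp       */
static const uint8_t mov_reg_mem[] = { 0x8B, 0x6C, 0x24, 0x04 }; /* mov ebp, [esp + 4] */

/* The three bit fields packed into a ModRM byte. */
typedef struct {
    unsigned mod; /* 3 = register operand, 1 = memory with 8-bit displacement */
    unsigned reg; /* register number (esp = 4, ebp = 5) */
    unsigned rm;  /* register number, or 4 = "a SIB byte follows" in memory forms */
} modrm_t;

static modrm_t DecodeModRM(uint8_t b)
{
    modrm_t m;
    m.mod = (b >> 6) & 3;
    m.reg = (b >> 3) & 7;
    m.rm  = b & 7;
    return m;
}

/* In 32-bit addressing, a memory operand with rm == 4 is followed by a SIB byte. */
static int HasSIB(modrm_t m)
{
    assert(m.mod <= 3 && m.rm <= 7);
    return m.mod != 3 && m.rm == 4;
}
```

The memory form needs a different opcode, a SIB byte (because the base register is esp) and a displacement byte, which is why it is twice as long.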

If you want multi-threading, you have to have multiple stacks and instruction pointers.

While I have no intention of doing what I believe Quake 3 may do (true JIT compilation), I need byte-code as an intermediary and actually must have 2 forms of it (because of file i/o and alignment variations depending on 32-bit or 64-bit). Even if I am doing something else, which I am.

I know saner people will say look at library X or Y. But since I want to learn the deep, dark secrets and then use them --- this is the right way for me.

The important part is using them, I don't do anything unless I need the result. And in this case, I definitely do.

Spike · Post by **Spike** » Sat Jul 01, 2017 12:43 am

JIT actually means that functions are converted to native machine code only when the function's actually executed. Q3 takes the lazy choice and does Ahead-Of-Time compilation instead. Basically just convert the entire thing at load time instead of run time. I've misused the term multiple times myself.

The thing thats interesting about both Q3 and Java(from what I remember of it) is that they both logically have TWO stacks, as it were. One stack holds the locals, the return addresses, etc, while the other stack holds the temporaries, the intermediates, whatever you want to call them.
so you have some load instruction with a single argument that reads data from somewhere and then just pushes it to the intermediates stack. followed by another similar instruction, then you have some simple 'mul' instruction without any arguments that pops the two arguments, multiplies them together, then pushes the result. this can then be followed by some store operation that shoves that data somewhere. When combined with a few rules like 'the intermediates stack must be empty before any jumps or comefroms', the AOT/JIT compiler is then free to allocate/remap whatever registers it likes for the various intermediates slots, and thus your simple 'mul' instruction can then become a simple x86 mul instruction (but with some extra magic each side to try to avoid needing extra movs to remap the registers to match x86's annoying specific-register requirements). If you're processing a function at a time then you can combine the two stacks back into a single block.
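A minimal interpreter sketch of that load / load / mul / store sequence, with the locals and the intermediates kept in separate arrays. The opcode names and `VM_Run` are invented for illustration; this is not Q3's or Java's actual bytecode.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_LOAD, OP_MUL, OP_STORE, OP_HALT };

#define MAX_TEMPS 16

/* Runs code against a locals array; temps is the separate intermediates stack. */
static void VM_Run(const uint8_t *code, int32_t *locals)
{
    int32_t temps[MAX_TEMPS];
    int sp = 0; /* number of live intermediates */
    int pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_LOAD:  /* one operand: local slot to read */
            assert(sp < MAX_TEMPS);
            temps[sp++] = locals[code[pc++]];
            break;
        case OP_MUL:   /* no operands: pop two, push the product */
            assert(sp >= 2);
            sp--;
            temps[sp - 1] *= temps[sp];
            break;
        case OP_STORE: /* one operand: local slot to write */
            assert(sp >= 1);
            locals[code[pc++]] = temps[--sp];
            break;
        case OP_HALT:
            assert(sp == 0); /* intermediates must be empty at exit */
            return;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}
```

Because `sp` is known at every instruction when the code is translated, a compiler could map each intermediates slot to a register instead of memory, which is the remapping freedom described above.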

If you're generating bytecode then yes, x86 is horrible with all of its different extensions and additions over the years. arm bytecode 'should' be a little more straight forward, if only because it won't have quite so many restrictions on which register is used, nor different forms of instructions for different registers. I've not needed to deal with arm bytecode itself, and tbh I can never remember which terms are ins or outs with arm asm, but the bytecode itself should be more predictable, if I understand correctly.
Note that 'mov ebp,esp' has a totally different meaning from 'mov ebp,[esp]' so its hardly surprising that the bytecode changes to reflect the memory address instead of a register. Especially if your memory address is formed from multiple args like both a register AND an immediate, getting even more complex when you throw in a second register and a multiplier too... Handy though, you can get a lot of stuff done like that, hence the lea instruction which can be quite handy for multiply-and-add type stuff, especially when the regular mul instruction is so awkward.

On the other hand, QuakeC logically has 65k+ different registers and no real indication about whether a variable is a local, an intermediate, or even a global, which makes it hard to keep track of their scopes, effectively resulting in all operations writing back into ram. That said, considering you would need to write an AOT compiler for 3 or 4 different instruction sets (even if only two main families), it's easier to just write a simple interpreter, in which case the performance difference won't be that significant. Besides, your engine already has a QC interpreter...

If you have different intermediate bytecode for 32bit or 64bit builds, you're probably doing it wrong. I would argue that a vm does not need to store native pointers, thus you should not need any 64bit intermediate bytecode for your interpreter, and even if you did you could treat pointers differently (ie: always use 64bit bytecode), always reserve 64bits for them and just read+write them as 32bit on 32bit systems - this is effectively what 64bit processors do when running in compatibility mode anyway. Really though, any objects should be referenced via some table, thereby allowing you to orphan them safely or whatever, as well as free them at exit (even if you've no GC while actually running).
(obviously different ABIs need different native bytecode, but that doesn't mean that the intermediate stuff should differ, because if it does then what's the point of it at all? might as well just compile directly to the native bytecode)
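One way to read the "referenced via some table" suggestion is a handle table: the VM only ever stores a 32-bit index, and the host keeps the native pointers. A rough sketch, with invented names (`Handle_Alloc` and friends) and a fixed-size table, assuming the table owns objects allocated with `malloc`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HANDLES 64

static void *handle_table[MAX_HANDLES]; /* slot 0 is reserved as the null handle */

/* Stores a native pointer and returns a 32-bit handle, or 0 if the table is full. */
static uint32_t Handle_Alloc(void *object)
{
    uint32_t h;
    assert(object != NULL);
    for (h = 1; h < MAX_HANDLES; h++) {
        if (!handle_table[h]) {
            handle_table[h] = object;
            return h;
        }
    }
    return 0;
}

/* Translates a handle back to a pointer; stale or invalid handles give NULL. */
static void *Handle_Get(uint32_t h)
{
    return (h < MAX_HANDLES) ? handle_table[h] : NULL;
}

/* Frees the object and orphans the handle. */
static void Handle_Free(uint32_t h)
{
    if (h > 0 && h < MAX_HANDLES) {
        free(handle_table[h]);
        handle_table[h] = NULL;
    }
}

/* Releases everything still live, e.g. at VM shutdown. */
static void Handle_FreeAll(void)
{
    uint32_t h;
    for (h = 1; h < MAX_HANDLES; h++)
        Handle_Free(h);
}
```

With this layout the bytecode and any saved data look the same on 32-bit and 64-bit builds, since only the table entries change size.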

frag.machine · Post by **frag.machine** » Sat Jul 01, 2017 1:43 pm

Besides, JIT, at least as implemented by Java, would require Quake 3 to have an embedded profiler to detect the "hot spots" and only convert those parts to native instructions. Spike's explanation makes more sense to me.

Baker · Post by **Baker** » Tue Jul 18, 2017 1:32 am

Spike wrote:JIT actually means that functions are converted to native machine code only when the function's actually executed. Q3 takes the lazy choice and does Ahead-Of-Time compilation instead. Basically just convert the entire thing at load time instead of run time.

Ahead of time compilation, although not as "cool", probably works out better in most situations. But I suppose this also depends on how often an application is started/stopped and how long it tends to run.

Spike wrote:The thing thats interesting about both Q3 and Java(from what I remember of it) is that they both logically have TWO stacks, as it were. One stack holds the locals, the return addresses, etc, while the other stack holds the temporaries, the intermediates, whatever you want to call them.

From what I have read about Java, it tends to use the stack quite a bit for calculations. The docs imply that instead of feeding it registers, Java puts the operands on the stack and then the opcodes work only against the stack (presumably the top of the stack).

Anyway, I'll have to keep in mind what you said about 2 stacks and see if something clicks as to why that would be useful. Perhaps being JIT it is easier to have intermediate variables not changing the top of the stack during execution. Either way, sounds like something for me to look up sometime and read about.

Spike wrote:If you're generating bytecode then yes, x86 is horrible with all of its different extensions and additions over the years. arm bytecode 'should' be a little more straight forward, if only because it won't have quite so many restrictions on which register is used, nor different forms of instructions for different registers. I've not needed to deal with arm bytecode itself, and tbh I can never remember which terms are ins or outs with arm asm, but the bytecode itself should be more predictable, if I understand correctly.

According to what I have read, ARM originated in the 1980s so while not "new", obviously didn't have the legacy pressure of Intel so the instruction set would be cleaner. I've not much checked in on x64, mostly because Visual Studio lets me see the assembly language for C in 32-bit.

A quick trick to getting Visual Studio to not optimize is declaring variables volatile, which generates assembly language equivalents for any statement pretty much literally.

Spike wrote:If you have different intermediate bytecode for 32bit or 64bit builds, you're probably doing it wrong.

I was doing it slightly wrong.

It has since been rectified and this comment helped me act on the feeling that "something wasn't quite right" about my initial approach.

This has been a lot of fun and very educational. It's a bit ironic realizing that external function calls have to be resolved at run-time, pretty much by name and you have to declare them in the bytecode, pretty much the same as C. (Unless one is using a "built-in" strategy where functions are locked to specific numbers.)

More or less like "LoadLibrary"/"FreeLibrary" or like the QuakeC extension system look-up function.
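A sketch of that load-time resolution step, assuming the bytecode file carries a list of import names and the host exposes a table of exported functions. All names here (`host_exports`, `Module_Link`, the two host functions) are hypothetical. After linking, a call opcode only needs an index into the resolved table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*host_fn_t)(int);

/* Hypothetical host functions exported to scripts. */
static int Host_Double(int x) { return x * 2; }
static int Host_Negate(int x) { return -x; }

static const struct { const char *name; host_fn_t fn; } host_exports[] = {
    { "Double", Host_Double },
    { "Negate", Host_Negate },
};

#define MAX_IMPORTS 8

/* Resolves each import name once at load time; returns 0 if any name is unknown. */
static int Module_Link(const char *const *import_names, size_t count, host_fn_t *resolved)
{
    size_t i, j;
    assert(count <= MAX_IMPORTS);
    for (i = 0; i < count; i++) {
        resolved[i] = NULL;
        for (j = 0; j < sizeof(host_exports) / sizeof(host_exports[0]); j++) {
            if (strcmp(import_names[i], host_exports[j].name) == 0)
                resolved[i] = host_exports[j].fn;
        }
        if (!resolved[i])
            return 0; /* unresolved import: refuse to load */
    }
    return 1;
}
```

The "built-in" strategy mentioned above skips the name lookup entirely by fixing the indices in advance.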

Baker · Post by **Baker** » Wed Aug 02, 2017 2:35 am

Funny the mind-bending problems something like this smacks you with.

You keep jacking up the specs and rewriting everything ...

Then suddenly you realize you have it running. And it scares you a little bit.

Sure there are 12 more complex things to do. But when you've done the hardest ones and planned for the other 12 along the way, it isn't that scary.

Next up: RAII

C should have had this for handling file handles and sockets. It also makes sense that C Unix libraries treat sockets and file handles identically, just another fd (file descriptor).

Baker · Post by **Baker** » Wed Aug 09, 2017 4:22 am

The wiring this kind of thing involves is astounding and very boring, but puts fun computer science challenges in front of you.

Even with support for numerous datatypes in the opcodes, I ran a speed test and got 2,000,000+ byte code instructions per second.

Function calls, which have to load up the stack on-the-fly (in reverse order, cdecl style), can do about 200,000 byte-code function calls per second.
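For reference, "reverse order, cdecl style" means the last argument is pushed first, so the first argument ends up nearest the top of the stack, and the caller removes the arguments afterwards. A simplified sketch with made-up names; it leaves out the return address and frame pointer that a real call would also push.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 64

typedef struct {
    int32_t slots[STACK_SLOTS];
    int sp; /* index of the top element; the stack grows downward like x86 */
} vm_stack_t;

static void Stack_Init(vm_stack_t *s) { s->sp = STACK_SLOTS; }

static void Stack_Push(vm_stack_t *s, int32_t v)
{
    assert(s->sp > 0);
    s->slots[--s->sp] = v;
}

/* Pushes arguments right-to-left, so args[0] ends up on top. */
static void Call_PushArgs(vm_stack_t *s, const int32_t *args, int count)
{
    int i;
    for (i = count - 1; i >= 0; i--)
        Stack_Push(s, args[i]);
}

/* Callee view: argument n sits n slots above the stack top. */
static int32_t Call_Arg(const vm_stack_t *s, int n)
{
    assert(s->sp + n < STACK_SLOTS);
    return s->slots[s->sp + n];
}

/* Caller cleanup, as in cdecl: the caller discards the arguments after the call. */
static void Call_PopArgs(vm_stack_t *s, int count)
{
    assert(s->sp + count <= STACK_SLOTS);
    s->sp += count;
}
```

Pushing in reverse keeps argument 0 at a fixed offset regardless of how many arguments follow, which is what makes variadic calls workable under cdecl.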

Ended up adding CTRL+BREAK support to "break" in debug mode. To my surprise, at least on Windows, the message queue call is extremely efficient: I call Win32 PeekMessage after every operation (I was hoping to catch a signal, but that is only possible in a console application) and the penalty is very low.

Baker · Post by **Baker** » Sun Aug 27, 2017 4:22 am

Ghost town update, haha

Constructing an IDE. Mostly to test setting breakpoints and then "edit and continue".

With edit and continue, I'm trying to sort out a logical way to determine what code has changed.

I think as a matter of lowest common denominator it can only be code inside a single function that changed, so I plan to scope it that way.

The virtual machine stuff itself, the most fun was writing code for short-circuiting AND and OR. They aren't really proper operators and have to be designed to skip potentially big blocks of code.
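One common way to compile short-circuit operators is as conditional jumps over the right-hand side rather than as arithmetic opcodes. The sketch below is an assumption about how that can look, not Baker's actual design: `OP_JZ_KEEP` and `OP_JNZ_KEEP` test the left operand and, if it already decides the result, jump past the right-hand block while leaving that value on the stack. `OP_TOUCH` is a stand-in side effect to show whether the right side ran.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_PUSH, OP_TOUCH, OP_POP, OP_JZ_KEEP, OP_JNZ_KEEP, OP_HALT };

static int touch_count; /* counts how often a right-hand side actually ran */

/* Runs code and returns the single value left on the stack. */
static int32_t VM_Eval(const uint8_t *code)
{
    int32_t stack[16];
    int sp = 0, pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  /* one operand: small constant */
            assert(sp < 16);
            stack[sp++] = code[pc++];
            break;
        case OP_TOUCH: /* stand-in for a right-hand side with side effects */
            touch_count++;
            break;
        case OP_POP:
            assert(sp > 0);
            sp--;
            break;
        case OP_JZ_KEEP:  /* AND: left is false, skip the right side, keep the false */
            assert(sp > 0);
            if (stack[sp - 1] == 0) pc = code[pc]; else pc++;
            break;
        case OP_JNZ_KEEP: /* OR: left is true, skip the right side, keep the true */
            assert(sp > 0);
            if (stack[sp - 1] != 0) pc = code[pc]; else pc++;
            break;
        case OP_HALT:
            assert(sp == 1);
            return stack[0];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

For `a && b` the emitted sequence is: push a, `OP_JZ_KEEP end`, pop, code for b, `end:`. The jump target can be arbitrarily far away, which is how a big right-hand block gets skipped.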

InsideQC Forums