qbismSuper8 builds
Moderator: InsideQC Admins
Re: qbismSuper8 builds
Clang uses a syntax similar to gcc, so it should not be too hard.
The CodeLite editor supports Clang, so it may be a good choice if you want to experiment.
My somewhat dated Code::Blocks editor does not, but I heard that support was added in later versions, so that should give me an excuse to update.
The only downside is that I will probably have to hack the current version's sources to support cb::advanced (again, erf), because last time the support for the MinGW64 compiler was kinda shoddy.
You can at least get by with gcc for most things if you take the time to learn the compiler switches.
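-march=i386 restricts the output to old 386 processor instructions, which should work on all x86 architectures, even AMD. (-mtune=generic on its own only tunes the code for a mix of common processors; it does not change which instructions are emitted.)
But the speed cost can probably be felt.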
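For anyone experimenting with the switches, one rough way to see which instruction set a given combination actually targets is to look at the compiler's predefined macros. A minimal sketch, assuming gcc or clang on x86; `target_isa` and `print_target_isa` are made-up helper names, not part of any engine source:

```c
#include <stdio.h>

/* Hypothetical helper: reports the highest x86 SIMD level the compiler
   was allowed to use. gcc and clang define these macros according to
   the instruction set selected on the command line (for example via
   -march or -msse2). */
static const char *target_isa(void)
{
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#elif defined(__i386__) || defined(__x86_64__)
    return "plain x86 (no SSE)";
#else
    return "not x86, or an unrecognized compiler";
#endif
}

/* Prints the result; call this from main() or from any init code. */
static void print_target_isa(void)
{
    printf("compiled for: %s\n", target_isa());
}
```

Building the same file with different switches and comparing the printed line shows whether a switch changed the allowed instruction set or only the tuning.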
Productivity is a state of mind.
-

revelator - Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: qbismSuper8 builds
The easiest fix to try for Super8 would be going back to gcc 4.5 or 4.6. But it's not a good long-term solution. I'm also working on a Watcom project based on Levent's port of an earlier version. Watcom output will definitely run on XP.
Ironically VS2013 produces great XP output, and the IDE is enjoyable to use.
I don't know much about Clang... Does it support old Windows versions?
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: qbismSuper8 builds
I think so, but I'm not 100% sure. It has a rather good static source analyzer, though, which could help track down mistakes.
MSVC 2013 is quite good, aye. It's just a bit bloated by all the Win 8 support tools, but it works beautifully.
Watcom has full support for XP. If you need DirectX support it's better to get the DirectX SDK, though; Watcom's built-in replacements are pretty limited.
It does link fine with MSVC libraries, though, so that's a plus. SDL also works with Watcom.
Productivity is a state of mind.
-

revelator - Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: qbismSuper8 builds
Build 190 bugfix release plus Open Watcom project file. http://super8.qbism.com
This build is intended to fix problems running on Windows XP and a waterwarp crash on some CPUs. The XP issue was fixed by switching to the TDM-GCC compiler. The waterwarp crash came from a spot where vid.width was used when it should have been vid.rowbytes. vid.rowbytes can be negative (always?) thanks to the wonderful upside-down DIB implementation of Windows.
A project file for compiling on Open Watcom is up on the svn. An exe is provided in the download. Thanks to Levent for the project file and the type casts and other corrections to Build 168. These were migrated to the current build.
The release removes the special XP build because the main qbismS8.exe should run on XP. The icore build is also removed because the flavor of gcc used provides no speed gain (on a particular icore5 laptop).
There's a small potential speed gain created by assigning a static local variable to the extern cachewidth. This way a register is used rather than looking up the variable each loop. Here's a comparison of the original asm loop and the current compiled loop-
In d_draw16.s, find the guts of D_DrawSpans16 right after
- Code: Select all
adcl advancetable+4(,%ecx,4),%esi // point to next source texel
It's a repeating sequence
- Code: Select all
addl tstep,%edx
sbbl %ecx,%ecx
movb (%esi),%al
addl %ebp,%ebx
movb %al,7(%edi)
adcl advancetable+4(,%ecx,4),%esi
versus
- Code: Select all
imull %edi, %eax
movl %edx, %edi
addl 4(%esp), %eax
sarl $16, %edi
addl %ebx, %edx
movb (%eax,%edi), %al
movl (%esp), %edi
movb %al, 7(%ecx)
movl %esi, %eax
sarl $16, %eax
addl %ebp, %esi
The original asm loop has fewer steps and no imul, but cycles are spent ahead of time to create the 'advancetable':
- Code: Select all
//
// set up advancetable
//
movl %ecx,%eax
movl %ebp,%edx
sarl $20,%eax // tstep >>= 16;
jz LZero
sarl $20,%edx // sstep >>= 16;
movl C(cachewidth),%ebx
imull %ebx,%eax
jmp LSetUp1
LZero:
sarl $20,%edx // sstep >>= 16;
movl C(cachewidth),%ebx
LSetUp1:
addl %edx,%eax // add in sstep
// (tstep >> 16) * cachewidth + (sstep >> 16);
movl tfracf,%edx
movl %eax,advancetable+4 // advance base in t
addl %ebx,%eax // ((tstep >> 16) + 1) * cachewidth +
// (sstep >> 16);
shll $12,%ebp // left-justify sstep fractional part
movl sfracf,%ebx
shll $12,%ecx // left-justify tstep fractional part
movl %eax,advancetable // advance extra in t
movl %ecx,tstep
addl %ecx,%edx // advance tfrac fractional part by tstep frac
sbbl %ecx,%ecx // turn tstep carry into -1 (0 if none)
addl %ebp,%ebx // advance sfrac fractional part by sstep frac
adcl advancetable+4(,%ecx,4),%esi // point to next source texel
Ideally this could be implemented in C for a comparison. Have to figure out how it works first.
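As a starting point for that comparison, here is one possible reading of the advancetable trick in C. This is a sketch under assumptions, not the engine's code: `span_multiply`, `span_advancetable` and `spans_agree` are made-up names, the steps are plain per-pixel 16.16 values (the asm folds an extra divide by 16 into its shifts of 20 and 12), and right-shifting a negative value is assumed to be arithmetic, as it is on gcc, MSVC and Watcom. The idea is to keep the texel offset plus two left-justified fractions, and to pick one of two precomputed advances depending on whether the t fraction overflowed.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t fixed16_t;   /* 16.16 fixed point */

/* Reference loop: rebuilds the texel offset from s and t at every
   pixel, which costs one multiply per pixel. */
static void span_multiply(uint8_t *dest, const uint8_t *pbase, int cachewidth,
                          fixed16_t s, fixed16_t t,
                          fixed16_t sstep, fixed16_t tstep, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        dest[i] = pbase[(t >> 16) * cachewidth + (s >> 16)];
        s += sstep;
        t += tstep;
    }
}

/* Advancetable loop: the multiply is done once, before the loop.
   cachewidth arrives as a parameter, so the loop works on a local copy. */
static void span_advancetable(uint8_t *dest, const uint8_t *pbase, int cachewidth,
                              fixed16_t s, fixed16_t t,
                              fixed16_t sstep, fixed16_t tstep, int count)
{
    ptrdiff_t advancetable[2];
    ptrdiff_t ofs = (ptrdiff_t)(t >> 16) * cachewidth + (s >> 16);
    uint32_t sfrac = (uint32_t)s << 16;          /* left-justified fractions */
    uint32_t tfrac = (uint32_t)t << 16;
    uint32_t sstepfrac = (uint32_t)sstep << 16;
    uint32_t tstepfrac = (uint32_t)tstep << 16;
    int i;

    /* [1]: advance when tfrac does not overflow; [0]: one extra row when it does */
    advancetable[1] = (ptrdiff_t)(tstep >> 16) * cachewidth + (sstep >> 16);
    advancetable[0] = advancetable[1] + cachewidth;

    for (i = 0; i < count; i++) {
        uint32_t oldt = tfrac, olds = sfrac;
        int tcarry, scarry;

        dest[i] = pbase[ofs];
        tfrac += tstepfrac;
        tcarry = tfrac < oldt;                   /* the asm gets this from sbbl */
        sfrac += sstepfrac;
        scarry = sfrac < olds;                   /* the asm folds this in with adcl */
        ofs += advancetable[tcarry ? 0 : 1] + scarry;
    }
}

/* Usage example: draws one 16-pixel span both ways from a 64x64 test
   texture and returns 1 if every pixel matches. The caller must keep
   the whole span inside the texture. */
static int spans_agree(fixed16_t s, fixed16_t t, fixed16_t sstep, fixed16_t tstep)
{
    static uint8_t texture[64 * 64];
    uint8_t a[16], b[16];
    int i;

    for (i = 0; i < 64 * 64; i++)
        texture[i] = (uint8_t)(i * 7 + 3);
    span_multiply(a, texture, 64, s, t, sstep, tstep, 16);
    span_advancetable(b, texture, 64, s, t, sstep, tstep, 16);
    for (i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Whether a compiler turns the carry tests into something as tight as the sbbl/adcl pair is a separate question; the sketch only shows that the table replaces the per-pixel multiply.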
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: qbismSuper8 builds
qbism wrote:Waterwarp was a spot where vid.width was used when it should have been vid.rowbytes.
Is your waterwarp code derived from the one in the latest version of Makaqu? I can't check my code right now.
-

mankrip - Posts: 915
- Joined: Fri Jul 04, 2008 3:02 am
Re: qbismSuper8 builds
mankrip wrote:Is your waterwarp code derived from the one in the latest version of Makaqu? I can't check my code right now.
I think so, it's the version with scalable warp, uwarpscale and vwarpscale.
Unfortunately I think I committed the wrong iteration of the warp code I was experimenting on.
The trick is using vid.rowbytes when dealing with the actual vid.buffer, and vid.width for the warpbuffer. vid.buffer pointer will point to the end of the memory when vid.rowbytes is negative (upside-down DIB is typical). So loops start at the end and work backward.
The SVN is
- Code: Select all
src = vid.buffer + scr_vrect.y * vid.rowbytes + scr_vrect.x;
dest= warpbuf + scr_vrect.y * vid.rowbytes + scr_vrect.x;
But probably should be
- Code: Select all
src = vid.buffer + scr_vrect.y * vid.rowbytes + scr_vrect.x;
dest= warpbuf + scr_vrect.y * vid.width+ scr_vrect.x;
This situation occurs in the main function, the threaded loop, and the transfer back to vid.buffer. It's like working with double-negatives. Hopefully will have time to review it again within a day or two.
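To make the stride rule concrete, here is a small self-contained sketch. It assumes a simplified stand-in for the engine's structures: `sketch_vid_t`, `sketch_rect_t`, `make_bottom_up_view`, `copy_rect_to_warpbuf` and `warp_copy_demo` are hypothetical names, and the real code does more than a straight copy. The source side steps by rowbytes (negative for an upside-down DIB), while the warp buffer is an ordinary top-down array that steps by width.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  *buffer;    /* address of logical row 0 */
    int       width, height;
    ptrdiff_t rowbytes;  /* distance between rows; negative when bottom-up */
} sketch_vid_t;

typedef struct { int x, y, width, height; } sketch_rect_t;

/* Wraps a w*h block of memory as a bottom-up surface: logical row 0
   lives in the LAST row of memory, so buffer points near the end and
   rowbytes is -w. */
static sketch_vid_t make_bottom_up_view(uint8_t *mem, int w, int h)
{
    sketch_vid_t vid;
    vid.buffer = mem + (ptrdiff_t)(h - 1) * w;
    vid.width = w;
    vid.height = h;
    vid.rowbytes = -(ptrdiff_t)w;
    return vid;
}

/* Copies a rectangle from the video buffer into the warp buffer.
   src uses vid->rowbytes, dest uses vid->width. */
static void copy_rect_to_warpbuf(const sketch_vid_t *vid, const sketch_rect_t *r,
                                 uint8_t *warpbuf)
{
    int x, y;
    for (y = 0; y < r->height; y++) {
        const uint8_t *src = vid->buffer + (r->y + y) * vid->rowbytes + r->x;
        uint8_t *dest = warpbuf + (ptrdiff_t)(r->y + y) * vid->width + r->x;
        for (x = 0; x < r->width; x++)
            dest[x] = src[x];
    }
}

/* Usage example: an 8x4 bottom-up surface where pixel (x, y) holds
   y * 16 + x. Returns 1 if the copy landed where expected. */
static int warp_copy_demo(void)
{
    uint8_t mem[8 * 4];
    uint8_t warp[8 * 4] = {0};
    sketch_vid_t vid = make_bottom_up_view(mem, 8, 4);
    sketch_rect_t r = {2, 1, 4, 2};
    int x, y;

    for (y = 0; y < 4; y++)
        for (x = 0; x < 8; x++)
            vid.buffer[y * vid.rowbytes + x] = (uint8_t)(y * 16 + x);

    copy_rect_to_warpbuf(&vid, &r, warp);

    return mem[0] == 3 * 16          /* logical row 3 sits first in memory */
        && warp[1 * 8 + 2] == 1 * 16 + 2
        && warp[2 * 8 + 5] == 2 * 16 + 5
        && warp[0] == 0;             /* outside the rectangle, untouched */
}
```

Using rowbytes on the warp buffer side would index it with a negative stride and walk off the front of the array, which would fit the kind of crash described above.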
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: qbismSuper8 builds
With a stock install vid_ddraw 1 is still required to eliminate the crash while in the water/portals on AMD Phenom II systems. I'll give a Core 2 Duo system a test later on today. As before changing r_wateralpha to anything other than the default eliminates it. Something is for sure improved since vid_ddraw 1 still crashed on Phenom II systems on the last build if r_wateralpha wasn't also adjusted.
On my AMD Geode (500 MHz of raw power) system the regular build crashes immediately but the watcom one runs as intended.
- cubanraul
- Posts: 9
- Joined: Tue Jun 24, 2014 9:38 pm
Re: qbismSuper8 builds
cubanraul wrote:With a stock install vid_ddraw 1 is still required to eliminate the crash while in the water/portals on AMD Phenom II systems. I'll give a Core 2 Duo system a test later on today. As before changing r_wateralpha to anything other than the default eliminates it. Something is for sure improved since vid_ddraw 1 still crashed on Phenom II systems on the last build if r_wateralpha wasn't also adjusted.
On my AMD Geode (500 MHz of raw power) system the regular build crashes immediately but the watcom one runs as intended.
The r_wateralpha thing is what really drives me nuts because the waterwarp function does NOTHING with alpha! The scene is already rendered at that point. I may be barking up the wrong tree... the next thing I will try is reverting back to a non-threaded version of all functions (waterwarp, flipscreen, and fog). The debugger is only telling me 'segfault' for crashes but it's always within a thread when it happens. Threading performance gain is minimal anyway. Plus engoo fogspans is faster and would eliminate the need for flipscreen and fog loops... really should migrate to that.
Another reversion that will likely happen is return of many asm functions. I 'cheated' drawspans by simply removing the multiplication in the loop (without actually solving advance table) and it is still slower than asm.
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: qbismSuper8 builds
On Core 2 Duo and Core 2 Quad systems the crash is now fixed. Can anyone else replicate this crash on anything and if so, on anything other than a Phenom II?
- cubanraul
- Posts: 9
- Joined: Tue Jun 24, 2014 9:38 pm
Re: qbismSuper8 builds
Not home atm, but I'll run it through a Core i7 and an old AMD X2 when I get back. I can also try it out on a really old P4 machine I use for retro gaming.
edit: debugging threaded functions can be a real pain in the ... especially when one thread zombies out and takes the whole shebang down with it before the debugger can attach itself :S.
Productivity is a state of mind.
-

revelator - Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: qbismSuper8 builds
I appreciate additional machine type tests
Current status- GDI video.rowbytes mix-up seems to be solved. Threaded warp code is removed. Not sure if it was part of the problem, but there was no speed benefit on test systems up to an icore5. Icore7 could be tested by setting thread cvar between 1 and 8. Hopefully will get a new build up this weekend.
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: qbismSuper8 builds
OK, tested on the machines I have and no problems on this side (though I use a newer build of Open Watcom for compiling).
Open Watcom 2.0 is still in beta, but there are daily changes on GitHub, so bugs that could have caused crashes before might be fixed.
Productivity is a state of mind.
-

revelator - Posts: 2567
- Joined: Thu Jan 24, 2008 12:04 pm
- Location: inside tha debugger
Re: qbismSuper8 builds
qbism wrote:I appreciate additional machine type tests
Current status- GDI video.rowbytes mix-up seems to be solved. Threaded warp code is removed. Not sure if it was part of the problem, but there was no speed benefit on test systems up to an icore5. Icore7 could be tested by setting thread cvar between 1 and 8. Hopefully will get a new build up this weekend.
I took a shot in the dark and tried something that didn't work in the last version (due to the other bug fixed in this build): Set compatibility to Windows 98 and it stopped crashing on my Phenom II system. So, it does look like the threading was the issue on these CPUs. I remember some older Unreal engine based games need a similar setting to keep the engine from going bonkers and crashing all the time (in purely technical terms of course).
EDIT: ugh, disregard. I forgot that my current water opacity setting gets reset between crashes so it was not at 0.5 at the time of the test with the Win 98 setting.
- cubanraul
- Posts: 9
- Joined: Tue Jun 24, 2014 9:38 pm
Re: qbismSuper8 builds
Build 193 is up. It should be an improvement. I can still replicate this crash... but only at one resolution setting! It occurs in both TDM-GCC and Watcom builds so it's not a compiler quirk. I've found that it happens even at a different initial r_wateralpha setting. The key is to change it before jumping in the water.
Of course this all goes away by setting vid_ddraw cvar to 1. Fascinating.
Threads are gone, asm is back, and fog is faster. It's still the same post-process fog with some clean-up, not drawspan fog.
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am