Engoo
Moderator: InsideQC Admins
Re: Engoo
leileilol wrote:Can this possibly get any faster? (the smallest mip function is shown here for size reasons)
- Code: Select all
void R_DrawSurfaceBlock8RGBXD_mip3 ()
{
int v, i;
int lightstep[3],light[3];
int lightdelta[3], lightdeltastep[3];
int r;
unsigned char pix, *psource;
unsigned short *prowdest;
unsigned char *pix24;
int trans[3];
psource = pbasesource;
prowdest = (unsigned short *)prowdestbase;
for (v=0 ; v<r_numvblocks ; v++)
{
lightlefta[0] = r_lightptr[0];
lightrighta[0] = r_lightptr[3];
lightlefta[1] = r_lightptr[0+1];
lightrighta[1] = r_lightptr[3+1];
lightlefta[2] = r_lightptr[0+2];
lightrighta[2] = r_lightptr[3+2];
lightdelta[0] = (lightlefta[0] - lightrighta[0]) >> 1;
lightdelta[1] = (lightlefta[1] - lightrighta[1]) >> 1;
lightdelta[2] = (lightlefta[2] - lightrighta[2]) >> 1;
r_lightptr += r_lightwidth * 3;
lightleftstepa[0] = (r_lightptr[0] - lightlefta[0]) >> 1;
lightrightstepa[0] = (r_lightptr[3] - lightrighta[0]) >> 1;
lightleftstepa[1] = (r_lightptr[0+1] - lightlefta[1]) >> 1;
lightrightstepa[1] = (r_lightptr[3+1] - lightrighta[1]) >> 1;
lightleftstepa[2] = (r_lightptr[0+2] - lightlefta[2]) >> 1;
lightrightstepa[2] = (r_lightptr[3+2] - lightrighta[2]) >> 1;
lightdeltastep[0] = (lightleftstepa[0] - lightrightstepa[0]) >> 1;
lightdeltastep[1] = (lightleftstepa[1] - lightrightstepa[1]) >> 1;
lightdeltastep[2] = (lightleftstepa[2] - lightrightstepa[2]) >> 1;
for (i=0 ; i<2 ; i++)
{
light[0] = lightrighta[0]; light[1] = lightrighta[1]; light[2] = lightrighta[2];
if (psource[1] < host_fullbrights) { pix = psource[1]; pix24 = (unsigned char *)&d_8to24table[pix]; trans[0] = ((int)pix24[0] * light[0]) >> 18; trans[1] = ((int)pix24[1] * light[1]) >> 18; trans[2] = ((int)pix24[2] * light[2]) >> 18; if (trans[0] < 0) trans[0] = 0; if (trans[1] < 0) trans[1] = 0; if (trans[2] < 0) trans[2] = 0; if (trans[0] > 31) trans[0] = 31; if (trans[1] > 31) trans[1] = 31; if (trans[2] > 31) trans[2] = 31; prowdest[1] = (trans[0] << 10) | (trans[1] << 5) | trans[2]; }else{
pix = psource[1]; pix24 = (unsigned char *)&d_8to24table[pix]; trans[0] = ((int)pix24[0]) >> 3; trans[1] = ((int)pix24[1]) >> 3; trans[2] = ((int)pix24[2]) >> 3; prowdest[1] = (trans[0] << 10) | (trans[1] << 5) | trans[2]; } light[0] += lightdelta[0]; light[1] += lightdelta[1]; light[2] += lightdelta[2];
if (psource[0] < host_fullbrights) { pix = psource[0]; pix24 = (unsigned char *)&d_8to24table[pix]; trans[0] = ((int)pix24[0] * light[0]) >> 18; trans[1] = ((int)pix24[1] * light[1]) >> 18; trans[2] = ((int)pix24[2] * light[2]) >> 18; if (trans[0] < 0) trans[0] = 0; if (trans[1] < 0) trans[1] = 0; if (trans[2] < 0) trans[2] = 0; if (trans[0] > 31) trans[0] = 31; if (trans[1] > 31) trans[1] = 31; if (trans[2] > 31) trans[2] = 31; prowdest[0] = (trans[0] << 10) | (trans[1] << 5) | trans[2]; }else{
pix = psource[0]; pix24 = (unsigned char *)&d_8to24table[pix]; trans[0] = ((int)pix24[0]) >> 3; trans[1] = ((int)pix24[1]) >> 3; trans[2] = ((int)pix24[2]) >> 3; prowdest[0] = (trans[0] << 10) | (trans[1] << 5) | trans[2]; } light[0] += lightdelta[0]; light[1] += lightdelta[1]; light[2] += lightdelta[2];
psource += sourcetstep;
lightrighta[0] += lightrightstepa[0];lightlefta[0] += lightleftstepa[0];lightdelta[0] += lightdeltastep[0];
lightrighta[1] += lightrightstepa[1];lightlefta[1] += lightleftstepa[1];lightdelta[1] += lightdeltastep[1];
lightrighta[2] += lightrightstepa[2];lightlefta[2] += lightleftstepa[2];lightdelta[2] += lightdeltastep[2];
prowdest += surfrowbytes;
}
if (psource >= r_sourcemax)
psource -= r_stepback;
}
}
There's too much typecasting going on.
Converting the textures to 24-bit on load would remove the pix = psource[1]; pix24 = (unsigned char *)&d_8to24table[pix]; statements, and converting them to 96-bit (one 32-bit int per channel) would also eliminate the typecasting on the trans[0] = ((int)pix24[0] * light[0]) >> 18; statements. Of course, this requires much more memory.
You can also eliminate the if (psource[1] < host_fullbrights) on textures without fullbright pixels by checking the textures for fullbright pixels at load time, and using a different set of surface block drawing functions for such textures.
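A minimal sketch of that load-time check, assuming the texels are 8-bit palette indices and that indices at or above host_fullbrights count as fullbright (which is how the comparison in the quoted function treats them). The function name and its parameters are made up for illustration:

```c
#include <stddef.h>

/* Returns 1 if any texel falls in the fullbright range, 0 otherwise.
   first_fullbright plays the role of host_fullbrights from the thread. */
static int texture_has_fullbrights(const unsigned char *texels, size_t count,
                                   int first_fullbright)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (texels[i] >= first_fullbright)
            return 1;
    return 0;
}
```

The result could be stored per texture and used to pick between two sets of block drawers, one with the branch and one without it.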
As for the clamping, you could at least lump the if (trans[0] < 0) trans[0] = 0; and if (trans[0] > 31) trans[0] = 31; statements together and put an else between them, so the second test is skipped whenever the first one hits.
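Roughly what the lumped-together clamp could look like, as a sketch. clamp5 and pack_rgb555 are hypothetical helper names; the packing matches the (trans[0] << 10) | (trans[1] << 5) | trans[2] layout in the quoted function:

```c
/* Clamp a channel to the 5-bit range 0..31; the else skips the upper
   test whenever the lower one already fired. */
static int clamp5(int v)
{
    if (v < 0)
        v = 0;
    else if (v > 31)
        v = 31;
    return v;
}

/* Pack three clamped channels into one 15-bit RGB555 pixel. */
static unsigned short pack_rgb555(int r, int g, int b)
{
    return (unsigned short)((clamp5(r) << 10) | (clamp5(g) << 5) | clamp5(b));
}
```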
-

mankrip - Posts: 915
- Joined: Fri Jul 04, 2008 3:02 am
Re: Engoo
How far could RGB color and map textures be crunched into fewer bytes and still look decent when the output is 8-bit? 12 bit color may be as good as 24bit. Can it get small enough to do more indexing within the target memory range? On 32 MB DOS, maybe not. On 512 MB XP, maybe yes.
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: Engoo
qbism wrote: 12 bit color may be as good as 24bit.
No it won't. You're forgetting all of Quake's dark color palette entries. Going 12 or even 15 would eliminate them (which is why I did away with the 15bpp version of the surfaceblocks)
I could try the 32-bit conversion method, because it would also allow the dithered drawspans to work faster, and the alpha channel of the texels could determine if it's fullbright, with a simple if (pix[3]) check rather than a greater/lessthan of a host_fullbrights (which is an integer containing the range of fullbright colors as read from the last byte of the colormap lump, or as calculated)
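A rough sketch of the 32-bit idea, under some assumptions: the palette is 256 RGB triplets, the fourth byte of each texel is used as the fullbright flag, and the lighting math is the same (c * light) >> 18 as in the function quoted earlier. texel32, expand_texture and shade_texel are illustrative names, not engine code:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t r, g, b;
    uint8_t fullbright;   /* nonzero: texel ignores lighting */
} texel32;

/* Expand 8-bit indexed texels once, at load time.
   palette holds 256 RGB triplets (768 bytes). */
static void expand_texture(const uint8_t *src, texel32 *dst, size_t count,
                           const uint8_t *palette, int first_fullbright)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const uint8_t *rgb = &palette[src[i] * 3];
        dst[i].r = rgb[0];
        dst[i].g = rgb[1];
        dst[i].b = rgb[2];
        dst[i].fullbright = (uint8_t)(src[i] >= first_fullbright);
    }
}

/* Light one channel and clamp it to 5 bits. */
static int shade_channel(int c, int light)
{
    int v = (c * light) >> 18;

    if (v < 0)
        v = 0;
    else if (v > 31)
        v = 31;
    return v;
}

/* Produce one RGB555 pixel; the fullbright test is a plain nonzero check. */
static uint16_t shade_texel(texel32 t, const int light[3])
{
    int r, g, b;

    if (t.fullbright) {
        r = t.r >> 3;
        g = t.g >> 3;
        b = t.b >> 3;
    } else {
        r = shade_channel(t.r, light[0]);
        g = shade_channel(t.g, light[1]);
        b = shade_channel(t.b, light[2]);
    }
    return (uint16_t)((r << 10) | (g << 5) | b);
}
```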
i should not be here
- leileilol
- Posts: 2783
- Joined: Fri Oct 15, 2004 3:23 am
Re: Engoo
If I was going to do this, the way I'd do it would be to make 3 separate colormaps - one for each of R, G and B, then output to 3 separate color buffers - one again for each of R, G and B. Mix them together at swapbuffers time to get the final image.
I have no idea what the performance would be like, but the main advantages I see would be that you get to keep the original ASM code (with all of its wacky Abrash optimizations) because you're writing to 8-bit targets, and various blends/colour shifts/palette hackery/etc can be done more-or-less for free because they can be applied during the final mix.
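To make the final mix step concrete, here is a minimal sketch. It assumes each of the three buffers holds one 8-bit intensity per pixel and that the output is packed 0x00RRGGBB; the per-channel ramp tables stand in for the "colour shifts for free" part. mix_planes and its parameters are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Combine three 8-bit planes into packed 0x00RRGGBB pixels.
   Each ramp is a 256-entry remap table applied while mixing;
   an identity table leaves that channel untouched. */
static void mix_planes(const uint8_t *rbuf, const uint8_t *gbuf,
                       const uint8_t *bbuf,
                       const uint8_t *ramp_r, const uint8_t *ramp_g,
                       const uint8_t *ramp_b,
                       uint32_t *out, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t r = ramp_r[rbuf[i]];
        uint32_t g = ramp_g[gbuf[i]];
        uint32_t b = ramp_b[bbuf[i]];
        out[i] = (r << 16) | (g << 8) | b;
    }
}
```

Swapping in a different ramp for one channel changes the tint of the whole frame without touching the per-plane drawing code.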
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
-

mh - Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Engoo
So basically swapping r_lightptr between the color channels, and swapping the colormap between blocks and back? Wouldn't that still bring overhead?
That IS an interesting idea though. The R, G and B colormaps would lack overbrights, and the fullbright colors would simply have their components subtracted (so they add together. Yay, free fullbrights again), and they won't be remapped to the host palette. Since the colormaps have 64 rows, this allows for the same 18-bit color precision I've already been using.
Wouldn't it also be faster to have another surfaceblock function that simply runs a palmap lookup, instead of performing it on the screen buffer?
A surfaceblock would look like this, for example:
- Code: Select all
void R_DrawSurfaceBlock8RGBX_mip0 (void)
{
r_lightptr = r_lightptr_r;
vid.colormap = vid.colormap_r;
R_DrawSurfaceBlock8_mip0();
r_lightptr = r_lightptr_g;
vid.colormap = vid.colormap_g;
R_DrawSurfaceBlock8_mip0();
r_lightptr = r_lightptr_b;
vid.colormap = vid.colormap_b;
R_DrawSurfaceBlock8_mip0();
vid.colormap = vid.colormap_old;
R_DrawSurfaceBlockRGBTO8();
}
Or the original asm functions could just be duplicated three times, each using different lightptr and colormap names. Right now I'm struggling just to get the red channel to show up as normal lighting in the original function (which would be a great start to this idea). If I had done this in 2010, I would've figured it out immediately in the coding rush. I probably need to hack up a new BuildLightmap function, or throw a multiply into the assembly files.
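For the palmap lookup step (the R_DrawSurfaceBlockRGBTO8 call above), a sketch of one way such a table could be built. This is an assumption-heavy illustration: it uses a 15-bit RGB555 key (32 KB table) rather than the 18-bit precision mentioned, picks the nearest palette entry by squared distance, and every name in it is made up:

```c
#include <stdint.h>

#define PALMAP_SIZE 32768   /* one entry per RGB555 value */

/* Index of the palette entry closest to (r, g, b); palette is 256 RGB triplets. */
static int nearest_palette_index(int r, int g, int b, const uint8_t *palette)
{
    int i, best = 0;
    long bestdist = -1;

    for (i = 0; i < 256; i++) {
        long dr = r - palette[i * 3 + 0];
        long dg = g - palette[i * 3 + 1];
        long db = b - palette[i * 3 + 2];
        long dist = dr * dr + dg * dg + db * db;

        if (bestdist < 0 || dist < bestdist) {
            bestdist = dist;
            best = i;
        }
    }
    return best;
}

/* Stretch a 5-bit channel back to the 0..255 range. */
static int expand5(int v)
{
    return (v << 3) | (v >> 2);
}

/* Fill table[PALMAP_SIZE] so that table[rgb555] is an 8-bit palette index. */
static void build_palmap(uint8_t *table, const uint8_t *palette)
{
    int r, g, b;

    for (r = 0; r < 32; r++)
        for (g = 0; g < 32; g++)
            for (b = 0; b < 32; b++)
                table[(r << 10) | (g << 5) | b] = (uint8_t)
                    nearest_palette_index(expand5(r), expand5(g), expand5(b),
                                          palette);
}
```

Once the table exists, converting a block back to 8-bit is one array read per pixel.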
i should not be here
- leileilol
- Posts: 2783
- Joined: Fri Oct 15, 2004 3:23 am
Re: Engoo
It shouldn't be a problem with fullbrights at least; you'd have 3 colormap tables (you really only need one though), each of 256x64 bytes and each translating a 0..255 base to its lit equivalent, so you could maintain fullbright columns in the table.
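A sketch of building one such 256x64 table for a single channel, with the fullbright columns kept unlit in every row. The row ordering (row 63 fully lit, row 0 black) and the linear scaling are assumptions of this example and may not match the engine's colormap; build_channel_colormap is a hypothetical name:

```c
#include <stdint.h>

#define COLORMAP_ROWS 64

/* channel: the 256 per-palette-entry intensities of one channel (R, G or B).
   table:   COLORMAP_ROWS * 256 bytes, indexed as table[row * 256 + index].
   Columns at or above first_fullbright ignore the light level. */
static void build_channel_colormap(uint8_t *table, const uint8_t *channel,
                                   int first_fullbright)
{
    int row, i;

    for (row = 0; row < COLORMAP_ROWS; row++) {
        for (i = 0; i < 256; i++) {
            if (i >= first_fullbright)
                table[row * 256 + i] = channel[i];
            else
                table[row * 256 + i] =
                    (uint8_t)((channel[i] * row) / (COLORMAP_ROWS - 1));
        }
    }
}
```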
We had the power, we had the space, we had a sense of time and place
We knew the words, we knew the score, we knew what we were fighting for
-

mh - Posts: 2292
- Joined: Sat Jan 12, 2008 1:38 am
Re: Engoo
leileilol wrote:qbism wrote: 12 bit color may be as good as 24bit.
No it won't. You're forgetting all of Quake's dark color palette entries. Going 12 or even 15 would eliminate them (which is why I did away with the 15bpp version of the surfaceblocks)
I did forget, but color cheating is allowed. 12bpp has a huge advantage over 15bpp: it's small enough for reasonably sized lookup tables. 12-bit x 12-bit = 16M entries, so 16 MB each at one byte per entry. The Quakey colors can be normalized into the colorspace via a translation table to give those dark brown shades more area, and then translated back down to 8-bit at the end via the same table.
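The table-size arithmetic can be sketched like this, assuming 4 bits per channel and a table addressed by a pair of 12-bit values (the post does not spell out what the two inputs are, so pair_index is purely illustrative):

```c
#include <stdint.h>

/* Two 12-bit keys give 2^24 = 16,777,216 entries: 16 MB at one byte each. */
#define PAIR_TABLE_ENTRIES (1u << 24)

/* Reduce 8-bit channels to 4 bits each and pack them as 0xRGB (12 bits). */
static uint16_t pack_rgb444(int r, int g, int b)
{
    return (uint16_t)(((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4));
}

/* Index into a table addressed by two 12-bit values. */
static uint32_t pair_index(uint16_t a, uint16_t b)
{
    return ((uint32_t)a << 12) | b;
}
```

By the same arithmetic, a 15-bit pair would need 2^30 entries, which is why 12bpp is the one that fits.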
This particular scheme may not pan out, but if a 256 MB+ RAM footprint is acceptable then it opens up lookup table options. I have a hard time giving up on a single-pass R_DrawSurfaceBlock8_mip. Such a thing with nice color blending is the Holy Grail. The best 3x option is still 3x slower.
-
qbism - Posts: 1236
- Joined: Thu Nov 04, 2004 5:51 am
Re: Engoo
I actually have the 3 blocks thing half-implemented (generating the 3 color maps as theorized even). I have to swap it in the blockdrawer and call the block drawer thrice.
HOWEVER what I have a problem with is lightwidth. How the hell do I add that necessary " * 3" in the ASM?
Also, merging surfrowbytes, or prowdest? Sometimes I'm confused about where it's writing to.
i should not be here
- leileilol
- Posts: 2783
- Joined: Fri Oct 15, 2004 3:23 am
Re: Engoo
lea eax, [eax + 2*eax] (or something very similar, I might have the syntax wrong). iirc, it's 1 clock cycle (or was on a 486).
that's probably the fastest 3 * you can get in asm (though using something other than eax as the dest might help, I don't know).
Leave others their otherness.
http://quakeforge.net/
- taniwha
- Posts: 399
- Joined: Thu Jan 14, 2010 7:11 am
Re: Engoo
leal (%ecx + 2*%ecx),%ecx // works
Anyhow, I have committed the experimental mess for you to laugh at, I mean look at. Trying to make new prowdests with the separate components is easier said than done (and it's not done).
i should not be here
- leileilol
- Posts: 2783
- Joined: Fri Oct 15, 2004 3:23 am