Re: Optimizing

Posted: Sun May 06, 2012 4:03 pm
by Knightmare
I finally got it working properly. There was a problem with glTexSubImage2D: after each row it wouldn't skip ahead to the rectangle's starting column in the next row of the source buffer, which screwed up the lightmap textures. I ended up uploading the entire width of each changed area.

I had to change this:

Code:

qglTexSubImage2D (GL_TEXTURE_2D, 0,
		gl_lms.lightrect[i].left, gl_lms.lightrect[i].top, 
		(gl_lms.lightrect[i].right - gl_lms.lightrect[i].left), (gl_lms.lightrect[i].bottom - gl_lms.lightrect[i].top), 
		GL_LIGHTMAP_FORMAT, GL_UNSIGNED_BYTE,
		gl_lms.lightmap_update[i] + ((gl_lms.lightrect[i].top * LM_BLOCK_WIDTH + gl_lms.lightrect[i].left) * LIGHTMAP_BYTES));
To this:

Code:

// update full width of lm texture, because qglTexSubImage2D doesn't wrap around to starting column
qglTexSubImage2D (GL_TEXTURE_2D, 0,
		0, gl_lms.lightrect[i].top, 
		LM_BLOCK_WIDTH, (gl_lms.lightrect[i].bottom - gl_lms.lightrect[i].top), 
		GL_LIGHTMAP_FORMAT, GL_UNSIGNED_BYTE,
		gl_lms.lightmap_update[i] + ((gl_lms.lightrect[i].top * LM_BLOCK_WIDTH) * LIGHTMAP_BYTES));
There's a bit of a speedup over just batching non-lightmapped surfaces. This is on an nVidia card, so the difference will probably be greater on ATI/AMD.

Re: Optimizing

Posted: Sun May 06, 2012 6:24 pm
by mh

Code:

glPixelStorei (GL_UNPACK_ROW_LENGTH, LM_BLOCK_WIDTH);
// bunch of glTexSubImage2D calls
glPixelStorei (GL_UNPACK_ROW_LENGTH, 0);
;)
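
To spell out why the original call garbled the textures: with GL_UNPACK_ROW_LENGTH at its default of 0, GL assumes each source row is exactly as wide as the rectangle being uploaded, so it never skips the rest of the lightmap row. Below is a rough CPU-side sketch of that addressing in plain C, with no GL involved. copy_subrect is a made-up helper for illustration, not a GL function, and it ignores GL_UNPACK_ALIGNMENT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GL): mimics how glTexSubImage2D walks the
   source buffer. row_length plays the role of GL_UNPACK_ROW_LENGTH, in texels;
   0 means "source rows are exactly w texels wide", which is the GL default. */
static void copy_subrect (unsigned char *dst, const unsigned char *src,
		int w, int h, int row_length, int bytes_per_texel)
{
	size_t src_stride, row_bytes;
	int y;

	/* restrictions of this sketch only */
	assert (w > 0 && h > 0 && bytes_per_texel > 0);
	assert (row_length == 0 || row_length >= w);

	/* distance between the starts of two consecutive source rows */
	src_stride = (size_t) (row_length ? row_length : w) * bytes_per_texel;
	row_bytes = (size_t) w * bytes_per_texel;

	for (y = 0; y < h; y++)
		memcpy (dst + y * row_bytes, src + y * src_stride, row_bytes);
}
```

Take a 4-texel-wide block numbered 0 to 15 and a 2x2 rectangle starting at column 1, row 1. A row length of 4 picks up 5, 6, 9, 10, which is the intended rectangle. A row length of 0 picks up 5, 6, 7, 8, which is the kind of garbling described earlier in the thread.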

Instead of GL_LIGHTMAP_FORMAT, GL_UNSIGNED_BYTE try GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV and modify R_BuildLightmap to match. You should see it soar after that.
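
A rough sketch of the texel layout this implies: with GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV, each texel is a single 32-bit value with blue in the low byte and alpha in the high byte. pack_bgra is a hypothetical helper name used only for illustration; it is not from any engine source.

```c
#include <stdint.h>

/* Hypothetical helper: builds one texel for GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV.
   The _REV type puts the first component of the format (blue) in the lowest
   8 bits, so the value reads as 0xAARRGGBB. */
static uint32_t pack_bgra (uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
	return ((uint32_t) a << 24) | ((uint32_t) r << 16) | ((uint32_t) g << 8) | (uint32_t) b;
}
```

On a little-endian machine the bytes of that value sit in memory as B, G, R, A, so a byte-wise writer in R_BuildLightmap would store blue first and alpha last.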

Re: Optimizing

Posted: Sun May 06, 2012 8:12 pm
by Knightmare
D'oh, missed that! Got it working now. Thanks.

GL_LIGHTMAP_FORMAT is already set to GL_BGRA. I changed from GL_UNSIGNED_BYTE to GL_UNSIGNED_INT_8_8_8_8_REV, but there was no noticeable improvement.

Re: Optimizing

Posted: Sun May 06, 2012 10:56 pm
by mh
Yeah, it's much the same on NVIDIA. Where it really flies is with AMD and - especially - Intel. All the same, it does suggest that you still have something of a bottleneck somewhere else. :?:

Re: Optimizing

Posted: Sat May 12, 2012 12:43 am
by Knightmare
This is on a GTX 285, so I guess there shouldn't be any visible difference.

As an alternative to using texture arrays for lightmaps, would increasing the block size from 128x128 to 256x256 be a good idea? This would increase the batchability of surfaces by having fewer with different lightmap images, and thus need fewer rendering calls, but it might increase the bandwidth used for updating the lightmaps.
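
As a back-of-the-envelope check on the bandwidth side, here is a small sketch that counts the bytes handed to GL by one update call. lightmap_upload_bytes is a hypothetical helper, and 4 bytes per texel is assumed to match the BGRA format discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes sent by one glTexSubImage2D call that uploads
   `width` texels across rows top..bottom-1 at bytes_per_texel each. */
static size_t lightmap_upload_bytes (int width, int top, int bottom, int bytes_per_texel)
{
	assert (width > 0 && bytes_per_texel > 0 && bottom >= top);
	return (size_t) width * (size_t) (bottom - top) * (size_t) bytes_per_texel;
}
```

With full-width uploads, a 16-row dirty band costs 8192 bytes on a 128-wide block and 16384 bytes on a 256-wide one. If the upload is limited to the dirty rectangle, the width term becomes the rectangle's width rather than the block's, although a single bounding rectangle per block may itself grow as more surfaces share a block.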

Re: Optimizing

Posted: Sat May 12, 2012 12:49 am
by leileilol
phew

good thing the engine i'm concerned with when I started this thread doesn't even update lightmaps


Are there any performance worries if I switch my muzzleflashes to a pair of autosprites, one being an autosprite2, instead of a standard polygon blob/half-circle flash?

Re: Optimizing

Posted: Mon Jul 02, 2012 8:38 pm
by r00k
Ugh, my head hurts.
I tried converting the lightmap updates to GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV, still doing something wrong.
changed the order of the colors, added an alpha channel to the lightmap array, multiply by 4 instead of 3 etc
I still get garbled rainbow lights. The only evidence I was on the right track was once I had muzzleflashes working... ugh!
Back to square one and attempt again.. :/

Re: Optimizing

Posted: Mon Jul 02, 2012 10:30 pm
by mh
r00k wrote:
Ugh, my head hurts.
I tried converting the lightmap updates to GL_BGRA and GL_UNSIGNED_INT_8_8_8_8_REV, still doing something wrong.
changed the order of the colors, added an alpha channel to the lightmap array, multiply by 4 instead of 3 etc
I still get garbled rainbow lights. The only evidence I was on the right track was once I had muzzleflashes working... ugh!
Back to square one and attempt again.. :/

You're possibly missing either a *dest++ = 255 or a dest += 4 in R_BuildLightmap. Alternatively, the stride may still be multiplied by 3. Don't add size * 4 to lightmap on each pass through the loop, and don't add i * 4 to loadmodel->lightdata in Mod_LoadFaces - use 3 here. So the only places where you have to be careful to use 4 are dest, stride and lightmap_bytes (especially important for your Tex(Sub)Image calls); keep 3 everywhere else: blocklights, lightmap, surf->samples (and if you're using lightmap_bytes on one of those you should revert to 3 there too).
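
The 3-versus-4 split can be sketched roughly like this. store_lightmap_bgra is a hypothetical stand-in for the store loop at the end of R_BuildLightmap, not code from any particular engine, and the >> 7 downshift is an assumption; use whatever scale your engine accumulates blocklights in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the store loop at the end of R_BuildLightmap.
   blocklights: 3 unsigned values (r, g, b) per texel, tightly packed.
   dest: 4 bytes per texel in B, G, R, A order; stride is in bytes and
   covers a whole lightmap row (block width * 4). */
static void store_lightmap_bgra (unsigned char *dest, size_t stride,
		const unsigned *blocklights, int smax, int tmax)
{
	const unsigned *bl = blocklights;
	int s, t, c;

	assert (stride >= (size_t) smax * 4);

	for (t = 0; t < tmax; t++, dest += stride)
	{
		unsigned char *out = dest;

		/* source advances by 3 per texel, destination by 4 */
		for (s = 0; s < smax; s++, bl += 3, out += 4)
		{
			unsigned rgb[3];

			for (c = 0; c < 3; c++)
			{
				rgb[c] = bl[c] >> 7;	/* assumed downshift */
				if (rgb[c] > 255) rgb[c] = 255;
			}

			out[0] = (unsigned char) rgb[2];	/* blue */
			out[1] = (unsigned char) rgb[1];	/* green */
			out[2] = (unsigned char) rgb[0];	/* red */
			out[3] = 255;				/* alpha, always opaque */
		}
	}
}
```

Forgetting the fourth byte or advancing out by 3 shifts every following texel by one channel, which would fit the rainbow lights described in the quote.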

Re: Optimizing

Posted: Tue Jul 03, 2012 2:27 am
by r00k
K thanks,
i saw unsigned blocklights[MAX_LIGHTMAP_SIZE*3];
and this
byte lightmaps[4*MAX_LIGHTMAPS*LIGHTMAP_BLOCK_WIDTH*LIGHTMAP_BLOCK_HEIGHT];

which confused me. blocklights kinda ties into stride but it gets all fuzzy...

Re: Optimizing

Posted: Tue Jul 03, 2012 7:56 am
by Spike
it's only the lightmaps array that needs to be *4.
and the only bit of code that needs to care what format the hardware is to be fed is the code that copies+downshifts from blocklights into lightmaps and the glTexSubImage call.
the rest should generally be *3 if you've got lit support, and *1 otherwise.

Re: Optimizing

Posted: Tue Jul 03, 2012 4:45 pm
by r00k
Ack wait stop the presses!
In debug mode it finally works, but when I select release and rebuild, the flashes are green and the lits are purple. Ugh! :P


LeiLei, I would think there would be better performance with autosprites (2D) than with a polygon-orb muzzleflash, though billboard/sprite muzzle flashes sometimes tend to lag a bit behind as the player moves.