The vanilla software renderer had a strict 1:16 ratio of lightmap samples to texels (at mip level 0, which also means all wall textures must be a multiple of 16 in size). This ratio persists into glquake even though glquake has no surface cache. The file format says absolutely nothing about lightmap coords or sizes. They're instead inferred by rounding the regular texture coords and working out the extents from that, using floats, which are somewhat imprecise and glitch out if you're not really careful (x87 maths is often performed at 80 bits of precision even for 32-bit floats, which can be a problem when the code gets ported to other cpus/compilers). The light tool and the engine should be using the same maths, so that both calculate the same extents for each surface's lightmap and thus the same lightmap width+height.
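As a rough illustration of that inference, here's a minimal sketch for a single texture axis. The names (lm_axis_t, lm_axis_from_range, LM_SCALE) are made up for this example and are not taken from the engine source. It assumes you've already projected the surface's vertices onto the texture axis and found the min/max texture coord. The rounding helpers use truncation just to keep the sketch free of libm.

```c
#include <assert.h>

#define LM_SCALE 16  /* texels per lightmap sample (the 1:16 ratio) */

/* Hypothetical result type for one texture axis (s or t). */
typedef struct {
    int texturemin;  /* texel coord of the first sample, a multiple of 16 */
    int extent;      /* size in texels, a multiple of 16 */
    int samples;     /* number of lightmap samples along this axis */
} lm_axis_t;

/* Round towards negative infinity (valid while f fits in an int). */
static int ifloor(float f)
{
    int i = (int)f;
    return (f < (float)i) ? i - 1 : i;
}

/* Round towards positive infinity (valid while f fits in an int). */
static int iceil(float f)
{
    int i = (int)f;
    return (f > (float)i) ? i + 1 : i;
}

/* Snap the texture-coord range outwards to the 16-texel grid and
   derive the lightmap size from the snapped extent. */
lm_axis_t lm_axis_from_range(float min_tc, float max_tc)
{
    lm_axis_t a;
    int bmin, bmax;

    assert(min_tc <= max_tc);
    bmin = ifloor(min_tc / LM_SCALE);
    bmax = iceil(max_tc / LM_SCALE);

    a.texturemin = bmin * LM_SCALE;
    a.extent = (bmax - bmin) * LM_SCALE;
    a.samples = (a.extent >> 4) + 1;  /* +1 for the sample on the far edge */
    return a;
}
```

A coord that lands a hair either side of a multiple of 16 changes bmin or bmax by one, which is exactly where the tool and the engine can end up disagreeing about the lightmap size.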
Surface extents are limited to (16+1) lightmap samples in each axis. The +1 allows for interpolation at the far edge. For some reason the limit was bumped by 1 in glquake, probably to try to hide precision issues. This means that qbsp MUST subdivide each surface into blocks of at most 256*256 texels (a larger face just gets split into multiple surfaces instead).
Some engines increase the maximum lightmap size to 256*256, which means surfaces can be up to 4080 (256*16-16) texels across after subdivision (which of course crashes other engines). Note that surface subdivision happens in the qbsp util, not the light util. Also note that many surfaces (read: sky and turb/water) are not lightmapped, and thus do not need to be subdivided by qbsp. Even the vanilla qbsp tool had a commandline argument to control the max post-subdivision size.
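The arithmetic behind those limits is just the sample count minus the edge sample, times 16. A tiny sketch, with a hypothetical helper name (max_extent_for_lightmap is not from any real tool):

```c
#include <assert.h>

#define LM_SCALE 16  /* texels per lightmap sample */

/* Largest surface extent (in texels, one axis) that still fits in a
   lightmap of max_samples samples along that axis. */
int max_extent_for_lightmap(int max_samples)
{
    assert(max_samples >= 1);
    return (max_samples - 1) * LM_SCALE;
}
```

With 17 samples this gives 256 texels, and with 256 samples it gives the 4080 mentioned above.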
lit2 has per-surface scales, while bspx has an optional lightmap-scale lump that overrides the 1:16 ratio in engines that recognise it, but support for these is pretty much limited to just fte + tyrutils-ericw, and even then it's not well tested and is disabled by default.
In the glquake code the ratio shows up as /16 and >>4 in a few different places.
You can compile the map with -extra and the light util will calculate 4 points per luxel instead of 1. It then averages them so as to not violate the file format. The result is smoother lighting.
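The averaging step might look something like this. It's only a sketch: it assumes the 4 points are laid out as a 2x2 block per luxel in a grid twice the size in each axis, which may not match how any particular light tool stores them, and lm_downsample_2x2 is a made-up name.

```c
#include <assert.h>

/* Box-filter a (2*w) x (2*h) grid of light values down to w x h luxels
   by averaging each 2x2 block, then clamp to the 0..255 byte range. */
void lm_downsample_2x2(const float *hi, int w, int h, unsigned char *out)
{
    int x, y;

    assert(w > 0 && h > 0);
    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            /* top-left corner of this luxel's 2x2 block */
            const float *row0 = hi + (2 * y) * (2 * w) + 2 * x;
            const float *row1 = row0 + 2 * w;
            float avg = (row0[0] + row0[1] + row1[0] + row1[1]) * 0.25f;

            if (avg < 0.0f)
                avg = 0.0f;
            if (avg > 255.0f)
                avg = 255.0f;
            out[y * w + x] = (unsigned char)(avg + 0.5f);  /* round to nearest */
        }
    }
}
```

The output is still one byte per luxel at the normal resolution, so the file format never sees the extra samples.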
Threading the lighting shouldn't be too complicated. About the only things you'll need mutexes for are allocating output file space and figuring out which surface to light next. In fact, you could probably get it all done with atomic_fetch_and_add without any mutexes. Just avoid using globals.
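A minimal sketch of the lock-free version, using C11's atomic_fetch_add from <stdatomic.h>. The struct and function names (light_work_t, claim_surface, alloc_lightdata) are invented for this example. It assumes each worker thread loops on claim_surface until it returns -1 and writes its results at the offset it reserved.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared state, passed to each worker instead of living in globals. */
typedef struct {
    atomic_int next_surface;  /* index of the next unclaimed surface */
    atomic_int out_offset;    /* next free byte in the output light data */
    int num_surfaces;         /* total surfaces to light (read-only) */
} light_work_t;

/* Claim the next surface to light. Returns its index, or -1 when
   every surface has already been handed out. */
int claim_surface(light_work_t *w)
{
    int i = atomic_fetch_add(&w->next_surface, 1);
    return (i < w->num_surfaces) ? i : -1;
}

/* Reserve size bytes of output space. Returns the offset of the
   reserved region; no two callers ever get overlapping regions. */
int alloc_lightdata(light_work_t *w, int size)
{
    assert(size >= 0);
    return atomic_fetch_add(&w->out_offset, size);
}
```

One side effect to be aware of: the order in which surfaces get their output space depends on thread timing, so the layout of the light data can differ between runs even though each surface's offset stays valid.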