Short file checksum/hash?

Postby Spirit » Thu Aug 20, 2015 1:28 pm

I am increasingly frustrated with Quaddicted and am thinking about redoing the whole file archive. Problem is that there are files that fit in many categories (eg SP maps with DM settings and vice-versa) and files that have colliding names. So one cannot split everything into categories nor put everything into one directory.

My solution would be to (brace yourselves) put every file into its own directory which would be named by a hash or checksum of the file, uniquely identifying it. I do not want to simply use a counter. It's <100k files I think, but I have no idea what kind of collision-free "space" would be good.

Is there a hash or checksum that is short and appropriate for this? It would need to be URL compatible without escaping. I would be fine with sacrificing compatibility with case-insensitive file systems though. 6 to 8 characters would rock.
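To get a feel for how much "space" is enough, the birthday approximation gives a rough estimate of the chance that any two files end up with the same ID. The following is only a sketch: it assumes hash values are spread uniformly, takes the 100k file count from the estimate above, and the function name is made up for illustration.

```python
import math

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files share a hash value.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)), which
    assumes hash values are uniformly distributed over N = 2**hash_bits.
    """
    space = 2 ** hash_bits
    if num_files > space:
        return 1.0  # pigeonhole: more files than possible values
    exponent = num_files * (num_files - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x

if __name__ == "__main__":
    # 8 hex characters carry 32 bits; 8 base64url characters carry 48 bits.
    for bits in (16, 32, 48, 64):
        p = collision_probability(100_000, bits)
        print(f"{bits:2d} bits: {p:.3g}")
```

Under these assumptions a 32-bit ID is more likely than not to collide somewhere among 100,000 files, while a 48-bit ID brings the odds well below one in ten thousand.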
Improve Quaddicted, send me a pull request: https://github.com/SpiritQuaddicted/Quaddicted-reviews
Spirit
 
Posts: 1025
Joined: Sat Nov 20, 2004 9:00 pm

Re: Short file checksum/hash?

Postby Spike » Thu Aug 20, 2015 3:29 pm

interesting post considering that's your 999th post (and I'm a brit)...

at the end of the day, you need to accept that collisions are going to happen eventually.
with that in mind, the hash you use doesn't really need to be all that long (if they're going to happen, you might as well use a weak hash so you can test it).
(if your hash is for security then there's probably better ways to do it - ones that support multiple different hashes).

each file in its own directory sounds a bit excessive to me, but then I'm thinking about windows and its inability to store too many items in a single directory.
if all else fails, you could just take something like sha1 and fold the bits over each other with xor, resulting in a 32bit / 8-char hash. quakeworld had a habit of doing that with md4 hashes.
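A minimal sketch of that folding idea, assuming SHA-1 as the source hash and big-endian 32-bit words (the word order and the function name are choices made here for illustration, not something any engine is known to use):

```python
import hashlib

def folded_sha1_hex(data: bytes) -> str:
    """XOR the five 32-bit words of a SHA-1 digest into one 32-bit value."""
    digest = hashlib.sha1(data).digest()  # 20 bytes = five 32-bit words
    folded = 0
    for offset in range(0, len(digest), 4):
        # Interpreting each word as big-endian is an arbitrary choice.
        folded ^= int.from_bytes(digest[offset:offset + 4], "big")
    return f"{folded:08x}"  # always 8 lowercase hex characters

if __name__ == "__main__":
    print(folded_sha1_hex(b"example file contents"))
```

The result is 8 hex characters, so it is URL-safe and also survives case-insensitive file systems.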

if you just want some weak hash that is present in every quake engine, CRC-16-CCITT is your friend, which should give a nice short 16bit 4-char hash.
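For reference, a bitwise sketch of CRC-16-CCITT with polynomial 0x1021. The initial value 0xFFFF and the lack of a final XOR are assumptions (this is the variant often called CCITT-FALSE); whether it matches what a particular engine computes would need checking against that engine's source.

```python
def crc16_ccitt(data: bytes, initial: int = 0xFFFF) -> int:
    """CRC-16 with polynomial 0x1021, MSB first, no final XOR.

    The initial value is an assumption; some variants start from 0x0000.
    """
    crc = initial
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def crc16_ccitt_hex(data: bytes) -> str:
    """Format the CRC as a 4-character lowercase hex string."""
    return f"{crc16_ccitt(data):04x}"

if __name__ == "__main__":
    print(crc16_ccitt_hex(b"123456789"))
```

Note that 16 bits only allow 65,536 distinct values, so an archive with more files than that is guaranteed to contain collisions.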

frankly, the more important thing is how you're going to get the quake injector to cope with all of this, you're gonna force everyone to update. :(
one of these days I may get the motivation to make an in-engine version of the quake injector...
Spike
 
Posts: 2874
Joined: Fri Nov 05, 2004 3:12 am
Location: UK

Re: Short file checksum/hash?

Postby frag.machine » Thu Aug 20, 2015 10:10 pm

Just curious: any reason not to use the simple counter approach?
If you are using some sort of database to keep the aggregate metadata (like author name, reviews, screenshots, etc) it's very likely you already have an integer primary key tied to the file.
I know FrikaC made a cgi-bin version of the quakec interpreter once and wrote part of his website in QuakeC :) (LordHavoc)
frag.machine
 
Posts: 2052
Joined: Sat Nov 25, 2006 1:49 pm

Re: Short file checksum/hash?

Postby Spirit » Sat Aug 22, 2015 6:32 pm

If using the hash, you don't need a secondary lookup. The hash would be the unique identifier of a file anyways. If it was a counter, whatever wants information about the file would need to do several steps for finding out what it is.

I like to dream big (and complicated). Imagine telling an engine "install this.zip". It could calculate the hash and look up the instructions (Quaddicted or locally). If it used anything else, there would be more steps involved.
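A rough sketch of that "hash it, then look it up" flow. Everything named here is hypothetical: the 8-character SHA-1 prefix, the helper names, and the placeholder base URL are illustrations only, not an existing Quaddicted API.

```python
import hashlib
import io

def short_id_from_stream(stream, length=8, chunk_size=64 * 1024):
    """Hash a binary stream in chunks and return a short hex ID."""
    sha1 = hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        sha1.update(chunk)
    return sha1.hexdigest()[:length]

def metadata_url(file_id, base="https://example.invalid/files"):
    """Build the (hypothetical) location of a file's install instructions."""
    return f"{base}/{file_id}/"

if __name__ == "__main__":
    # Stand-in for open("this.zip", "rb"); any binary file object works.
    fake_zip = io.BytesIO(b"pretend these are the bytes of this.zip")
    file_id = short_id_from_stream(fake_zip)
    print(metadata_url(file_id))
```

The point of the design is that the ID is derived from the file itself, so a client holding only the zip needs no extra name-to-ID lookup before asking for the metadata.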

CRC-16-CCITT is too small. I think I will go for 8 characters so the sha1sum idea sounds good. Is there any benefit from mangling with the bits instead of just using the first 8 characters though?
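On truncating versus folding: for a hash like SHA-1 the output bits are generally expected to be equally well mixed, so simply taking a prefix should behave about as well as XOR-folding. What does change the collision odds is how many bits fit into 8 characters. A sketch, assuming case-sensitive IDs are acceptable: 8 hex characters hold 32 bits, while 8 base64url characters hold 48 bits. The function name and the 6-byte default are choices made here for illustration.

```python
import base64
import hashlib

# The base64url alphabet needs no escaping in a URL path segment.
URL_SAFE_ALPHABET = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
)

def short_id(data: bytes, num_bytes: int = 6) -> str:
    """Truncate SHA-1 to num_bytes and encode it as base64url text.

    With the default of 6 bytes (48 bits) the result is exactly
    8 characters and never needs '=' padding.
    """
    truncated = hashlib.sha1(data).digest()[:num_bytes]
    encoded = base64.urlsafe_b64encode(truncated)
    return encoded.rstrip(b"=").decode("ascii")

if __name__ == "__main__":
    print(short_id(b"example file contents"))
```

The trade-off is that such IDs differ only by case in some instances, so they are not safe on case-insensitive file systems.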

Quake Injector will get the same data as before, it would require a lot more changes for more content if I actually do what I plan... I would also redirect/keep the current file URLs up.
Spirit
 
Posts: 1025
Joined: Sat Nov 20, 2004 9:00 pm

Re: Short file checksum/hash?

Postby frag.machine » Sun Aug 23, 2015 1:45 am

Well, by definition a hash isn't unique. You can reduce the chance of a collision by using larger values, but there's always a small chance of one. OTOH it works well enough for BitTorrent, so it may be worth checking their solution.
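One way to live with short IDs anyway is to detect a collision when a file is added and lengthen the ID only for the newcomer. This is just a sketch of that idea, not how BitTorrent handles it (BitTorrent v1 keeps the full 160-bit SHA-1 as its identifier); the in-memory dict and the function name are stand-ins for whatever index the archive would actually use.

```python
import hashlib

def assign_id(full_hex: str, index: dict, min_len: int = 8) -> str:
    """Return the shortest free prefix of full_hex, at least min_len long.

    index maps already-assigned short IDs to the full digest they stand
    for. Re-adding a file that is already known returns its existing ID.
    """
    for length in range(min_len, len(full_hex) + 1):
        candidate = full_hex[:length]
        owner = index.get(candidate)
        if owner is None or owner == full_hex:
            index[candidate] = full_hex
            return candidate
        # Prefix is taken by a different file: try one more character.
    raise ValueError("full digests collide; contents need manual review")

if __name__ == "__main__":
    index = {}
    for contents in (b"first map", b"second map"):
        digest = hashlib.sha1(contents).hexdigest()
        print(assign_id(digest, index))
```

The cost is that a client can no longer assume the ID is always exactly 8 characters; it would have to try the 8-character prefix first and compare the full digest stored with the entry.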
frag.machine
 
Posts: 2052
Joined: Sat Nov 25, 2006 1:49 pm
