[Cryptography] let's kill md5sum!

Sat Jun 6 06:00:31 EDT 2015

On Jun 5, 2015, at 10:22 PM, Zooko Wilcox-O'Hearn <zooko at leastauthority.com> wrote:
> [S]ome people tell me "Okay, we're
> going to switch from MD5 to BLAKE2, but our hash values have to fit
> into the fields where we used to store our MD5 hashes." I tried my
> hardest to explain that no matter how good the hash function is,
> truncating the output to 128 bits is going to leave users potentially
> vulnerable to collision attacks at some point down the road. The
> response was "Well, we'll just take our chances, because we can't
> change the schema."....
While more bits in a hash are certainly better, a 128-bit output still means a generic (birthday) collision search takes on the order of 2^64 hash computations, and 2^64 is a *big* number.  You really need to run the economics here:  Assuming technology keeps advancing at current rates, in (say) 25 years, how much will it cost to do 2^64 BLAKE2 computations?  How does that compare to the value of one collision?
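To make that economics question concrete, here is a rough back-of-the-envelope sketch. It uses the standard birthday estimate for an ideal n-bit hash and a hypothetical cost model in which the price per hash halves every few years. The cost per hash and the halving period are placeholders to be replaced with your own estimates; they are not measurements.

```python
import math


def birthday_work(bits: int, p: float = 0.5) -> float:
    """Approximate number of hash evaluations for probability p (0 < p < 1)
    of finding some collision on an ideal `bits`-bit hash:
    sqrt(2 * ln(1 / (1 - p))) * 2^(bits / 2)."""
    return math.sqrt(2 * math.log(1 / (1 - p))) * 2 ** (bits / 2)


def projected_cost(work: float, usd_per_hash_today: float,
                   years: float, halving_years: float) -> float:
    """Cost (USD) of `work` hash evaluations `years` from now, assuming
    (hypothetically) that the cost per hash halves every `halving_years`."""
    return work * usd_per_hash_today / 2 ** (years / halving_years)


if __name__ == "__main__":
    work = birthday_work(128)  # roughly 1.18 * 2^64 for a 50% chance
    # PLACEHOLDER inputs: substitute your own estimates.
    usd_per_hash_today = 1e-12  # hypothetical, not a measured figure
    halving_years = 2.0         # hypothetical Moore's-law-style rate
    cost = projected_cost(work, usd_per_hash_today, 25, halving_years)
    print(f"~{work:.3g} hashes; projected cost in 25 years: ${cost:,.0f}")
    # The answer only means something next to what one collision is worth.
```

The interesting output is not the dollar figure itself but how sensitive it is to the assumed halving period and starting cost.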

Many applications that store a checksum also store a data length.  To be useful, would a collision have to be for data of (close to) the same length?  If so, an attack gets harder, as you can't simultaneously attack all protected items - only those with (close to) the same length; and each computation has to be over data of that length, which is more expensive.  Since we're only talking about brute force here, defining a standard salting mechanism and choosing a per-site salt would force an attacker to pick a particular site to attack.  Some applications could be finer-grained - e.g., a per-database salt.
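A sketch of what a salted, length-bound 128-bit checksum could look like. Python's `hashlib.blake2b` accepts a `salt` (at most 16 bytes) and a `digest_size`, so a 16-byte digest fits a field originally sized for MD5. The salt value, the function names, and the choice to check the stored length alongside the digest are illustrative assumptions, not an existing standard salting mechanism.

```python
import hashlib
import hmac

# Hypothetical per-site salt; BLAKE2b allows a salt of at most 16 bytes.
SITE_SALT = b"example-site-01"  # 15 bytes


def checksum(data: bytes, salt: bytes = SITE_SALT) -> tuple[int, bytes]:
    """Return (length, 128-bit salted BLAKE2b digest) to store for `data`."""
    digest = hashlib.blake2b(data, digest_size=16, salt=salt).digest()
    return len(data), digest


def verify(data: bytes, stored_len: int, stored_digest: bytes,
           salt: bytes = SITE_SALT) -> bool:
    """Accept `data` only if both the stored length and salted digest match."""
    if len(data) != stored_len:
        return False
    _, digest = checksum(data, salt)
    return hmac.compare_digest(digest, stored_digest)


if __name__ == "__main__":
    n, d = checksum(b"hello")
    print(n, d.hex())
    print(verify(b"hello", n, d))   # True
    print(verify(b"hello!", n, d))  # False: length mismatch
```

The salt here is not secret; it only ensures that brute-force work done against one site's checksums is useless against another's. Passing a secret via `key=` instead would turn this into a keyed MAC, which is a different design.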

So ... I wouldn't dismiss their decision completely.  Schema changes at large scale can be very disruptive, and people do try to avoid them.  (I'm actually not even sure how someone would transition from a large collection of existing MD5 checksums to truncated 128-bit BLAKE2 checksums.  Would they recompute all the checksums at once?  This could be an impractically long operation.  Do they have a spare bit somewhere they can use as an algorithm flag?  I suppose if records always have a creation date, you could define a cutover time T:  Any record created before T uses MD5, and any record created at T or later uses BLAKE2.  You are forever subject to attacks against old records, but maybe they get less interesting over time - and if you're really worried you can do a long-term project to recompute checksums for old records and move T backwards.)
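The cutover-time idea might be sketched as follows, assuming every record carries a creation timestamp. The cutover constant, the `Record` fields, and the use of unsalted 16-byte BLAKE2b are hypothetical choices for illustration. Because MD5 and 16-byte BLAKE2b both produce 16 bytes, the existing checksum field does not change width.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical cutover time T: records created at or after T use BLAKE2b.
CUTOVER_T = datetime(2015, 7, 1, tzinfo=timezone.utc)


@dataclass
class Record:
    created: datetime  # assumed to be stored with every record
    data: bytes
    checksum: bytes    # 16-byte field originally sized for MD5


def digest_for(created: datetime, data: bytes) -> bytes:
    """Choose the algorithm from the creation time; both yield 16 bytes."""
    if created < CUTOVER_T:
        return hashlib.md5(data).digest()  # legacy records
    return hashlib.blake2b(data, digest_size=16).digest()


def make_record(data: bytes, created: datetime) -> Record:
    return Record(created, data, digest_for(created, data))


def check(rec: Record) -> bool:
    """Recompute with the algorithm implied by the record's creation time."""
    return hmac.compare_digest(digest_for(rec.created, rec.data), rec.checksum)


if __name__ == "__main__":
    old = make_record(b"payload", datetime(2014, 1, 1, tzinfo=timezone.utc))
    new = make_record(b"payload", datetime(2016, 1, 1, tzinfo=timezone.utc))
    print(check(old), check(new))  # True True
```

Moving T backwards then amounts to rewriting the checksums of records created between the new and old cutover with BLAKE2b and only afterwards changing `CUTOVER_T`.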

                                                        -- Jerry