MD5 To Be Considered Harmful Someday

Mon Dec 6 19:14:49 EST 2004

I've been doing some analysis on MD5 collision announced by Wang et al.  
Short version:  Yes, Virginia, there is no such thing as a safe hash 
collision -- at least in a function that's specified to be 
cryptographically secure.  The full details may be acquired at the 
following link:

http://www.doxpara.com/md5_someday.pdf

A tool, Stripwire, has been assembled to demonstrate some of the attacks 
described in the paper.  It may be acquired at the following address:

http://www.doxpara.com/stripwire-1.1.tar.gz 

Incidentally, the expectations management is by no means accidental -- 
the paper's titled "MD5 To Be Considered Harmful Someday" for a reason.  
Some people have said there's no applied implications to Joux and Wang's 
research.  They're wrong; arbitrary payloads can be successfully 
integrated into a hash collision.  But the attacks are not wildly 
practical, and in most cases exposure remains thankfully limited, for 
now.  But the risks are real enough that responsible engineers should 
take note:  This is not merely an academic threat, systems designed with 
MD5 now need to take far more care than they would if they were 
employing an unbroken hashing algorithm, and the problems are only going 
to get worse.

Some highlights from the paper:

* The attack itself is pretty limited -- essentially, we can create 
"doppelganger" blocks (my term) anywhere inside a file that may be 
swapped out, one for another, without altering the final MD5 hash.  This 
lets us create any number of binary-inequal files with the same md5sum.

* MD5 uses an appendable cascade construction -- in other words, if you 
happen to find yourself with two files that MD5 to the same hash, an 
arbitrary payload can be applied to both files and they'll still have 
the same hash.  This leads to...

* Attacks are possible using only the proof of concept test vectors 
released by Wang -- the actual attack is not necessary.

* Stripwire emits two binary packages.  They both contain an arbitrary 
payload, but the payload is encrypted with AES.  Only one of the 
packages ("Fire") is decryptable and thus dangerous; the other ("Ice") 
shields its data behind AES.  Both files share the same MD5 hash.

* Digital Signature systems are vulnerable, as they almost always sign a 
hashed representation of data rather than the data itself.

* This is an excellent vector for malicious developers to get unsafe 
code past a group of auditors, perhaps to acquire a required third party 
signature.  Alternatively, build tools themselves could be compromised 
to embed safe versions of dangerous payloads in each build.  At some 
later point, the embedded payload could be safely "activated", without 
the MD5 changing.  This has implications for Tripwire, DRM, and several 
package management architectures.

* HMAC's invulnerability has been slightly overstated.  It's definitely 
possible, given the key, to create two datasets with the same HMAC.  
Attacker possession of the key violates MAC presumptions, so the impact 
of this is particularly questionable.

* Very interesting possibilities open up once the full attack is made 
available -- among other things, we can create self-decrypting 
executables (fire.exe and ice.exe) that exhibit differential behavior 
based on their internal colliding payloads.  They'll still have the same 
MD5 hash.

* Several doppelgangers may (relatively quickly, as per Joux) be 
computed within a single multicollision-friendly block.  As such, the 
particular selection of doppelganger sets within a file can itself be 
made to represent data.  It's relatively straightforward to embed a 128 
bit signature inside an arbitrary file, in such a way that no matter the 
value of the signature, a constant MD5 hash is maintained.  This is 
curiously steganographic.

* Many popular P2P networks (and innumerable distributed content 
databases) use MD5 hashes as both a reliable search handle and a 
mechanism to ensure file integrity.  This makes them blind to any 
signature embedded within MD5 collisions.  We can use this blindness to 
track MP3 audio data as it propagates from a custom P2P node.  
"Strikeback" capacity against executable trafficking is even more 
pronounced -- it's possible to create application installers that 
self-modify with host identifying characteristics but still successfully 
retransmit on P2P networks under the global search hash.

I hope this paper proves useful to the security community at large, and 
I welcome feedback.

--Dan Kaminsky
www.doxpara.com
dan at doxpara.com

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com