A risk with using MD5 for software package fingerprinting

Sun Jan 27 12:07:21 EST 2002

The cryptographic hash function MD5 is often used to authenticate 
software packages, particularly in the Unix community. The MD5 hash 
of the entire package is calculated and its value is transmitted 
separately. A user who downloads the package computes the hash of the 
copy received and matches the value against the original.
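
As an illustration of the check described above, here is a minimal
sketch in Python using the standard hashlib module. The function names
file_digest and matches_published are hypothetical, chosen for this
example, and the published value is assumed to arrive as a hex string.

```python
import hashlib
import hmac

def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path, published_hex, algorithm="md5"):
    """Compare the computed digest with the separately transmitted value."""
    computed = file_digest(path, algorithm)
    # Normalize whitespace and case before comparing the two hex strings.
    return hmac.compare_digest(computed, published_hex.strip().lower())
```

The check is only as good as the channel that carried the published
value, which is the question set aside in the next paragraph.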

Putting aside the question of how the hash value can be safely 
transmitted separately, there is a potential attack on this method 
due to the 128-bit length of the MD5 hash output.

If all the individuals having input to the creation of the original 
software package are trustworthy, then 128 bits would appear to 
provide adequate security. A man-in-the-middle attacker would have to 
solve a 128-bit problem to create a Trojan-horse-infected package 
that passed the hash verification. That is considered computationally 
infeasible, at least until the advent of quantum computing.

One might think the above argument proves MD5 is sufficient, since if 
an attacker had an agent working inside the organization that 
produced the package, the agent could simply insert the Trojan 
software patch in the original package. However, such an insertion is 
very risky. A sophisticated software company would likely have code 
reviews that would make introduction of the Trojan code difficult. In 
an open source model, anyone could detect the insertion. The 
insertion would then be foiled, the agent would be uncovered and the 
technical means that the Trojan employed would be compromised.

A safer attack would be for the agent to insert an apparently 
innocent modification into the package, chosen so that the MD5 hash 
of the released package matches the hash of a version carrying the 
Trojan code. Since the attacker controls the Trojan code as well, 
both versions can be varied while searching for a match. Finding 
such a pair is subject to the birthday paradox and presents a 64-bit 
problem (on the order of 2^64 hash computations) rather than a 
128-bit one. Solving such a problem is within the means of a 
well-funded attacker today.
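
The birthday effect can be seen on a toy scale. The sketch below is
only an illustration under stated assumptions: MD5 is truncated to 24
bits so the search finishes in moments, an 8-byte counter appended to
each package stands in for the 64 spare bits, and the names
truncated_md5 and find_cross_collision are hypothetical.

```python
import hashlib

def truncated_md5(data, bits):
    """First `bits` bits of MD5(data) as an integer (a toy, shortened hash)."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - bits)

def find_cross_collision(clean, trojan, bits=24):
    """Search for a clean variant and a Trojaned variant with equal hashes.

    Returns (clean_variant, trojan_variant, tries_per_side).
    """
    seen_clean = {}   # truncated hash -> clean variant
    seen_trojan = {}  # truncated hash -> Trojaned variant
    counter = 0
    while True:
        tweak = counter.to_bytes(8, "big")  # stand-in for 64 spare bits
        c = clean + tweak
        t = trojan + tweak
        hc = truncated_md5(c, bits)
        ht = truncated_md5(t, bits)
        seen_clean[hc] = c
        seen_trojan[ht] = t
        # A match between any clean variant and any Trojaned variant wins.
        if hc in seen_trojan:
            return c, seen_trojan[hc], counter + 1
        if ht in seen_clean:
            return seen_clean[ht], t, counter + 1
        counter += 1
```

With a 24-bit hash the search should need on the order of 2^12
variants of each package, not 2^24. The same square-root saving is
what turns the 128-bit MD5 output into a 64-bit problem.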

The modification could be designed to get past code reviews in a 
number of ways. For example, 64 low-order bits in a JPEG icon might 
be altered. The agent would have to be in a position to make the last 
modification to the software package prior to release and to send a 
final pre-release version of the package to the attacker, but those 
are hardly insurmountable hurdles.  In the open source model, where 
new releases can be frequent, it may suffice to carry out this attack 
only occasionally, say to recover private keys.

The obvious solution to this problem is to use a wider hash. For 
example, SHA-256 would present a group using this attack with a 
128-bit problem. Even SHA-1 would be preferable, making such an attack 
an 80-bit problem.  The cost of using a wider hash in this situation 
is trivial. It would seem the prudent thing to do.
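
The arithmetic behind those figures is simply half the output width.
A minimal sketch, using hashlib only to look up the digest sizes (the
helper name birthday_bits is hypothetical, and the figure covers only
the generic birthday search, not any weakness specific to one hash):

```python
import hashlib

def birthday_bits(algorithm):
    """Approximate log2 of the work for a generic birthday collision search."""
    output_bits = hashlib.new(algorithm).digest_size * 8
    return output_bits // 2

for name in ("md5", "sha1", "sha256"):
    width = hashlib.new(name).digest_size * 8
    print(f"{name}: {width}-bit output, about 2^{birthday_bits(name)} work")
```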

Arnold Reinhold

---------------------------------------------------------------------
The Cryptography Mailing List