The Pointlessness of the MD5 "attacks"

Thu Dec 16 08:22:10 EST 2004

On Wed, 15 Dec 2004 10:06:10 -0500 (GMT-05:00), John Kelsey
<kelsey.j at ix.netcom.com> wrote:
> 
> So, are you sure there can never be a program which allows such an exploit?  I've seen programs that had embedded components (state machines in particular) which were not easily human-readable, and had themselves been generated by computer.  And even large graphics, sound, or video sequences can really change the meaning of a program's actions in some ways; those might be susceptible to the requirements of the attack.  I agree it's hard to see how to exploit the existing MD5 collision attacks in programs that would look innocent, but I don't see what makes it *impossible*.
> 
> Finally, I'm very skeptical that the attacks that have been found recently are the best or only ones that can be done.
> Do we have any special reason to think that there will never be a way to adapt the attack to be able to slip something plausible looking into a C program?  Once your hash function starts allowing collisions, it really just becomes a lot less valuable.
> 

Exactly.

Yes, malware can be distributed without using any hash function at
all. I will try to show with an example what the MD5 collision
provides and how to make it look 'innocent'. You don't have to put
malicious code in the program. For example, telnet is not secure, but
there are uses of it which don't need encryption or authentication.
Similarly, HTTP is widely used, and HTTPS can be used when needed.
Take a bank as an example. Some transactions require a secure
connection, but not every page on the bank's server has to be
transmitted over one. If an attacker caused the software to switch
from HTTPS to HTTP at the "right" moment, it could lead to disaster.

Most data (file) formats have some sort of header. An attacker can
do this: he declares that the colliding block is a header whose first
four bytes are the 'magic bytes' you know from many formats (e.g.
exe has MZ, bmp has BM, ELF executables have 0x7F followed by ELF).
Next, he declares that the rest of the 1024 bits is some data, e.g.
counts of something, and that the differing bits/bytes are flags.
The point is to use those bits in a calculation that looks
legitimate. That "moves the question of trust" onto the data package.
A person doing the inspection would not find anything wrong, since
that person receives the correct executable and the "good" data
package. That is the difference between using and not using an MD5
collision. Obfuscated malicious code can be discovered, even though
it may be difficult. When you use a collision, it may never be
discovered, since there is no "evil" action in the code. The "evil"
is created, much like a race condition, by substituting the "evil"
colliding package once the inspection is over.
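
As a minimal sketch of the header trick above: the code below reads a
1024-bit block as a package header and lets one flag bit decide
whether an optional service is enabled. The magic bytes, the flag
offset, the flag bit and the function name are all hypothetical and
chosen only for illustration. The two demo headers differ in a single
bit but do NOT collide under MD5; in a real attack the attacker would
have to take whatever bytes the collision search produces and assign
meanings to them afterwards.

```python
HEADER_SIZE = 128          # 1024 bits: the size of the colliding block
MAGIC = b"PKG1"            # hypothetical magic bytes
FLAGS_OFFSET = 19          # hypothetical position of a differing byte
OPTIONAL_SERVICE = 0x80    # hypothetical flag bit within that byte


def parse_header(header: bytes) -> dict:
    """Interpret a 128-byte block as a package header."""
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be exactly 128 bytes")
    if header[:4] != MAGIC:
        raise ValueError("bad magic bytes")
    flags = header[FLAGS_OFFSET]
    # Legitimate-looking logic: a flag decides what gets enabled.
    return {"enable_optional_service": bool(flags & OPTIONAL_SERVICE)}


# Demo headers only: same bytes except for one flipped flag bit.
good = MAGIC + bytes(HEADER_SIZE - len(MAGIC))
evil = bytearray(good)
evil[FLAGS_OFFSET] ^= OPTIONAL_SERVICE
evil = bytes(evil)

print(parse_header(good))
print(parse_header(evil))
```

Nothing in parse_header is malicious, which is the point: an
inspector reading it sees ordinary option handling, and the outcome
depends entirely on which header is shipped.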

Now let me try an example (I know it's not the best one, but I hope
it makes the point clear):

Suppose an attacker is a packager in a company and his role is to
take software, pack it up and create installation scripts. The
package is still reviewed after he has packaged it. Let's suppose the
company is working on a server software bundle which installs an HTTP
server, an IDS and optionally SSH and Telnet. By default, Telnet is
disabled. Telnet's role might be, for example, to let anyone telnet
to the machine and get statistics of server usage or some other
non-sensitive information or computation. That's the justification
for having telnet in the software. Anyone who wanted to use it would
enable it and configure it correctly so that anonymous users can't do
any harm or get at sensitive information.

Now, there is a flag in the header (= the colliding block) saying
whether Telnet should be installed or not. After packaging is
complete, the package is inspected once more and, if no flaw is
found, it is put on the company's ftp/web server for distribution,
with the MD5 sums published in multiple places. Then the attacker (as
an insider) swaps the two packages. Anyone trusting the MD5 sums
would install the software without realizing that telnet is installed
and not configured (the documentation says otherwise). That could
open a way in for anyone.
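
To make the last step concrete, here is a rough sketch of the check a
downloader typically performs against a published sum. The function
name and the sample data are made up for illustration, and the
algorithm parameter is my own addition so that the same check can be
pointed at another hash. The check accepts any file whose digest
equals the published value, so it would accept both members of a
colliding pair.

```python
import hashlib


def matches_published_sum(package: bytes, published_hex: str,
                          algorithm: str = "md5") -> bool:
    """Return True if the package's digest equals the published sum."""
    digest = hashlib.new(algorithm, package).hexdigest()
    return digest == published_hex.strip().lower()


# Illustrative data only; this is not a real package.
package = b"example package contents"
published = hashlib.md5(package).hexdigest()

print(matches_published_sum(package, published))
```

The check says nothing about which of two colliding packages was
inspected; it only says that the bytes hash to the published value.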

To sum it up, the MD5 collision allows the following:
- You have code and data that do nothing unwanted (this can be
  verified by inspecting the code).
- The other colliding package does the same, but it also enables some
  unnecessary service/action that is dangerous in the given context.
  The presence of the service is justified, though. In the default
  installation it is disabled. So says the manual. But that's not
  true.

So the MD5 collision itself does not do much, but with additional
trickery it could be used.

If the flaw were discovered in the software, who would be held
responsible in the company? The testers and the code inspectors, not
the packager. He could sell knowledge of the security hole to
spammers, virus writers, etc. I'm not saying the MD5 collision allows
a highly practical attack (the example, at least, requires an
insider), but now that we know it's possible, why should we continue
using MD5?

Ondrej Mikle

---------------------------------------------------------------------
The Cryptography Mailing List