More problems with hash functions

Fri Aug 27 08:51:01 EDT 2004

| Bear writes:
| > One interesting idea which I came up with and haven't seen a way
| > past yet is to XOR each block with a value computed from its
| > sequence number, then compute the hash function on the blocks in
| > a nonsequential order based on the plaintext of the blocks....
| > in their new order.
|
| This is an interesting idea, but it does not fully prevent the Joux
| attack.  This is seen most obviously in the two block case, where
| your method can at most increase the attacker's work by a factor of 2.
| Joux shows how to find a 4-collision using 2*2^(n/2) work, while it should
| take 2^(3n/4) work.  In the case of a 160 bit hash this is the difference
| between 2^81 and 2^120.  Doubling the work to find the 4-collision will
| not do much to address this discrepency.
|
| Even in the more general case, where Joux puts k blocks together to
| generate 2^k multicollisions, I can see a way to attack your method,
| with an increase in the work by only a factor of k^2....
There's a more general problem with the algorithm:  It isn't on-line.  An
implementation requires either an unbounded internal state, or the ability to
re-read the input stream.  The first is obviously impossible, the second a
problem for many common uses of hash functions (in communications, as opposed
to storage).

However ... *any* on-line algorithm falls to a Joux-style attack.  An
algorithm with fixed internal memory that can only read its input linearly,
exactly once, can be modeled as an FSM.  A Joux-style attack then is:  Find
a pair of inputs M1 and M1' that, starting from the fixed initial state, both
leave the FSM in state S.  Now find a pair of inputs M2 and M2' that, starting
from S, leave the FSM in state S'.  Then for the cost of finding these two
collisions, we have *four* distinct collisions as before.  More generally,
by just continuing from state S' to S'' and so on, for the cost of k
single collision searches, we get 2^k collisions.

The nominal security of the FSM is its number of states; for k large enough,
we certainly get "too many" collisions compared to a random function.

What this shows is that our vague notion that a hash function should behave
like a random function can't work.  On-line-ness always kills that.  (In
fact, even given the function random access to its input doesn't help:  An FSM
can only specify a finite number of random "places" to look in the input.
If the FSM can only specify an absolute location, it can't work on arbitrarily
long inputs.  If it can specify a location relative to the current position,
you can re-write the machine as a linear one that gets repeated copies of
blocks of a fixed size.)

This is really just the prefix property writ large.  Define an on-line
function f:{0,1}*->{0,1}^k as one with the property that there exist
infinitely many pairs of strings Si, Si' with the property that, for any S in
{0,1}*, f(Si||S) = f(Si'||S).  (In particular, this is true for S the empty
string, so f(Si) = f(Si').)  Then the most we can expect of an (on-line)
hash function is that it act like a function picked at random from the set of
on-line functions.  (This, of course, ignores all the other desireable
properties - we want to choose from a subset of the on-line functions.)

Getting a security parameter in there is a bit tricky.  An obvious first cut
is to say an on-line function has minimum length n if the shortest Si has
length n.

Actual hash functions don't follow this model exactly.  For example, MD5 needs
512+128+64 bits of internal storage* - the first for the input buffer, all of
whose bits are needed together, the second for the running hash state, the
third for the length.  So there are 2^704 states in the MD5 FSM, but the
output only has 2^128 values - only 128 bits of the "state number" are used.
That in and of itself isn't an issue - it doesn't hurt, in this particular
analysis, to throw bits away in the *final* result.  What does limit it is
that every 512 input bits, the FSM itself logically "mods out" 512 bits of the
state (the input buffer), and ends up in one of "only" 2^192 possible states
(and if we work with "short" strings, then only a small fraction of the 2^64
states of the length field are actually accessible).  Because of this, MD5 can
at best have been chosen from on-line functions of minimum length 128, rather
than those of a minimal length up to 704.  That's why keeping more internal
state *might* help - though you have to be careful, as we've seen, about how
you do it.
							-- Jerry

* Actually, you need 3 more bits because the MD5 length is in octets, not
bits. This memory is hidden in any real implementation on byte-oriented
hardware.

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com