Extracting uniform randomness from a noisy source

John Kelsey kelsey.j at ix.netcom.com
Tue Aug 6 23:10:45 EDT 2002


At 05:00 AM 8/6/02 +0000, David Wagner wrote:
>Amir Herzberg wrote:
>>Oops, sorry, I wasn't clear above; AES_k is of course using the CBC_MAC 
>>using AES.

>I don't believe this is secure, either.
>
>Notice that you never gave a security theorem, or a proof of security.
>If the goal is a construction with a rigorously justified security
>analysis, this deficiency should be a warning sign.  And in fact, there
>are some attacks, as I show below.

The thing that struck me, thinking about this, is that the kind of security
proof available for any of these primitives is a poor fit for the actual
problem.  That's pretty much a recipe for getting proposals that don't work
in practice.

Would it make sense to differentiate between the two things we're trying
to do here?  

a.  Map the entropy from one string sampled from a large set with only
approximately-known distribution to a short string, in such a way that we
have a strong expectation that the short string will have a uniform
distribution.

b.  Output the result in a way that doesn't leak partial information about
its inputs, and that (if full entropy was provided in the input) isn't
distinguishable from a random value in practice. 

Also, in a practical system, we're likely to want:

c.  Output the result in a way that is expected to be unconditionally
secure (indistinguishable from random to unbounded attackers) if there was
sufficient entropy in the input, and computationally secure (reducing to
the strength of AES, HMAC-SHA1, etc.) if not.

>For instance, suppose all the entropy of the input is contained in just
>one block: say, the first block varies, and all other blocks are fixed
>at all-zeros.  Then we can mount the same attack I described before to
>distinguish a pseudorandom output from a true-random value.  Simply take
>the output, decrypt it backwards as far as you can go, assuming all-zeros
>in the trailing blocks, and the result will be a candidate value for
>the first block of input; if it looks like a plausible input sample,
>then you probably received an output from the pseudorandom source,
>else you probably received true-random bits.
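
This backwards-decryption distinguisher can be sketched at toy scale (my
illustrative sketch, not from the original posts: an 8-bit keyed permutation
stands in for AES, and a 16-element "plausible input" set plays the role of
the low-entropy first block):

```python
import random

random.seed(1)
# Toy 8-bit "block cipher": a fixed keyed permutation standing in for AES.
PERM = list(range(256))
random.shuffle(PERM)
INV = [0] * 256
for x, y in enumerate(PERM):
    INV[y] = x
E = lambda b: PERM[b]   # encrypt one toy block
D = lambda b: INV[b]    # decrypt one toy block

def cbc_mac(blocks):
    # CBC-MAC with a zero IV: c_i = E(m_i XOR c_{i-1}); tag = last c_i.
    c = 0
    for m in blocks:
        c = E(m ^ c)
    return c

PLAUSIBLE = set(range(16))  # inputs carry only 4 bits of entropy

def looks_pseudorandom(tag, nblocks):
    # Decrypt backwards through the trailing all-zero blocks, then peel
    # the first encryption to get a candidate for the varying first block.
    c = tag
    for _ in range(nblocks - 1):   # trailing m_i = 0, so c_{i-1} = D(c_i)
        c = D(c)
    candidate = D(c)               # m_0 = D(c_0) XOR 0
    return candidate in PLAUSIBLE

# CBC-MAC outputs always yield a plausible candidate; truly random tags
# only rarely do, and that gap is the distinguisher.
hits_pr = sum(looks_pseudorandom(cbc_mac([m0] + [0] * 4), 5)
              for m0 in PLAUSIBLE)
hits_tr = sum(looks_pseudorandom(random.randrange(256), 5)
              for _ in range(256))
```

With the pseudorandom tags the recovered candidate always lands in the
plausible set; with truly random tags it does so only about 1/16 of the
time (16 plausible values out of 256 possible blocks).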

Er, how much entropy are you assuming in this input?  I see two possibilities:

a.  The first input block is a random 128-bit value, and the rest of the
inputs are known.  Decrypting this with the known key doesn't let you
distinguish the output from a random block, since you can always decrypt
any random 128-bit block in this way and get a random 128-bit block.

b.  The first input block is not a random 128-bit value, and can reliably
be distinguished from one.  In this case, the input just doesn't have full
entropy, and any known function you apply to it with a 128-bit output is
distinguishable from a random output.  A one-way function just makes it
harder to distinguish these outputs, for a computationally-bounded
attacker.  But how important this is depends on our assumptions about the
attacker's abilities; if we assume the attacker can do 110-bit searches,
then he can generally distinguish the output of *any* known function with
only 110 bits of entropy with reasonable probability.  
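
That generic point can be illustrated in miniature (my toy sketch, not from
the original post): with only k bits of input entropy, the output of *any*
public function is distinguishable by simply enumerating all 2^k inputs and
checking image membership.

```python
import hashlib
import os

# Toy low-entropy source: 32 possible inputs, i.e. 5 bits of entropy.
SAMPLES = [bytes([i]) for i in range(32)]
# Enumerate the entire image of the (public) function on those inputs.
IMAGE = {hashlib.sha1(s).digest() for s in SAMPLES}

def flags_pseudorandom(out):
    # Membership in the enumerated image marks the output as non-random.
    return out in IMAGE

real = hashlib.sha1(SAMPLES[7]).digest()   # output of the known function
fake = os.urandom(20)                      # a truly random 160-bit string
```

The `real` output is always flagged; a truly random 160-bit string matches
only with probability about 32/2^160, so a one-way function buys nothing
here against an attacker who can afford the enumeration.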

>This same attack works even if the varying block is in the middle.
>So the first lesson is that it does not suffice just to have enough
>entropy in the input; it must be well-spread across blocks.

Same comment, though.  If there's a single 128-bit random block in the
middle, with everything else fixed, and it's really random, then the
128-bit output from CBC-MAC with a fixed key won't be distinguishable from
a random 128-bit value.  It's only when you're generating outputs with less
than 128 bits of input entropy that this attack works.  

>But being well-spread across blocks is no guarantee of security, either.
>Imagine you have a semi-random source, where each 16-bit word has 1
>bit of entropy, and sampled words are independent.  To accumulate enough
>entropy, you sample 80 words from this source, and then compress it using
>AES-CBC-MAC.  Well, this succumbs to a 2^40 meet-in-the-middle attack.
>Suppose you're given an output and you want to guess whether it came
>from the pseudorandom process or from a true-random process.  First,
>enumerate all 2^40 possibilities for the last 40 words, and decrypt
>backwards through 5 blocks to obtain a list of 2^40 candidate values for
>the intermediate value halfway through the CBC-MAC.  Next, enumerate
>all 2^40 possibilities for the first 40 words, and encrypt forwards
>through 5 blocks to obtain another list of 2^40 candidate values for
>the same intermediate value.  If there is any value that appears in both
>lists, conclude that you were probably given bits from the pseudorandom
>AES-based process.  If not, conclude that you were probably given bits
>from a true-random source.
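
The meet-in-the-middle attack also works at toy scale (my illustrative
sketch, with made-up parameters: a 16-bit block standing in for AES's 128,
and four blocks each drawn from an 8-element set, i.e. 3 bits of entropy
per block and 12 bits total against a 16-bit output):

```python
import itertools
import random

random.seed(2)
N = 1 << 16                    # toy 16-bit block size
PERM = list(range(N))          # fixed keyed permutation in place of AES
random.shuffle(PERM)
INV = [0] * N
for x, y in enumerate(PERM):
    INV[y] = x
E, D = PERM.__getitem__, INV.__getitem__

# Each of the 4 message blocks comes from this small set (3 bits entropy).
SAMPLES = random.sample(range(N), 8)

def cbc_mac(blocks):
    c = 0
    for m in blocks:
        c = E(m ^ c)
    return c

def mitm_flags_pseudorandom(tag):
    # Encrypt forward over all guesses for the first two blocks, decrypt
    # backward over all guesses for the last two; a shared midpoint value
    # suggests the tag came from the CBC-MAC process.
    fwd = {E(b1 ^ E(b0))
           for b0, b1 in itertools.product(SAMPLES, repeat=2)}
    bwd = {D(D(tag) ^ b3) ^ b2
           for b2, b3 in itertools.product(SAMPLES, repeat=2)}
    return bool(fwd & bwd)

real = [cbc_mac(random.choices(SAMPLES, k=4)) for _ in range(50)]
fake = [random.randrange(N) for _ in range(50)]
hits_real = sum(map(mitm_flags_pseudorandom, real))
hits_fake = sum(map(mitm_flags_pseudorandom, fake))
```

Every CBC-MAC tag is flagged (the true midpoint is always in both lists),
while a random 16-bit value produces a false match only with probability
roughly 64*64/65536, about 1/16 per trial.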

Ah, this is a nice attack!  But I don't think it scales all the way up as
you're discussing.  Suppose we have 160 bits of entropy spread out as you
describe.  Then, when we do our meet-in-the-middle attack, we get 2^{80}
intermediate values, each of 128 bits, in each list.  With 2^{160} pairs of
intermediate values, each with a probability of 2^{-128} of colliding, we
expect 2^{32} false matches, and one correct one.  Carrying out the same
attack on any random 128-bit value, I'll expect 2^{32} false matches.  Is
there something I'm missing here?  
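
The counting here can be double-checked with exact integer arithmetic (a
trivial sketch):

```python
# 2^80 forward values paired with 2^80 backward values give 2^160 pairs,
# each matching a 128-bit midpoint with probability 2^-128, so the
# expected number of false matches is 2^160 / 2^128 = 2^32.
pairs = 2**80 * 2**80
expected_false_matches = pairs // 2**128
```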

It looks to me like once we get much past 128 bits of entropy, this attack
stops working as a distinguisher.  It's still useful as a way of learning
possibly-secret internal state of the generating machine, but it doesn't
break the RNG.  Right?  

>This second attack shows that not only must the entropy be well-spread
>out, but you must have more entropy than you need, by a factor of two.
>In other words, the AES-CBC-MAC construction is wasteful of entropy.

>SHA1 hashing does not have any of these problems.  And moreover, the lack
>of a security theorem (or proof) for the AES-CBC-MAC construction raises
>the question of whether there are any more attacks we haven't noticed yet.

This is the really important point, IMO.  We don't yet have a security
proof for doing this job using hash functions or block ciphers or MACs, and
trying to reuse these existing security proofs is not too useful.  (Imagine
trying to use the pseudorandomness proof of AES in counter-mode to solve
this problem!)  

>In short, despite all the theoretical problems with rigorously justifying
>SHA1-hashing, I believe in practice it is by far the best technology
>available.  If asked by any practitioner, I would gladly recommend
>SHA1-hashing over the alternatives.

The only obvious problems with SHA1 for this job are that it's rather slow
(if we're hashing in lots of samples), and that its actual output
distribution is a little hard to reliably describe.  (For example, is there
any proof that all output values are even possible from the SHA1
compression function, for a given input chaining value?)  If we care about
computationally unbounded attackers, we'd probably like to have some
assurance that they're not going to find trivial flaws in our mechanisms.  

When I looked at this problem a couple of years ago, I broke it down into
the entropy-distilling task and the secure-output task.  This made things a
lot simpler, though your conclusions about what function works here are
strongly determined by your assumptions about attackers.  My assumptions
were something like:

a.  If my input samples have enough entropy to make my outputs random, then
I need to resist computationally unbounded attackers.  (Otherwise, why
bother with distilling entropy; just use a PRNG.)

b.  If my input samples are within some much more forgiving bound, then I
need to resist computationally-bounded attackers.  

c.  Attackers with control over some part of my inputs mustn't be able to
cancel out or block entropy in the other parts of the inputs.  
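
A minimal sketch of that two-stage split (my illustrative framing, not a
vetted design): stage 1 distills pooled samples into a short seed by
hashing, and stage 2 derives outputs through a keyed MAC, so that if the
pool falls short of full entropy, security still reduces to the MAC's
strength.  SHA-1 appears only because it is the hash under discussion.

```python
import hashlib
import hmac
import os

def distill(samples):
    # Stage 1: hash all pooled samples into one seed.  Length-prefix each
    # sample so distinct sample lists cannot collide through
    # concatenation ambiguity.
    h = hashlib.sha1()
    for s in samples:
        h.update(len(s).to_bytes(4, "big"))
        h.update(s)
    return h.digest()

def output_block(seed, counter):
    # Stage 2: derive successive output blocks with HMAC keyed by the
    # distilled seed, so outputs leak nothing useful about the raw inputs.
    return hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha1).digest()

seed = distill([os.urandom(16) for _ in range(10)])
b0, b1 = output_block(seed, 0), output_block(seed, 1)
```

Separating the stages this way lets assumption (a) apply to the distilling
step and assumption (b) to the output step, rather than asking one
construction to satisfy both at once.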

--John Kelsey, kelsey.j at ix.netcom.com


---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com


