Extracting uniform randomness from noisy source

John Kelsey kelsey.j at ix.netcom.com
Sun Aug 11 16:45:52 EDT 2002


At 11:09 PM 8/7/02 +0000, David Wagner wrote:
>John Kelsey  wrote:
>>a.  If my input samples have enough entropy to make my outputs random, then
>>I need to resist computationally unbounded attackers.  (Otherwise, why
>>bother with distilling entropy; just use a PRNG.)
>>
>>b.  If my input samples are within some much more forgiving bound, then I
>>need to resist computationally-bounded attackers.  
>>
>>c.  Attackers with control over some part of my inputs mustn't be able to
>>cancel out or block entropy in the other parts of the inputs.  
>
>I agree these would be great properties to have.  Sadly, I don't know
>of any construction that plausibly achieves all three, in theory or
>in practice.

Hmmm.  I don't see that they're really unattainable, though they may not be
possible to prove secure without unreasonable assumptions.  

Consider using SHA1 to hash each input string, but then outputting only 80
bits of the output.

Now, since we're only outputting half the output bits, we don't have to
worry about the fixed-suffix attack I pointed out earlier.  In fact, we
have a nice, 160-bit wide pipe all the way to the end, so any internal
collision problems basically just go away.  I think this meets all three
criteria.  (But it ought not to ever be output or used directly in a way an
attacker can see; it should be shielded by some longer-term cryptographic
secret so that it has computational security even if occasional input
strings are totally known to the attacker.)  
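In code, the construction is just this (a sketch in Python, using the standard hashlib; 10 bytes is the 80-bit truncation):

```python
import hashlib

def distill(sample: bytes) -> bytes:
    """Hash an input sample with SHA-1 and keep only 80 of the 160
    output bits, leaving the full 160-bit-wide pipe internal."""
    return hashlib.sha1(sample).digest()[:10]  # 10 bytes = 80 bits
```

Per the caveat above, the 80-bit result should feed some keyed generator rather than ever be shown directly to an attacker.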

Specifically:

a.  With far more than 2^{80} equally likely inputs, we expect to get very
nearly uniform output distribution from this scheme.  

b.  With exactly 2^{80} equally likely inputs, each input expects roughly
one collision partner among the rest, and so we lose less than a bit of
entropy.

c.  With substantially fewer than 2^{80} equally likely inputs, collisions
are rare enough that essentially no entropy is lost, but we may be
susceptible to a brute-force search of the possible input strings to
distinguish our output from a random 80-bit output.  
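The collision behavior in a.-c. is easy to check at toy scale.  In this sketch, SHA-1 truncated to 16 bits stands in for the 80-bit truncation, so 2^{16} plays the role of 2^{80} (the sizes are purely illustrative):

```python
import hashlib

def h16(x: int) -> bytes:
    # SHA-1 truncated to 16 bits: toy analogue of the 80-bit truncation
    return hashlib.sha1(x.to_bytes(8, 'big')).digest()[:2]

# Case b.: as many inputs as outputs -> roughly a (1 - 1/e) fraction
# of the 2^16 output values gets hit, costing under a bit of entropy.
hit = len({h16(x) for x in range(2**16)})
print(hit / 2**16)  # close to 1 - 1/e, about 0.63

# Case c.: far fewer inputs than outputs -> collisions are rare.
small = len({h16(x) for x in range(2**8)})
print(2**8 - small)  # number of collisions; expect about 0.5 here
```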

...
>If we have to give up one, the one I'd be most willing to give up is
>property a.  After all, we almost never have enough entropy, and we almost
>always take the output of the PRNG and just use it in some cryptosystem
>that is insecure against computationally unbounded attacks anyway.

I mostly agree with this.  It's hard to convince yourself you have enough
entropy against the most powerful possible attackers, anyway.  But if
someone claims to provide you with random noise from your system, it sure
seems like its strength ought not to rest on the cryptographic strength of
SHA1.  Otherwise, why not just use SHA1 in one of
the obvious ways to generate PRNG outputs once you've gotten a
hopefully-secure seed?  Or AES, for that matter?  
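One of those "obvious ways" might look like the following sketch: hash a secret seed together with a counter to produce each output block.  (Illustrative only; a real generator would also handle rekeying and backtracking resistance.)

```python
import hashlib

def prng_blocks(seed: bytes, n: int):
    """Yield n 20-byte output blocks, each SHA1(seed || counter)."""
    for i in range(n):
        yield hashlib.sha1(seed + i.to_bytes(8, 'big')).digest()
```

Security here rests entirely on SHA1 as a computational primitive and on the seed staying secret, which is exactly the property being contrasted with information-theoretic distillation.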

Your really big assumptions about SHA1 (or AES-CBC-MAC with an all-zero
key, or a 32-bit CRC, or whatever else may be used here) involve how well
they distill entropy.  This basically requires that you assume that the set
of input strings that occurs isn't pathologically bad with respect to
causing collisions in SHA1, or the 80 bits of SHA1 you actually output, or
whatever.  (It's not possible to choose any function that never has this
happen, but it's pretty easy to choose functions that almost never have it
happen, assuming randomly-selected input strings.  But, of course, the
input strings we'll get in a real-world system aren't randomly selected,
they're just selected in a way that's independent of the structure of the
entropy distillation function.  For example, for the overwhelming majority
of sets of 2^{128} random 1024-bit strings, simply XORing the successive
128-bit blocks of each string together will successfully distill the
entropy; but a flat XOR fold like that would be a bad idea with the kinds
of input we expect in real-world systems.)  
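For concreteness, the flat XOR fold mentioned above (fine for random inputs, poor for structured real-world samples):

```python
def xor_fold(data: bytes, width: int = 16) -> bytes:
    """XOR successive `width`-byte blocks of `data` together.
    width=16 gives the 128-bit fold from the example above."""
    assert len(data) % width == 0
    out = bytearray(width)
    for i in range(0, len(data), width):
        for j in range(width):
            out[j] ^= data[i + j]
    return bytes(out)
```

The failure mode is easy to see: two samples that differ only in patterns that cancel under the fold (say, the same flipped bits in two different blocks) map to identical outputs, so structured inputs can silently destroy entropy.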

>Think of a. like asking for your encryption scheme to be information
>theoretically secure.  Sure, if you can afford such an encryption scheme,
>that's great.  But in practice the one-time pad is too expensive, so we
>gladly settle for mere security against computationally bounded attacks.
>I think PRNGs are similar.

Right.  But it's important to specify which you're trying to accomplish.
Distilling entropy and outputting it in some form is supposed to be
information-theoretically secure, right?  You want it to fail gracefully
into computationally secure if your input entropy estimates are messed up,
but that's not the original goal.  There are really different assumptions
and requirements for the two different goals.

For example, in a system whose only goal is to be computationally-secure,
we can distill entropy by computing a 128-bit CRC over our sample string,
using that result as an AES key, and running AES in counter mode.  We can
output trillions of bits per string, even if we have only a set of 2^{80}
equally-likely input strings.  This is computationally secure given
assumptions about how the CRC interacts with our input string set, and
those assumptions are identical to the ones we must make about how SHA1
interacts with our input string set to justify using the hash function as
an entropy distillation function.  But it wouldn't make much sense to use this
to output distilled entropy (uniform randomness from a noisy source), e.g.,
for one-time pads, or even for generating 256-bit keys.  

...
--John Kelsey 


---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com


