Extracting unifrom randomness from noisy source

Mon Aug 5 12:58:35 EDT 2002

At 04:11 PM 8/4/02 +0300, Amir Herzberg wrote:
>Hi all, I've looked into this a bit further, and here is summary. 

>1. In order to extract randomness from an arbitrary noisy source, we
>need to use some short, `seed` random string. We assume that the noise
>is independent of this `seed`. (If we don't use a seed, then every
>randomness extracting (deterministic) function has some distribution of
>input (noise) for which the output is not uniformly distributed - e.g.
>any distribution of inputs which map to one output...)

The seed idea makes the proof easier, but doesn't really add much, does it?  

We have a large set of input strings, whose precise input distribution is
unknown, but whose approximate distribution we know well enough to
reasonably claim some amount of entropy.  Each time we generate an entropy
output, we're sampling one string from our distribution, and applying a
function to it to reduce it to some short, hopefully-random-looking
bitstring.  We're hoping that the function output is going to be so close
to uniformly distributed that a computationally unbounded attacker given a
sequence of these outputs as long as we will get from this RNG during its
whole life can't reliably distinguish between this RNG's outputs and a real
uniformly-distributed set of outputs.  

In choosing this function, we're hampered by our ignorance of that input
distribution.  If we knew it precisely, we could design a function that
would be guaranteed to be uniform and independent and all that.  Instead,
what we can reasonably do is select a function so that, for the
overwhelming majority of input distributions consistent with our knowledge
of this input distribution, we will get an approximately uniform output
distribution from our function.  When we choose this function, and whether
we choose it via this short seed or by just picking a convenient function
off the shelf is irrelevant here.  If an opponent were choosing our input
distribution, then it would be important to make him commit to his choice
before we chose our function, but the opponent here is nature or bad luck
or something.  Given our knowledge, choosing a CRC with a random
irreducible polynomial as our function is exactly as reasonable as choosing
one of the standard, off-the-shelf CRC polynomials, for example.  

In fact, no sane engineer will choose the function in a way that's truly
independent of the input distribution; he will test the function with the
input distribution as much as possible, and discard it and choose another
if he finds some problem with it.  Surely this doesn't *decrease* security....

>5. While the extractors described by Nisan (and others) are indeed very
>efficient, I'm not aware of any available implementations. Implementors
>may consider using therefore their favorite block cipher, e.g. AES,
>using a random key. Notice that this random key should be selected
>uniformly but could be part of the software, common to all deployments
>and non-secret; security is only based on the independence of the
>sampled noise in this key. Namely, 
>
>pseudo-random = AES_k (noise) 
>
>Security of this construction follows from the assumption that the block
>cipher (e.g. AES) is a Pseudo-random function; notice that this is a
>standard assumption for block ciphers (and therefore block ciphers are
>cryptoanalysed to meet this assumption). 

There are two points worth considering, here:

a.  If noise is exactly 128 bits wide, this has no effect on the amount of
entropy in the result, of course.  We're just relabeling the symbols,
without altering their distribution.  

b.  When noise is much longer, we have to use the encryption mechanism
intelligently, or we'll shoot ourselves in the foot.  For example,
counter-mode has a proof of security that's as nice as the one for
CBC-mode, but we'd better not try to use counter mode to distill entropy
from a long input string.  What we can reasonably do is use CBC-MAC or
XOR-MAC or some such thing, so long as we're not expecting more than 128
bits of entropy from the result.  (In fact, for CBC-MAC, we're losing about
one bit of entropy to internal collisions for many strings that are
plausibly going to be very common in our input distribution.  XOR-MAC looks
good to me, but this is all based on back-of-the-envelope analysis, not
anything very serious yet.)  

I claim that the above constructions will be exactly as secure if we choose
the all-zero AES key as if we choose an AES key today uniformly and at
random.  There is no reason to expect any mutual information between my
choice of the all-zero AES key and our specific input distribution.  

>6. Of course under the random oracle model, h(x,k) is also a
>pseudo-random function... But why?

Can you see a reason why just using a CRC on your input strings wouldn't be
okay?  Certainly, an attacker could choose an input distribution that would
ruin this, but there's not an attacker choosing the input distribution in
practice, and for acceptably long strings, the CRC with any good polynomial
will have a uniform distribution.  The real question would then reduce to
whether or not the input strings were generated by some process that was
going to always have the same CRC value for this polynomial, and it's very
hard to see how that would happen in practice.  

To put it into a security-proof framework, imaging choosing the CRC
polynomial at random from the set of irreducible polynomials of the
required size, after the input distribution had been committed to.  

>Regards, Amir Herzberg

Comments?

--John Kelsey, kelsey.j at ix.netcom.com

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com