RNG quality verification

John Kelsey kelsey.j at ix.netcom.com
Tue Jan 3 15:08:12 EST 2006


...
>From: Philipp Gühring <pg at futureware.at>
>Sent: Jan 3, 2006 12:13 PM
>To: "Steven M. Bellovin" <smb at cs.columbia.edu>, cryptography at metzdowd.com
>Subject: Re: RNG quality verification

...
>Hi,
>
>Ok, now I did the first test.
>I took OpenSSL, generated 10000 RSA keys, and took them apart.
>First I analyzed the raw keys:

It's helpful if you try to determine what you're testing here.  What
is the ideal situation which you hope to approximate by your RNG?  For
example, RSA moduli will always be odd, since they're the product of
two odd primes.  So you're going to get some statistical flaws when
you look at the low bit.  Similarly, the high few bits will be
somewhat skewed by our choices of p and q (you usually set the high
bits so that you get the desired size of modulus).  And you may be
choosing p and q according to some other criteria--I don't know what
OpenSSL does exactly.  
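To make that skew concrete, here's a toy sketch in Python (hand-rolled Miller-Rabin, 64-bit primes, nothing like real key sizes or OpenSSL's actual procedure): the low bit of every modulus is 1, and even the top bit is biased, because both primes have their high bit forced.

```python
import random

random.seed(0)

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits):
    # High bit forced so p*q lands near the desired modulus size;
    # low bit forced so the candidate is odd.
    while True:
        p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(p):
            return p

bits = 64  # toy size -- real RSA primes are far larger
moduli = [random_prime(bits) * random_prime(bits) for _ in range(200)]

low_bits = [n & 1 for n in moduli]  # always 1: product of odd primes
full = [n.bit_length() == 2 * bits for n in moduli]

print("low bit always set:", all(low_bits))
print("fraction with full 128-bit length:",
      sum(full) / len(full))  # about 0.61, not 0.5 (roughly 2 - 2*ln 2)
```

So a naive bit-level statistical test on the moduli "fails" even when the underlying RNG is perfect; the failure is in the test's model, not the generator.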

Further, you're looking at the result of using the PRNG outputs in a
really complicated way.  There might be all kinds of nonrandom
behavior in the outputs that would not be apparent at all in the
result of generating RSA keys.  Think about what's happening--you're
generating a random starting point of 512 bits with a certain form
(high bit set), and then sieving it, and then running a primality test
for a bunch of rounds.  Then you're doing the same thing a second
time.  Then you're multiplying the numbers together. 

To assess a cryptographic PRNG, you need to know two things:

a.  If it had a starting point or seed which was impossible to guess,
would you be able to find any problems with its outputs?

b.  Does it get a starting point or seed which is impossible to guess?

Assessing (a) is about cryptanalysis; statistics can help there, but
mostly, you're looking at the output from some cryptographic function
like SHA1 or AES or 3DES.  Assessing (b) is about data
analysis--you're going to look at the sources for seed material, and
try to determine what makes them ultimately unpredictable, and to
model them somehow.  You can't assess how much entropy some variable
has without some kind of probability model for it.  It's really hard
to get a satisfactory model for most of the OS/software sources,
unfortunately.  If you can't get a simple model for it, you can at
least attempt to determine some kind of bound on its entropy.  See
Peter Gutmann's wonderful paper from USENIX a few years ago on his
Cryptlib's PRNG for some flavor of what this kind of analysis looks
like.  
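As a small illustration of what you can compute once you do have a model: given an empirical distribution for a source, the min-entropy (the bound that matters for an attacker who always guesses the most likely value) is just -log2 of the largest probability. The "source" below is entirely made up, a clipped exponential standing in for some OS quantity you've managed to model.

```python
import math
import random
from collections import Counter

random.seed(1)

# Hypothetical source model: a "timing jitter" value clipped to 0..15.
# The distribution here is invented purely for illustration.
samples = [min(int(random.expovariate(0.5)), 15) for _ in range(100_000)]

counts = Counter(samples)
n = len(samples)
p_max = max(counts.values()) / n
min_entropy = -math.log2(p_max)  # worst case: attacker guesses the mode
shannon = -sum((c / n) * math.log2(c / n) for c in counts.values())

print(f"min-entropy ~ {min_entropy:.2f} bits/sample")
print(f"Shannon     ~ {shannon:.2f} bits/sample (always >= min-entropy)")
```

Note the gap between the two numbers: Shannon entropy flatters a skewed source, which is why the conservative bound is the one to use when deciding how many samples to collect per seed.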

The critical thing to understand here is where your statistical tools
are and are not useful.  There's no harm in running some big
statistical tests on the outputs from a cryptographic mechanism, and
you may in principle even find something this way, but you probably
won't.  To the extent that you have a probability model for the source
of seed material or entropy you're using, you can use statistical
tests to check the plausibility of your model.  (That is, if your
model says that successive samples of some variable are independent,
it's really a good idea to run some statistical tests to check
that--Chi Square tests of independence, autocorrelation, whatever
makes sense with the kind of data you have.)  
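A minimal sketch of that kind of model check, using a chi-square test of independence on successive bits (2x2 contingency table, one degree of freedom, 5% critical value about 3.84). The "sticky" source below is a made-up example of a sampled variable that flunks the independence assumption.

```python
import random

def chi2_independence(bits):
    """Chi-square statistic for independence of successive bits
    (2x2 table, 1 degree of freedom; 5% critical value ~ 3.84)."""
    obs = [[0, 0], [0, 0]]
    for a, b in zip(bits, bits[1:]):
        obs[a][b] += 1
    n = len(bits) - 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row[i] * col[j] / n
            chi2 += (obs[i][j] - expected) ** 2 / expected
    return chi2

random.seed(42)
good = [random.getrandbits(1) for _ in range(10_000)]

# A "sticky" source: each bit repeats the previous one 70% of the time.
sticky = [0]
for _ in range(9_999):
    sticky.append(sticky[-1] if random.random() < 0.7 else 1 - sticky[-1])

print("independent source:", chi2_independence(good))   # small
print("sticky source:", chi2_independence(sticky))      # enormous
```

The sticky source would pass a plain bias test (it's still 50/50 zeros and ones overall), which is exactly why you test the specific assumption your model makes, not just "randomness" in the abstract.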

...
>Perhaps I should have stated the quality demands for possible solutions:
>Since I am working on a practical solution, and not a theoretical solution, 
>the following demands apply:
>* A 99.999% solution is ok.

Well, the analysis of your entropy source is ultimately a data
analysis problem.  You go back and forth between your best
understanding of the underlying process that generates entropy, your
data, and what kind of models you can work with, refining your
understanding of your entropy source until your money or time run
out.  If some attacker has a much better model than yours, he may be
able to predict your seeds and thus break your PRNG.  (This is a good
reason to seed the PRNG with more estimated entropy than the minimum
you can get away with--there are other reasons to do this, too.)  
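A toy sketch of what "a much better model" buys the attacker; this is not how any real library seeds itself, and all the numbers are invented. The defender treats a millisecond timestamp as unpredictable; the attacker, knowing keygen happened within a particular second, enumerates only 1000 candidate seeds.

```python
import random

# Hypothetical seeding scheme for illustration only.
def keygen(seed_ms):
    """Derive a 128-bit 'key' from a millisecond-timestamp seed."""
    return random.Random(seed_ms).getrandbits(128)

true_seed = 1136315292537            # hypothetical ms timestamp
observed_key = keygen(true_seed)

# Attacker's model: "the key was made during this one-second window."
window_start = 1136315292000
recovered = None
for guess in range(window_start, window_start + 1000):
    if keygen(guess) == observed_key:
        recovered = guess
        break

print("attacker recovered the seed:", recovered == true_seed)
```

The defender's "128-bit key" had under 10 bits of entropy in the attacker's model, which is the only model that matters.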

...
>> However -- and it's a big "however" -- numbers that are "random enough"
>> for statistical purposes are not necessarily good enough for
>> cryptographic purposes.  As several people have pointed out already,
>> there are processes involving cryptographic algorithms that produce
>> very "random" sequences, but are in fact deterministic to someone who
>> knows a secret.  In other words, if you don't control the generator,
>> it's not possible to distinguish these two cases.  
>
>Has anyone tested yet, how much samples are needed to detect those PRNGs?

Suppose I'm using AES128 with a key of 0 in counter mode, starting
with a counter of 0.  This will pass all the statistical tests you run
against it.  But with a single 128-bit output, I am nearly certain
of identifying this source.     
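Here's a stdlib-only sketch of that identification, with SHA-256 in counter mode swapped in for AES (the point is the same; the cipher is a stand-in). With the key and counter both fixed at zero, the stream sails through statistical tests, yet anyone can recompute it, so one block gives the game away.

```python
import hashlib

# Counter-mode "RNG" built from SHA-256 over (key || counter) --
# a stand-in for AES-CTR, used only to illustrate the argument.
def ctr_block(key: bytes, counter: int) -> bytes:
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()

observed = ctr_block(bytes(16), 0)   # one block seen from the "RNG"

# Anyone suspecting the all-zero key just recomputes and compares.
print("matches the all-zero-key stream:",
      observed == ctr_block(bytes(16), 0))
print("matches some other key's stream:",
      observed == ctr_block(b"\x01" + bytes(15), 0))
```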

If you don't know the key I'm using for AES, then distinguishing
counter mode outputs is as hard as breaking AES, for some very
precisely defined meaning of the word "breaking."  But it has no more
than 128 bits of entropy, because if you ever guessed the right 128
bit AES key, you could predict all future outputs from my PRNG
forever.  That means it would be silly to use, say, AES128 to generate
256-bit AES keys, since the attacker would only need to guess the
128-bit AES key to learn all the 256-bit AES keys you were generating.
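The same argument, scaled down far enough to run: a counter-mode generator whose entire internal state is a 16-bit key, handing out "64-bit keys." (SHA-256 again stands in for the cipher.) Brute-forcing the 2^16 candidate states from a single observed key recovers every key the generator will ever emit; the outputs never had more than 16 bits of entropy, whatever their length.

```python
import hashlib

# Toy generator for illustration: 16-bit internal key, 64-bit outputs.
def gen_key(internal_key: int, counter: int) -> int:
    h = hashlib.sha256(internal_key.to_bytes(2, "big")
                       + counter.to_bytes(8, "big"))
    return int.from_bytes(h.digest()[:8], "big")

secret = 0xBEEF                                  # the generator's state
issued = [gen_key(secret, i) for i in range(4)]  # four "64-bit keys"

# Attacker sees only issued[0] and tries all 2^16 internal keys.
recovered = next(k for k in range(1 << 16) if gen_key(k, 0) == issued[0])

print("predicts every issued key:",
      [gen_key(recovered, i) for i in range(4)] == issued)
```

Replace 16 with 128 and 64 with 256 and you have the AES128-generating-256-bit-keys situation above: the longer keys cost the attacker no more than the short internal one.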

For what it's worth, we're working on a standard for cryptographic
random number generation in X9.  There's a draft document (SP800-90)
up on the NIST website

http://csrc.nist.gov/publications/drafts.html#sp800-90

discussing some hopefully good crypto PRNGs, and some guidelines for
their use.  This doesn't discuss much about analyzing entropy sources
(we're still hashing that out).  If you want to understand the
security of crypto PRNG algorithms, you can look at some papers I
did on this with Bruce Schneier, Niels Ferguson, David Wagner, and
Chris Hall:

http://www.schneier.com/paper-yarrow.html

www.cs.berkeley.edu/~daw/papers/prngs-fse98.ps 

Also, Peter's PRNG paper is at his website:

http://www.cs.auckland.ac.nz/~pgut001/

And Lisa Yin did a paper with Desai and Hevia for Eurocrypt 2002
trying to do reduction proofs for various PRNGs--basically showing
that if certain properties of the hash function or block cipher used
hold, then the PRNG is secure.  I don't know if there's an online
version available.  

If you want to understand how to do the data analysis for a hardware
entropy source, I recommend looking at the analysis of the Intel and
VIA C3 hardware RNGs, both on Cryptography Research's site.

http://www.cryptography.com/research/evaluations.html

The big thing to understand here is how much nicer life is when you
design the source of entropy from the beginning to follow some kind of
reasonable probability model, rather than looking for some way to
adapt a mathematically useful model to some complicated process like
OS loading or something. 

There's a reasonably nice suite of statistical tests available from
NIST, though I've heard some complaints about the portability of the
code.  More broadly, there are a gazillion statistical packages out
there.  But if you don't understand your model, you'll just shoot
yourself in the foot by throwing sophisticated statistical tools at
the problem.  

>Best regards,
>Philipp Gühring

--John Kelsey, NIST

