[Cryptography] Is Ron right on randomness

Fri Nov 25 23:15:30 EST 2016

>>> Everything that matters about randomness can be summarized in four
>>> bullet points:
>>>
>>> 1. You need two things: an entropy source, and a whitener. No entropy
>>> source is perfect, so you need a whitener no matter what. You don't
>>> have to do anything fancy in your whitener. Any cryptographically
>>> secure hash function (like SHA512) will do.
>>>
>>> 2. Since you need a whitener no matter what, it doesn't really matter
>>> how good your entropy source is, except insofar as it might take a
>>> long time to collect enough entropy from a very poor source. All that
>>> matters is that you have an accurate lower bound for how much entropy
>>> your source actually provides, and this is the case no matter how good
>>> (or bad) your source actually is. As long as you feed >N bits of
>>> entropy into your whitener, you can safely extract N bits of true
>>> randomness out of it.
>>>
>>> 3. You don't need more than a few hundred bits of randomness. 128 bits
>>> is enough, 256 is a comfortable margin, 512 is serious overkill. Seed
>>> a cryptographically secure PRNG with a few hundred bits of entropy and
>>> you can safely extract gigabytes of key material out of it.
>>
>> (I omitted #4)
>>
>> Is the above accurate?  Is it a reasonable design point to use for
>> OpenSSL's next CSPRNG?
>>

No, #2 above is not accurate. It does matter how good your entropy
source is. The leftover hash lemma gives you the expression for the amount
of entropy you can extract from an entropy source, but it doesn't tell you
how, and for real constructions the answer is worse. Subsequent papers
have given bounds for certain specific extractors. This can be summarized
as the 0.5 limit: if your input data has less than 0.5 bits of entropy per
bit of data, your extractor is swimming upstream slower than the stream is
moving downstream. This is true for seeded single-input extractors. It is
more complicated with multiple-input extractors, but in general,
multiple-input extractors are the way to get above the 0.5 limit,
whereupon you can feed a single-input cryptographic extractor.
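
To put a number on the leftover hash lemma bound, here is a minimal
sketch. It assumes the usual statement of the lemma for a universal hash
family: a source with min-entropy k yields at most k - 2*log2(1/epsilon)
output bits that are epsilon-close to uniform. The function name and
parameters are mine, for illustration only, and this is the idealized
bound, not the (worse) figure for any specific real extractor.

```python
import math


def lhl_output_bits(min_entropy_bits: float, epsilon: float) -> int:
    """Idealized leftover hash lemma bound on extractable bits.

    min_entropy_bits: total min-entropy k of the extractor input.
    epsilon: allowed statistical distance from uniform, 0 < epsilon < 1.
    Returns floor(k - 2*log2(1/epsilon)), clamped at zero.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be strictly between 0 and 1")
    bits = math.floor(min_entropy_bits - 2.0 * math.log2(1.0 / epsilon))
    return max(bits, 0)


if __name__ == "__main__":
    # 512 bits of input min-entropy, target distance 2^-64.
    print(lhl_output_bits(512, 2.0 ** -64))
```

Note the fixed cost of 2*log2(1/epsilon) bits: with a tight epsilon, a
small input can yield nothing at all under this bound.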

The summary also doesn't address the issue of malicious sources as inputs
to multiple-input extractors. There may or may not be extractors that can
defend against this, but there certainly are extractors for which a
malicious input can reduce the entropy at the output. The simplest example
is the XOR gate: one input full entropy, the other input malicious. The
malicious input just needs to be correlated with the other input in order
to reduce the entropy at the output. The Linux kernel entropy pool mixing
is a more complicated example that has been well documented in this
respect.
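
The XOR case can be shown in a few lines. This is a sketch of the extreme
case only, where the malicious input is an exact copy of the good input;
the helper name xor_bytes is made up for the example. Partial correlation
reduces the output entropy less dramatically, but by the same mechanism.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (a two-input 'extractor')."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    good = secrets.token_bytes(32)         # stands in for a full-entropy input
    independent = secrets.token_bytes(32)  # honest, independent second input
    malicious = good                       # attacker's input tracks the good one

    # Independent inputs: the output is still unpredictable.
    print(xor_bytes(good, independent).hex())
    # Perfectly correlated malicious input: the output is all zero bytes.
    print(xor_bytes(good, malicious).hex())
```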

The other thing that is probably relevant, but not mentioned above, is that
frequent reseeding, or other schemes for injecting fresh entropy into the
system, is an effective defense against a class of attacks that seek to
learn the internal state of the CSPRNG. It's hard to do this well in
software because you don't usually get to use every spare CPU cycle to do
more extraction and reseeding. It would upset people coping with their
unnecessarily slow computer. So how in software do you know you are
putting fresh state in faster than an attack is learning it?  How do you
know how much effort to apply? A hardware extractor gets to sit there all
day extracting seed data from an entropy source that similarly can sit
there all day turning noise into bits. This is one of the basic design
principles of the RNGs we design - reseed times should be measured in
microseconds, not seconds.
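
To make the reseeding idea concrete, here is a toy sketch of a hash-based
generator whose state can be refreshed with new seed material. The class
name, domain-separation labels and structure are all invented for the
illustration; it is not a vetted DRBG and not the design of any RNG
discussed here. The point is only that once fresh, unknown data is mixed
in, knowledge of the old state no longer predicts the output.

```python
import hashlib
import os


class ToyHashGenerator:
    """Illustration only: SHA-512 based generator with explicit reseed."""

    def __init__(self, seed: bytes) -> None:
        self._state = hashlib.sha512(b"init" + seed).digest()

    def reseed(self, fresh: bytes) -> None:
        # Mix fresh seed material into the existing state.
        self._state = hashlib.sha512(b"reseed" + self._state + fresh).digest()

    def generate(self, n: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < n:
            block = b"out" + self._state + counter.to_bytes(8, "big")
            out += hashlib.sha512(block).digest()
            counter += 1
        # Step the state forward so the next call gives different output.
        self._state = hashlib.sha512(b"next" + self._state).digest()
        return out[:n]


if __name__ == "__main__":
    victim = ToyHashGenerator(b"example seed")
    attacker_copy = ToyHashGenerator(b"example seed")  # attacker learned the state
    print(victim.generate(16) == attacker_copy.generate(16))  # states in sync
    victim.reseed(os.urandom(32))  # fresh data the attacker does not see
    print(victim.generate(16) == attacker_copy.generate(16))
```

How often reseed() can be called, and how much real entropy each call
carries, is exactly the part software has trouble answering.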

A software CSPRNG is simple enough. Seeding it is the hard part. How do
you extract from a source when you don't know the nature of the source? A
source is a hardware component. The extractor design needs to be strongly
coupled to the nature of the entropy source. If it's a weak source in
terms of min-entropy (Hinf(X) < 0.5n) then maybe some other more
descriptive property of the source lets you design a pre-extractor to
get above 0.5. So designing an entropy source pretty much obligates you to
design the extractor to go along with it. A general purpose software
crypto library like OpenSSL is probably the wrong place to do extraction,
unless you are writing a comprehensive extraction algorithm that is aware
of the nature of many sources out there and is able to identify them
unambiguously and then do the right thing with them from an extractor
algorithm point of view.
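
For the Hinf(X) < 0.5n test above, a rough sketch of the arithmetic. It
uses the naive most-common-value estimate, Hinf = -log2(max p), taken
straight from sample frequencies. The function names are made up, and a
real source assessment needs a model of the source and far more careful
estimators than this. In particular, this estimate assumes independent
samples and so can badly overstate the entropy of a correlated source.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def min_entropy_per_symbol(samples: Sequence[Hashable]) -> float:
    """Naive estimate: -log2 of the most common symbol's frequency."""
    if not samples:
        raise ValueError("need at least one sample")
    p_max = max(Counter(samples).values()) / len(samples)
    return abs(-math.log2(p_max))  # abs() avoids returning -0.0


def below_half_limit(samples: Sequence[Hashable], bits_per_symbol: int) -> bool:
    """True if the estimate falls under 0.5 bits of entropy per data bit."""
    return min_entropy_per_symbol(samples) < 0.5 * bits_per_symbol


if __name__ == "__main__":
    # Hypothetical 8-bit samples that only ever take four values.
    samples = [0, 1, 2, 3] * 256
    print(min_entropy_per_symbol(samples))  # at most 2 bits per 8-bit symbol
    print(below_half_limit(samples, 8))
```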

DJ