[Cryptography] Hard Truths about the Hard Business of finding Hard Random Numbers

Wed Jan 29 17:14:05 EST 2014

On 29/01/14 23:36 PM, Arnold Reinhold wrote:

>> Somehow this discussion tends to run into circles.
> 
> An astute observation. I submit this happens because there is no standard or guideline nor a process to get one that has any acceptance.  I suggested a Wiki as a start. Any other ideas?

Here's my effort (easier to read on the site):

http://financialcryptography.com/mt/archives/001471.html

Hard Truths about the Hard Business of finding Hard Random Numbers

As many have noticed, there is now a permathread (Paul's term) on how to
do random numbers. It's always been warm. Now the arguments are on solid
simmer, raging on half a dozen cryptogroups, all thanks to the NSA and
their infamous breach of NIST, American industry, mom's apple pie and
the privacy of all things from Sunday school to Angry Birds.

Why is the topic of random numbers so bubbling, effervescent,
unsatisfying? In short, because generators of same (RNGs), are *hard*.
They are in practical experience trickier than most of the other modules
we deal with: ciphers, HMACs, public key, protocols, etc.

Yet, we have come a long way. We now have a working theory. When Ada put
together her RNG this last summer, it wasn't that hard. Out of our
experience, herein is a collection of things we figured out; with the
normal caveat that, even as RNs require stirring, the recipe for
'knowing' is also evolving.

1.    *Use what your platform provides*. Random numbers are hard, which
is the first thing you have to remember, and always come back to. Random
numbers are so hard, that you have to care a lot before you get
involved. A hell of a lot. Which leads us to the following rules of
thumb for RNG production.
        a. *Use what your platform provides*.
        b. Unless you really really care a lot, in which case, you have
to write your own RNG.
        c. There isn't a lot of middle ground.
        d. So much so that for almost all purposes, and almost all
users, Rule #1 is this: *Use what your platform provide*s.
        e. When deciding to breach Rule #1, you need a compelling
argument that your RNG delivers better results than the platform's.
Without that compelling argument, your results are likely to be more
random than the platform's system in every sense except the quality of
the numbers.

2.    Software is our domain.
        a. Software is unreliable. It can be made reliable under bench
conditions, but out in the field, any software of more than 1 component
(always) has opportunities for failure. In practice, we're usually
talking dozens or hundreds, so failure of another component is a solid
possibility; a real threat.
        b. What about hardware RNGs? Eventually they have to go through
some software, to be of any use. Although there are some narrow
environments where there might be a pure hardware delivery, this is so
exotic, and so alien to the reader here, that there is no point in
considering it. Hardware serves software. Get used to it.
        c. As a practical reliability approach, we typically model every
component as failing, and try and organise our design to carry on.

3.    Security is also our domain, which is to say we have real live
attackers.
        a. Many of the sciences rest on a statistical model, which they
can do in absence of any attackers. According to Bernoulli's law of big
numbers, models of data will even out over time and quantity. In
essence, we then can use statistics to derive strong predictions. If
random numbers followed the law of big numbers, then measuring 1000 of
them would tell us with near certainty that the machine was good for
another 1000.
        b. In security, we live in a byzantine world, which means we
have real live attackers who will turn our assumptions upside down, out
of spite. When an attacker is trying to aggressively futz with your
business, he will also futz with any assumptions and with any tests or
protections you have that are based on those assumptions. Once attackers
start getting their claws and bits in there, the assumption behind
Bernoulli's law falls apart. In essence this rules out lazy reliance on
statistics.

4.    No Test. There is no objective test of random numbers, because it
is impossible to test for unpredictability. Which in practical terms
means that you cannot easily write a test for it, nor can any test you
write do the job you want it to do. This is the key unfortunate truth
that separates RNs out from ciphers, etc (which latter are amenable to
test vectors, and with vectors in hand, they become tractable).

5.    Entropy. Everyone talks about entropy so we must too, else your
future RNG will exhibit the wrong sort of unpredictability. Sadly,
entropy is not precisely the answer, enough such that talking about is
likely missing the point. If we could collect it reliably, RNs would be
easy. We can't so it isn't.
        a. Entropy is manifest physical energy, causing events which
cannot be predicted using any known physical processes, by the laws of
science. Here, we're typically talking about quantum energy, such as the
unknown state of electrons, which can collapse either way into some
measurable state, but it can only be known by measurement, and not
predicted earlier. It's worth noting that quantum energy abounds inside
chips and computers, but chips are designed to reduce the noise, not
increase it, so turning chip entropy into RNs is not as easy as talking
about it.
        b. There are objective statements we can make about entropy. The
objective way to approach the collection of entropy is to carefully
analyse the properties of the system and apply science to estimate the
amount of (e.g.) quantum uncertainty one can derive from it. This is
possible and instructive, and for a nice (deep) example of this, see
John Denker's Turbid.
        c. At the level of implementation, objective statements about
entropy fail for 2 reasons. Let's look at those, as understanding these
limitations on objectivity is key to understanding why entropy does not
serve us so willingly.
            i. Entropy can be objectively analysed as long as we do not
have an attacker. An attacker can deliver a faulty device, can change
the device, and can change the way the software deals with the device at
the device driver level. And much more...
            ii. This approach is complete if we have control of our
environment. Of course, it is very easy to say Buy the XYZ RNG and plug
it in. But many environments do not have that capability, often enough
we don't know our environment, and the environment can break or be
changed. Examples: rack servers lacking sound cards; phones; VMs;
routers/firewalls; early startup on embedded hardware.
        d. In conclusion, entropy is too high a target to reach. We can
reach it briefly, in controlled environments, but not enough to make it
work for us. Not enough, given our limitations.

6.    CSRNs. The practical standard to reach therefore is what we call
Cryptographically Secure Random Numbers.
        Cryptographically secure random numbers (or CSRNs) are numbers
that are not predictable /to an attacker/. In contrast to entropy, we
might be able to predict our CSRNs, but our enemies cannot. This is a
strictly broader and easier definition than entropy, which is needed
because collecting entropy is too hard, as above.
        Note our one big assumption here: that we can determine who is
our attacker and keep him out, and determine who is friendly and let
them in. This is a big flaw! But it happens to be a very basic and
ever-present one in security, so while it exists, it is one we can
readily work with.

7.    Design. Many experiments and research seem to have settled on the
following design pattern, which we call a Trident Design Pattern:

       Entropy collector  ----\
                               \ _____          _________
                                /     \        /         \
       Entropy collector  ---->( mixer )----->( expansion )-----> RNs
                                \_____/        \_________/
                               /
       Entropy collector  ----/

    In short, many collectors of entropy feed their small contributions
in to a Mixer, which uses the melded result to seed an Expander. The
high level caller (application) uses this Expander to request her random
numbers.

8.    Collectors. After all the above bad news, what is left in the
software toolkit is: redundancy .
        a. A redundant approach tells us to draw our RNs from different
places. The component that collects RNs from one place is called a
Collector. Therefore we want many Collectors.
        b. Each of the many places should be uncorrelated with each
other. If one of these were to fail, it would be unlikely that others
also would fail, as they are uncorrelated. Typical studies of
fault-tolerant systems often suggest the number 3 as the target.
        c. Some common collector ideas are:
            * the platform's own RNG, as a Collector into your RNG
            * any CPU RNG such as Intel's RDRAND,
            * measuring the difference between two uncorrelated clocks,
            timings and other measurands from events (e.g., mouse clicks
and locations),
            * available sensors (movement on phones),
            * differences seen in incoming new business packets,
            * a roughly protected external source such as a business feed,
        d. By the analysis that got us past Rule #1, there are no great
Collectors by definition, as otherwise we'd already be using them, and
this problem would go away.
        e. An attacker is assumed to be able to take a poke at one or
two of these sources, but not all. If the attacker can futz with all our
sources, this implies that he has more or less unlimited control over
our entire machine. In which case, it's his machine, and not ours. We
have bigger problems than RNs.
        f. We tend to want more numbers than fault-tolerant reliability
suggests because we want to make it harder for the attacker. E.g., 6
would be a good target.
        g. Remember, we want maximum uncorrelation. Adding correlated
collectors doesn't improve the numbers.
        h. Because we have redundancy, on a large scale, we are not that
fussed about the quality of each Collector. Better to add another
collector than improve the quality of one of them by 10%. This is an
important benefit of redundancy, we don't have to be paranoid about the
quality of this code.

9.    Mixer. Because we want the best and simplest result delivered to
the caller, we have to take the output of all those above Collectors,
mix them together, and deliver downstream.
        a. The Mixer is the trickiest part of it all. Here, you make or
break. Here, you need to be paranoid. Careful. Seek more review.
        b. The Mixer has to provide some seed numbers of say 128-512
bits to the Expander (see below for rationale). It has to provide this
on demand, quickly, without waiting around.
        c. There appear to be two favourite designs here: Push or Pull.
In Push the collectors send their data directly into Mixer, forcing it
to mix it in as it's pushed in. In contrast, a Pull design will have the
Mixer asking the Collectors to provide what they have right now. This in
short suggests that in a Push design the Mixer has to have a cache,
while in Pull mode, the Collectors might be well served in having caches
within themselves.
        d. Push or Mixer-Cache designs are probably more popular. See
Yarrow and Fortuna as perhaps the best documented efforts.
        e. We wrote our recent Trident effort (AdazPRING) using Pull.
The benefits include: simplified API as it is direct pull all the way
through; no cache or thread in mixer; and as the Collectors better
understand their own flow, so they better understand the need for
caching and threading.

10.    Expander. Out of the Mixer comes some nice RNs, but not a lot.
That's because good collectors are typically not firehoses but rather
dribbles, and the Mixer can't improve on that, as, according to the law
of thermodynamics, it is impossible to create entropy.
        a. The caller often wants a lot of RNs and doesn't want to wait
around.
        b. To solve the mismatch between the Mixer output and the
caller's needs, we create an expansion function or Expander. This
function is pretty simple: (a) it takes a small seed and (b) turns that
into a hugely long stream. It could be called the Firehose...
        c. Recalling our truth above of (c) CSRNs being the goal, not
entropy, we now have a really easy solution to this problem: Use a
cryptographic stream cipher. This black box takes a small seed
(a-check!) and provides a near-infinite series of bytes (b-check!) that
are cryptographically secure (c-check!). We don't care about the
plaintext, but by the security claims behind the cipher, the stream is
cryptographically unpredictable without access to the seed.
        d. Super easy: Any decent, modern, highly secure stream cipher
is probably good for this application. Our current favourite is ChaCha20
but any of the NESSIE set would be fine.

        e. In summary, the Expander is simply this: when the application
asks for a PRNG, we ask the Mixer for a seed, initialise a stream cipher
with the seed, and return it back to the user. The caller sucks on the
output of the stream cipher until she's had her fill!

11.    Subtleties.
        a. When a system first starts up there is often a shortage of
easy entropy to collect. This can lead to catastrophic results if your
app decides that it needs to generate high-value keys as soon as it
starts up. This is a real problem -- scans of keys on the net have found
significant numbers that are the same, which is generally traced to the
restart problem. To solve this, either change the app (hard) ... or
store some entropy for next time. How you do this is beyond scope.
        b. Then, assuming the above, the problem is that your attacker
can do a halt, read off your RNG's state in some fashion, and then use
it for nefarious purposes. This is especially a problem with VMs. We
therefore set the goal that the current state of the RNG cannot be
rolled forward nor backwards to predict prior or future uses. To deal
with this, a good RNG will typically:
            * stir fresh entropy into its cache(s) even if not required
by the callers. This can be done (e.g.) by feeding ones own Expander's
output in, or by setting a timer to poll the Collectors.
            * Use hash whiteners between elements. Typically, a SHA
digest or similar will be used to protect the state of a caching element
as it passes its input to the next stage.
        c. As a technical design argument, the only objective way that
you can show that your design is at least as good as or better than the
platform-provided RNG is the following:
            i. Very careful review and testing of the software and
design, and especially the Mixer; and
            ii. including the platform's RNG as a Collector.

12.    Business Justifications. As you can see, doing RNGs is hard! Rule
#1 -- use what the platform provides. You shouldn't be doing this. About
the only rationales for doing your own RNG are the following.
        a. Your application has something to do with money or journalism
or anti-government protest or is a CVP. By money, we mean Bitcoin or
other forms of hard digital cash, not online banking. The most common
CVP or centralised vulnerability party (aka TTP or trusted third party)
is the Certification Authority.
        b. Your operating platform is likely to be attacked by a
persistent and aggressive attacker. This might be true if the platform
is one of the following: any big American or government controlled
software, Microsoft Windows, Java (code, not applets), any mobile phone
OS, COTS routers/firewalls, virtual machines (VMs).
        c. You write your own application software, your own libraries
*and* your own crypto!
        d. You can show objectively that you can do a better job.
    Note that it is still a hard test, you want ALL of those to be true
before you start mucking around in this chaotic area.

That all said, good luck! Comments to the normal place, please, and Ed's
note: this will improve in time.