[Cryptography] RNG design principles

Thu Nov 24 19:23:31 EST 2016

In the context of:

>>> I doubt many people on this list would say, "Golly, here are
>>> three different ciphers we found lying around.  We don't trust
>>> any of them, so let's just XOR them together. That should be
>>> hunky dory."

On 11/24/2016 09:01 AM, Theodore Ts'o wrote in part:

> but we *would* consider nesting them with different keys.

Not all of us, Kemosabe.  I would not do that, or recommend
it, or give it any kind of favorable consideration.  The maxim 
   "garbage in, garbage out"
has been around since the dawn of computing.  As a corollary:
   "garbage cubed in, garbage out"

Note the contrast:
 ++ Combining RNG seed sources makes sense /provided/ there is
  a trustworthy lower bound on the unpredictability of each,
  and source failures (if any) are known to be uncorrelated.
 -- It is not good practice to rely on a combination of sources
  without such guarantees.
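
To make the contrast concrete, here is a minimal Python sketch of
entropy accounting when combining sources.  Everything in it is an
assumption for illustration: the Source record, the combine() helper,
SHA-256 as the mixing function, and the 256-bit threshold are not
taken from any particular RNG.  The point it tries to show is that a
source with no trustworthy lower bound is credited with zero bits,
no matter how much data it supplies.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    data: bytes
    min_entropy_bits: int  # trustworthy lower bound; 0 means "no guarantee"


def combine(sources, required_bits=256):
    """Mix all sources, but credit only the trustworthy lower bounds.

    Adding the bounds assumes the sources are independent (uncorrelated
    failures); that is an assumption the caller must justify.
    """
    h = hashlib.sha256()
    credited = 0
    for s in sources:
        # Length-prefix each input so the concatenation is unambiguous.
        h.update(len(s.data).to_bytes(8, "big"))
        h.update(s.data)
        credited += s.min_entropy_bits
    if credited < required_bits:
        raise ValueError(
            f"only {credited} trustworthy bits; need {required_bits}")
    return h.digest()
```

Three sources with min_entropy_bits=0 make combine() raise, however
large their data; adding one source with a defensible bound lets it
succeed.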

Let's be clear:  It's not the idea of combination that is
objectionable.  It is the reliance (with or without combination)
on sources with no trustworthy good properties.

If we are talking about design principles, this seems like a
rather fundamental point.

On 11/23/2016 08:26 PM, Ron Garret wrote:

>> my baseline recommendation if you want to be exceptionally
>> paranoid is to make an audio recording of some white-ish noise
>> (e.g. record yourself saying “Shhh”) and then extract 1% or 0.1% of
>> the result. Of course, you have to do this in a secure environment.
>> An attacker is vastly more likely to compromise you by obtaining a
>> copy of this recording than because it didn’t contain enough
>> entropy.

That makes a good baseline, a good starting point.  It should work
in quite a few (but not all) situations.  As part of a discussion
of concepts and principles, it is a step in the right direction.

Constructive suggestions:

1) Add a validation step:  After recording the signal, play it back.
 If it doesn't sound like hiss, try again until it does.  Rationale:
 It is all too easy to screw up the mixer settings etc. the first
 time you make a recording.  Also there are lots of broken microphones
 out there.

2) Set the sample rate to "CD quality" (44,100 frames per second)
 or higher.

3) Record at least 1 second of data.

Analysis: If it sounds like hiss, it probably contains noise
components out to 4 kHz or higher ... possibly considerably
higher, but we don't need higher.  Also it probably contains
at least 4 bits per sample of raw unpredictability.  That
can be quantified in terms of adamance if you want to be
scientific about it.  If we derate that to 1 bit per sample at
an effective rate of 1 kHz, a 1-second recording still leaves us
with 1000 bits of unpredictability, which should be plenty for
seeding a well-designed CSPRNG.
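
The arithmetic above can be spelled out in a few lines of Python.
This is only a sketch: the helper name entropy_bits() is made up, and
the "naive" figure treats every sample as independent, which
overstates things for band-limited noise.  It is shown only to
indicate how conservative the derated figure is.

```python
SAMPLE_RATE_HZ = 44_100   # "CD quality", from suggestion 2
DURATION_S = 1            # minimum recording length, from suggestion 3


def entropy_bits(bits_per_sample, effective_rate_hz, duration_s=DURATION_S):
    """Estimated unpredictability = bits/sample * samples/second * seconds."""
    return bits_per_sample * effective_rate_hz * duration_s


# Naive figure: 4 bits for every one of the 44,100 samples (an overestimate).
naive_estimate = entropy_bits(4, SAMPLE_RATE_HZ)

# Derated figure from the text: 1 bit per sample at an effective 1 kHz.
derated_estimate = entropy_bits(1, 1_000)

print(naive_estimate, derated_estimate)
```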

Feed the recorded .wav file "as is" into your CSPRNG.  There
is no need to reformat the data.
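
As a rough illustration of feeding the file in "as is", the sketch
below hashes the raw bytes of the recording, header and all, down to
a seed.  The function name seed_from_recording(), the choice of
SHAKE-256, and the 32-byte seed length are assumptions made for the
example; a real CSPRNG will have its own seeding interface.

```python
import hashlib


def seed_from_recording(path, seed_len=32):
    """Hash the whole file, header included, into seed_len bytes."""
    h = hashlib.shake_256()
    with open(path, "rb") as f:
        # Read in chunks so a long recording need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest(seed_len)
```

The .wav header is predictable, but hashing predictable bytes along
with the noise does no harm, which is why no reformatting is needed.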

One limitation of this method is that it is not suitable for
unattended operation, or for "kiosk" situations where the user
cannot be trusted to cooperate.

In terms of bang for the buck, you can improve things coming
and going, as follows:  Rely on the physics of Johnson noise
(instead of the physics of fluid dynamic turbulence in your
mouth).  On a typical computer you get *more* unpredictability
with the audio ADC open-circuited than you do with it connected
to a microphone.  Seriously, folks:
  A) Nyquist's theory says that the mean-square Johnson noise voltage
   is proportional to the resistance (the real part of the impedance),
   and the microphone lowers the impedance.
  B) I've done the experiment.  Nyquist and Johnson were right.
    http://journals.aps.org/pr/abstract/10.1103/PhysRev.32.97
    http://journals.aps.org/pr/abstract/10.1103/PhysRev.32.110
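
For reference, the Nyquist formula for the open-circuit RMS noise
voltage of a resistance R over bandwidth B at temperature T is
sqrt(4 k T R B).  The sketch below just evaluates it.  The two
resistance values in the usage lines are made-up placeholders, not
measurements of any real sound card or microphone.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value


def johnson_vrms(resistance_ohm, bandwidth_hz, temperature_k=300.0):
    """Open-circuit RMS Johnson noise voltage: sqrt(4 k T R B)."""
    return math.sqrt(
        4 * BOLTZMANN_J_PER_K * temperature_k * resistance_ohm * bandwidth_hz)


# Hypothetical numbers, for comparison only.
open_input = johnson_vrms(resistance_ohm=10_000, bandwidth_hz=20_000)
with_mic = johnson_vrms(resistance_ohm=600, bandwidth_hz=20_000)
print(open_input / with_mic)
```

Since the voltage scales as the square root of R, lowering the
source resistance lowers the thermal noise available to the ADC.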

It takes more cleverness to validate the input in this case,
because the guaranteed noise (guaranteed by the laws of
thermodynamics) might be buried under other stuff.  The other
stuff is harmless from the RNG's point of view, but it means
you can't validate the signal just by listening.  I have some
fancy tools to help with the validation, but you can get by
with prosaic tools e.g. sox + od + gnumeric.  Look at the FFT.

  It must be emphasized that the work mentioned in the previous
  paragraph needs to be done only once for each make&model of
  machine ... not once per instance, and certainly not once
  every time you need some randomness.  So it gets amortized
  pretty quickly ... unlike the audible noise approach, which
  requires work every time you need some randomness.
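
One way to "look at the FFT" using only the Python standard library
is sketched below.  This is not the fancy tooling mentioned above; it
is a stand-in.  The helper names are made up, the DFT is the naive
O(n^2) kind (fine for a block of a few hundred samples), and using
the median bin power as a noise-floor estimate is my assumption,
chosen because narrow interference peaks barely move a median.

```python
import cmath
import math
import statistics
import struct
import wave


def read_mono_16bit(path, max_frames):
    """Read up to max_frames samples from a 16-bit mono .wav file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        raw = w.readframes(max_frames)
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def power_spectrum(samples):
    """Naive DFT power spectrum for bins 0 .. n/2, after removing the mean."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def noise_floor(spectrum):
    """Median bin power, skipping DC; narrow peaks barely move the median."""
    return statistics.median(spectrum[1:])
```

What counts as an acceptable floor has to be calibrated for each
make&model of machine; the sketch deliberately does not supply a
threshold.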

Getting rid of the microphone enlarges the number of customers
who can benefit, because although phones and laptops typically
have built-in microphones, desktop-class machines typically
have audio circuits but no built-in microphone.  In the latter
case, the open-circuit solution is cheaper and easier (as well
as better), since you don't need to procure a microphone.

For server-farm-class boards with no onboard audio at all,
you can buy USB audio dongles for a few dollars.  Anybody who
is not willing to do this is not serious about security.

For virtual machines, there are right ways and wrong ways
to proceed.

For IoT-class things that don't have any good way to obtain
randomness, again there are right ways and wrong ways to
proceed.  In particular, the claim that the same RNG software
has to run on all platforms, no matter what, is not reasonable.

Taking the next step down that road, we must face the fact that
there are some devices out there that are not secure and cannot
be made secure.  These devices should not be used as a pretext
for giving up on the other 99% of the hardware, or for derailing
the conversation.

Returning to the original claim that Ron Garret put forth
on 11/22/2016 01:03 PM:

> Everything that matters about randomness can be summarized in four 
> bullet points

I really don't think so.  If you want to see what an RNG looks
like when designed by cryptographers, take a look at:
  Elaine Barker and John Kelsey,
  “Recommendation for Random Number Generation Using Deterministic Random Bit Generators”
  http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A.pdf

It's complicated ... even when using a cryptographically strong
hash function as a building block.  Every bit of that complexity
is there for a reason.

> I am constantly surprised by how often discussions of randomness
> arise on this list, and how long they continue.

If at first you don't see the reasons, keep looking until you do.
Barker and Kelsey are not idiots.

And that's just a high-level architecture.  A real implementation
needs to deal with lots of grotty details, which we can discuss
if anybody's interested.  A list of three or four pithy axioms
tends to wildly underestimate the amount of effort that is needed.

=================

Very specific constructive suggestion:

If somebody would like to do something that would help in a
big way, figure out how to pass a seed via the kernel command
line at boot time ... and then keep it secret thereafter.

Keeping it secret is harder than it might sound, because
command line processing differs from platform to platform,
and because the kernel copies the command line all over the
place and munges it in various ways.

  In general, when something was built without security in
  mind, trying to go back and add security is a big hassle.

This is however worth the trouble, because it is a systematic
way of solving a number of problems having to do with the
need for random numbers early in the boot process.
 -- Grub can pass a seed on the command line.
 -- A sufficiently meticulous user can roll some dice and
  pass a seed on the command line.
 -- A VM host can pass a seed on each guest's command line.
 -- et cetera.
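
To illustrate the parse-then-scrub idea (and only that), here is a
userspace Python sketch.  The parameter name "rng_seed" is
hypothetical, as is the extract_seed() helper, and splitting on
whitespace ignores the quoting rules a real command-line parser has
to handle.  The real work would be in the bootloader and in kernel
C code.

```python
def extract_seed(cmdline, key="rng_seed"):
    """Return (seed_bytes_or_None, cmdline_with_the_seed_removed).

    Expects a token of the form key=<hex digits>; bytes.fromhex()
    raises ValueError if the value is not valid hex.
    """
    seed = None
    kept = []
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            seed = bytes.fromhex(value)
        else:
            kept.append(token)  # everything else passes through untouched
    return seed, " ".join(kept)


seed, scrubbed = extract_seed("root=/dev/sda1 rng_seed=deadbeef quiet")
print(scrubbed)
```

Scrubbing one string is the easy part.  It does nothing about the
other copies of the command line, which is exactly the hassle
described above.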

We can discuss this in more detail if anybody is interested.

Bottom line:  It is much better to have a set T of trustworthy
sources, rather than a steaming pile U of untrustworthy sources,
even if the cardinality #T is much less than #U.