[Cryptography] real random numbers

John Denker jsd at av8n.com
Sat Sep 14 23:11:07 EDT 2013


Previously I said we need to speak more carefully about these
things.  Let me start by taking my own advice:

Alas on 09/14/2013 12:29 PM, I wrote:
> a) In the linux "random" device, /any/ user can mix stuff into the
> driver's pool.  This is a non-privileged operation.  The idea is
> that it can't hurt and it might help.  So far so good.
>
> b) Contributions of the type just mentioned do *not* increase the
> driver's estimate of the entropy in the pool.  If you want to
> increase the entropy-estimate, you need to issue a privileged ioctl.

> ... step (a) cannot get anybody into trouble.  Step (b) gets you into
> trouble if you claim credit for more entropy than was actually
> contributed.

Actually it's one step more complicated than that.  Step (a)
causes problems if you /underestimate/ the entropy content of
what you contributed -- and mixing without ever taking credit
amounts to estimating zero.  The problem is that the end-user
application will try to read from the RNG and will stall for
lack of available entropy.

Step (b) has the opposite problem: You get into trouble if 
you /overestimate/ the entropy of what you have contributed.
This causes insidious security problems, because your allegedly 
random numbers are not as random as you think.
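To make the two operations concrete, here is a minimal C sketch
against the Linux interface; error handling is abbreviated, and the
32-byte sample and the 128-bit claim are placeholders.  Note that
the entropy_count you pass to RNDADDENTROPY is precisely the claim
you must be able to defend:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/random.h>

    int main(void)
    {
        unsigned char sample[32];            /* stand-in for data from your source */
        memset(sample, 0x55, sizeof sample); /* placeholder, obviously not random  */

        int fd = open("/dev/random", O_WRONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Step (a): any user may write; the data is mixed into the
           pool but the driver's entropy estimate is NOT increased. */
        if (write(fd, sample, sizeof sample) < 0) perror("write");

        /* Step (b): mixes the data AND credits the stated entropy.
           Privileged; fails with EPERM for an ordinary user.       */
        struct rand_pool_info *info = malloc(sizeof *info + sizeof sample);
        if (!info) return 1;
        info->entropy_count = 128;           /* claimed entropy, in bits */
        info->buf_size = sizeof sample;      /* payload size, in bytes   */
        memcpy(info->buf, sample, sizeof sample);
        if (ioctl(fd, RNDADDENTROPY, info) != 0) perror("RNDADDENTROPY");

        free(info);
        close(fd);
        return 0;
    }

Mix without ever crediting and the readers stall; overstate the
entropy_count and you get the insidious failure.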


On 09/14/2013 03:12 PM, John Kelsey wrote:

> Your first two categories are talking about the distribution of 
> entropy--we assume some unpredictability exists, and we want to 
> quantify it in terms of bits of entropy per bit of output.  That's a 
> useful distinction to make, and as you said, if you can get even a 
> little entropy per bit and know how much you're getting, you can get 
> something very close to ideal random bits out.
> 
> Your second two categories are talking about different kinds of 
> sources--completely deterministic, or things that can have
> randomness but don't always.  That leaves out sources that always
> have a particular amount of entropy (or at least are always expected
> to!).

That very much depends on what you mean by "expected".
 -- An ill-founded expectation is little more than a wild guess,
  and it is not useful for critical applications.
 ++ OTOH a well-founded statistical "expectation value" is just
  what we need, and it moves the source firmly out of the
  squish category.

I say again, a squish is not reliably predictable /and/ not
reliably unpredictable.  If you have *any* trustworthy nonzero
lower bound on the entropy content, it's not a squish.
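For definiteness: the relevant measure of "entropy content" in an
adversarial setting is the min-entropy,

    H_\infty(X) = -\log_2 \max_x \Pr[X = x]

An adversary's single best guess succeeds with probability exactly
2^{-H_\infty(X)}, so a trustworthy lower bound on the min-entropy
is a trustworthy upper bound on the attacker's odds.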

On the other hand, again and again people latch onto something
that is not reliably predictable, call it "random", and try
to do something with it without establishing any such lower
bound.  This has led to disaster again and again.

There is an ocean of difference between "not reliably predictable"
and "reliably unpredictable".

> I'd say even the "squish" category can be useful in two ways:
> 
> a.  If you have sensible mechanisms for collecting entropy, they 
> can't hurt and sometimes help.  For example, if you sample an 
> external clock, most of the time, the answer may be deterministic, 
> but once in a while, you may get some actual entropy, in the sense
> that the clock drift is sufficient that the sampled value could have 
> one of two values, and an attacker can't know which.

However, alas, the good guys don't know how much either, so they
don't know how much to take credit for.  An underestimate causes the
RNG to stall, and an overestimate means the output is not as random
as it should be.  I vehemently recommend against risking either of
these failures.

I emphasize that there are two operations that must be considered
carefully:  
 1) Mixing "stuff" into the driver's pool, and
 2) taking credit for it ... the right amount of credit.

One without the other is strictly amateur hour.

> b.  If you sample enough squishes, you may accumulate a lot of 
> entropy.

You might, or you might not.  In an adversarial situation, this
is begging for trouble.  I vehemently recommend against this.

> Some ring oscillator designs are built like this, hoping to 
> occasionally sample the transition in value on one of the 
> oscillators.

Hope is not an algorithm.

> The idea is that the rest of the behavior of the oscillators might
> possibly be predicted by an attacker, but what value gets read when
> you sample a value that's transitioning between a 0 and a 1 is really
> random, changed by thermal noise.

So quantify the thermal noise already.  It sounds like you are
using the oscillator as a crude digitizer, digitizing the thermal
noise, which is the first step in the right direction.  The next 
step is to come up with a hard lower bound on the entropy density.
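The physics here is not mysterious.  The open-circuit Johnson noise
across a resistor R at temperature T over bandwidth \Delta f is

    V_{rms} = \sqrt{4 k_B T R \,\Delta f}

For illustration, plugging in R = 1 kΩ, T = 300 K, Δf = 20 kHz:

    V_{rms} = \sqrt{4 (1.38\times10^{-23})(300)(10^3)(2\times10^4)}
            \approx 0.58\ \mu\mathrm{V}

The point is that the noise level, and hence a lower bound on the
entropy density, is calculable and measurable.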

OTOH when you plug in the actual numbers, you will probably find 
that the oscillator is incredibly inefficient compared to a 
soundcard.

My main point is, there is a perfectly reasonable formalism for 
analyzing these things, so that "hope" is not required.

Secondarily, there is a huge industry mass-producing soundcards
at a very low price.  Very often, a soundcard is built into the
mainboard, whether you ask for it or not.  So in practical terms, 
any sort of custom thing you think of has a big hurdle to overcome.

=========================================

On 09/14/2013 02:38 PM, Kent Borg wrote:

> (How much Tempest-hardening are you going to do?)

I've run the numbers carefully enough to know that emanations
from the audio system are almost certainly negligible compared
to a gazillion other things going on in the system.  A few
kilohertz versus a few gigahertz.  Audio cables are routinely
shielded.

If you think I'm wrong about this, please spell out a realistic
attack scenario.  Please be as specific as possible, with RF
field levels and such.

> You dismiss "things like clock skew", but when I start to imagine
> ways to defeat interrupt timing as an entropy source, your Johnson
> noise source also fails: by the time the adversary has enough
> information about what is going on inside the GHz-plus box to infer
> precise clock phase, precise interrupt timing, and how fast the CPU
> responds...they have also tapped into the code that is counting your
> Johnson.

That's not how the attack works.  The attack on clock-skew sources
starts by considering the possibility that the two clocks are
correlated ... so that the skew is producing vastly less entropy
than you might have hoped.  The idea of clocks being correlated
falls into the "duh" category;  it's a big part of why clocks were 
invented, originally.

This stands in dramatic contrast to the Johnson noise.  The physics
says the raw data is uncorrelated.  The processing could introduce
a little bit of correlation, but only at the cost of reducing the
bandwidth and/or introducing gross audio distortions.  Crucially,
you can measure the bandwidth!  This is part of the calibration 
that turbid does.

If you have a principled, reliable, non-zero lower bound on the 
entropy density of a stream of clock-skew measurements, please 
share it with us.

> And once we have built such vaguely secure systems, why reject
> entropy sources within those systems, merely because you think
> they look like "squish"?  If there is a random component, why toss it
> out? 

Again:  If you have a reliable, nonzero lower bound on the entropy
density, it's not a squish!  Please, let's try to settle on some
clear terminology and stick with it. 

Conversely, if you don't have a reliable nonzero lower bound, then
you are at risk for producing allegedly random numbers that aren't 
as random as they should be.  The historical record is replete with 
examples of people who fell into this trap and paid dearly.

> You seem to respect using hashing to condition and stretch
> entropy--though why any existing hash shouldn't also fall to your
> "squish" generalization, I don't know. 

That's based on a diametrical misunderstanding of what I was 
talking about when I mentioned hashing ... so let me clarify:
The concentration process I was talking about is the opposite 
of stretching.  It takes a large number of symbols with a low 
entropy density and uses a hash to produce a smaller number 
of symbols with a high entropy density -- very nearly 100% 
if that's what you want.
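To make the arithmetic concrete, here is a sketch of a concentrator,
using OpenSSL's SHA-256 as the hash; the figure of 1 bit of entropy
per input byte is a stand-in for whatever your calibration actually
establishes:

    /* Concentrator sketch: many low-entropy-density symbols in,
       few high-entropy-density symbols out.  Assumes calibration
       has established a lower bound of 1 bit of entropy per input
       byte, so 512 input bytes carry >= 512 bits -- a 2x margin
       over the 256-bit output block. */
    #include <stdio.h>
    #include <openssl/sha.h>

    int concentrate(FILE *raw, unsigned char out[SHA256_DIGEST_LENGTH])
    {
        unsigned char in[512];              /* low entropy density     */
        if (fread(in, 1, sizeof in, raw) != sizeof in)
            return -1;                      /* short read: no output   */
        SHA256(in, sizeof in, out);         /* 32 bytes, ~100% density */
        return 0;
    }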

Stretching also has its place for some applications, but it
is just the opposite of concentration.  It uses a smallish
supply of truly random numbers to seed a *pseudo* random
generator.  Note that without a truly random seed, any PRNG 
is a dead duck.  Again, the list of people who have screwed 
this up is verrry long.
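For contrast, here is a generic stretcher of the hash(seed || counter)
variety.  It illustrates the concept; it is not the stretcher that
turbid ships:

    /* Stretcher sketch: a smallish truly random seed is expanded
       into an open-ended pseudo-random stream.  Without a truly
       random seed, this (like any PRNG) is a dead duck. */
    #include <stdint.h>
    #include <string.h>
    #include <openssl/sha.h>

    struct stretcher {
        unsigned char seed[32];      /* MUST come from a true RNG */
        uint64_t counter;
    };

    void stretch_block(struct stretcher *s,
                       unsigned char out[SHA256_DIGEST_LENGTH])
    {
        unsigned char msg[32 + sizeof s->counter];
        memcpy(msg, s->seed, sizeof s->seed);
        memcpy(msg + sizeof s->seed, &s->counter, sizeof s->counter);
        s->counter++;                /* never reuse a counter value */
        SHA256(msg, sizeof msg, out);
    }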

A stretcher is bundled with turbid, but the stretcher and
the concentrator are wildly different.  They run in different
threads, and are only very loosely coupled.

