[Cryptography] RFC possible changes for Linux random device

Sun Sep 14 14:06:35 EDT 2014

On Sat, 2014-09-13 at 21:03 -0400, Theodore Ts'o wrote:

> Funny thing; when many academics write papers criticize the Linux
> random driver, it's often because they don't think we *still* don't do
> a good enough job by their lights.
> 
> And then there is the camp who believes that once you gather 256 bits
> or so of "true randomness", all you need to do is just crank a DRBG,
> and the rest is all just mummery, and I get mocked/criticized/told I'm
> stupid by that crowd too.
> 
> Which is OK.  Over time, you learn how to have a thick skin when
> people deliver mutually conflicting criticisms.  :-)

My 20 millibucks says it's important for cryptography that no 
sequence of observed outputs should allow anyone to predict 
future outputs.  I consider that if they can directly observe 
the state of the RNG or its power requirements or instruction 
counts, they have hardware access, which makes them at best an
uncommon threat relative to the ubiquitous problem of network 
access.

That said, I find myself using the cryptographic RNG for non-
cryptographic purposes sometimes, specifically because the small 
state of the PRNGs provided by most systems means I cannot get 
a sequence of uncorrelated outputs more than a few dozen bits 
long out of them.  256 bits of state limits the uncorrelated 
sequence to just four int64s, which is nowhere near enough 
even for some simulation- and statistics-type applications.
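
To spell out the arithmetic behind the four-int64 figure: a 
deterministic generator with n bits of state can produce at most 
2^n distinct output streams, so no more than n bits of its output 
can be jointly independent.  A minimal sketch of that counting 
bound follows; the function name and the sample state sizes are 
made up for illustration and do not describe any particular PRNG.

```python
def max_independent_words(state_bits, word_bits=64):
    """Upper bound on how many whole words of output can be jointly
    independent, given the generator's state size in bits.

    This is only a counting bound: n bits of state select one of at
    most 2**n output streams, so at most n output bits are "free".
    """
    if state_bits < 0 or word_bits <= 0:
        raise ValueError("state_bits must be >= 0 and word_bits > 0")
    return state_bits // word_bits


if __name__ == "__main__":
    # Hypothetical state sizes, chosen only to show the scaling.
    for bits in (64, 256, 4096):
        print(bits, "bits of state ->",
              max_independent_words(bits), "independent int64s at most")
```

A simulation that draws thousands of values per trial blows 
through that bound at once, however good the generator's 
statistics look over short stretches.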

> I'll also add that improving boot-time entropy is the hard part; and
> it has nothing to do with changing the crypto.  

In my very strong opinion, a requirement for boot-time entropy can 
result only from bad design.  Systems that need boot time entropy 
can need it only because they are doing things at boot time which 
should not be done at boot time, and failure to correct this OS 
design failure is actively harmful to security. 

The issue is that if you are building a dependence on secure
communications into the boot sequence, that means that you are 
depending on external systems and their state of security.  If 
you build a need for external systems that deeply into the OS, 
you are exposing the booting machine to compromises and failures 
of both those external systems and the communications network. 

Fully boot a purely local operating system before you attempt 
to open any external communications, and open those external 
communications in process spaces where the kernel can monitor
and/or sandbox them.  In short, make sure your local operating 
system is fully instantiated and can protect itself from anything 
that can come over external communications. 

And for this reason my own personal opinion is that providing 
the people who promote this network-before-fully-booted failure 
of design with the early-boot entropy they desire contributes 
to a problem, in that it allows them to skate along for one or 
two or ten more years without correcting that fundamental 
failure, or even to continue in blithe denial of the idea that 
it needs to be corrected.

On the other hand, failing to provide the early-boot entropy they
desire, in any context where they cannot be dissuaded from that 
failure of design, will result in yet worse breakage. So this 
really is kind of between a rock and a hard place. 

> P.S.  Oh, and if anyone can get the ARM architecture folks to specify
> a cycle counter which is guaranteed to be there, or at the very least,
> won't crash the SOC if you try to use it and it's not there --- which
> is why Linux doesn't try to use the cycle counter at all on most ARM
> platforms, even though it's often present (it's just that it's not
> guaranteed by the ARM architecture, and if you guess wrong, the
> results are catastrophic) --- that would be awfully nice.  

This seems like the sort of thing to test in very early boot.  
Wake up, check a variable ARM-COUNTER from the disk, find it equal 
to zero, write one to it, flush the disk buffer, attempt to 
use the counter.  If you don't crash, write a 
zero back to ARM-COUNTER and remember (in some other bit) that 
the counter is valid.  If you do crash, you wake up again in 
a forced reboot, but this time you find ARM-COUNTER equal to 
one - so you write a zero to it, remember (in your other bit) 
that the counter is invalid, and never ever use the counter.  
The result is that boxes where the cycle counter is 
unimplemented have a slightly longer boot sequence, containing 
one early crash-and-reboot.  It's ugly as sin, but I used to do 
the same thing boot-testing for math coprocessors on 8088 boxes. 
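
The probe sequence above can be sketched as a small simulation.  
This is not kernel code and makes several assumptions: a Python 
dict stands in for the disk, an exception (CounterTrap) stands in 
for the SoC crashing, and the "other bit" is modelled as a 
counter_status entry.  All of those names are hypothetical; only 
ARM-COUNTER comes from the description above.

```python
class CounterTrap(Exception):
    """Stands in for the machine crashing when the counter is absent."""


def probe_cycle_counter(disk, read_counter):
    """Run one boot's worth of the probe.

    `disk` is a dict standing in for persistent storage; it is the
    only thing that survives a simulated crash.  Returns True if the
    counter is known to be usable, False if known to be unusable.
    """
    status = disk.get("counter_status")        # the "other bit"
    if status in ("valid", "invalid"):
        return status == "valid"               # already decided; never re-probe

    if disk.get("ARM-COUNTER", 0) == 1:
        # The previous boot set the flag and never cleared it,
        # so it must have died while touching the counter.
        disk["ARM-COUNTER"] = 0
        disk["counter_status"] = "invalid"
        return False

    disk["ARM-COUNTER"] = 1                    # real code would flush to disk here
    read_counter()                             # may "crash" (raise CounterTrap)
    disk["ARM-COUNTER"] = 0
    disk["counter_status"] = "valid"
    return True


def boot_until_stable(disk, read_counter, max_boots=3):
    """Simulate forced reboots until the probe reaches a verdict.

    Returns (counter_usable, number_of_boots_taken).
    """
    for boot in range(1, max_boots + 1):
        try:
            return probe_cycle_counter(disk, read_counter), boot
        except CounterTrap:
            continue                           # forced reboot; only `disk` persists
    raise RuntimeError("probe did not settle")


def working_counter():
    return 12345                               # pretend cycle count


def missing_counter():
    raise CounterTrap()                        # pretend the SoC fell over


if __name__ == "__main__":
    print(boot_until_stable({}, working_counter))
    print(boot_until_stable({}, missing_counter))
```

In the simulation a box with a working counter settles on its 
first boot, and a box without one takes exactly one extra boot, 
after which the stored verdict keeps it from ever touching the 
counter again.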

			Bear