[Cryptography] [RNG] Use process ID in mixing?

Wed Mar 26 14:06:45 EDT 2014

My post that started this thread went to two lists, this one and an
rng list. I failed to set a Reply-To: header (I have yet to discover
how to do that in Gmail) so various people have replied on each list
without including the other. Here is a message I sent only to the rng
list, answering a comment there.

I suggest future discussion should go to the rng list.

---------- Forwarded message ----------
From: Sandy Harris <sandyinchina at gmail.com>
Date: Tue, Mar 25, 2014 at 1:44 PM
Subject: Re: [RNG] Use process ID in mixing?
To: rng at lists.bitrot.info

 <travis+ml-rng at subspacefield.org> wrote:

> On Tue, Mar 18, 2014 at 03:48:09PM -0400, Sandy Harris wrote:
>> A process ID is only a few bits long and in many cases is quite
>> predictable; it is entirely useless as an entropy source. However, I
>> wonder if it could play a role analogous to salt in a password
>> algorithm or the suggestion of stirring things like MAC addresses into
>> the pool at startup just so every machine does it slightly
>> differently.
>
> Yes.
>
> Note that everything usable to fingerprint a machine can be used to uniquify the instance.
>
> HDD IDs, UUIDs of all kinds, hardware IDs, CPUIDs, you name it.
>
>> On Linux, you can get the caller's pid from kernel code with  #include
>> <linux/sched.h> then look at current->pid. Probably there is something
>> similar for other systems and quite possibly there is other usable
>> data in the struct; I haven't looked.
>
>> Is it worth salting every call to (u)random? Mix the pid into the
>> output or the pool. This can do no harm, but does it do any
>> perceptible good?
>
> Why just PID?  Why not timestamp (with maximum resolution).

The essential requirement is enough real entropy for good
seeding, ideally from a hardware RNG plus timing data from
device interrupts. You also need good initialisation, ideally
the per-device provisioned seed Denker suggests plus the
file of stored output from the last run.

Stirring in timestamps is important, both at boot time to
make every instance different and later. I think the current
Linux code includes time in the code that collects entropy
from interrupts.

Those are the key concerns, but adding other things as
a defense-in-depth measure also looks worthwhile. Stir
in more data which acts like salt -- it just makes instances
different and complicates attacks a bit, rather than being
expected to contribute entropy. These values are all knowable
or guessable for at least some enemies, but perhaps
not all. They are cheap to add, cannot do harm and
may do some good.

In order of importance, the ones I can see are:

Including MAC addresses or other hardware identifiers
in boot time mixing makes every machine different.
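
A minimal user-space sketch of the MAC address item, assuming a
Linux system where an interface's address can be read from sysfs
and where writing to /dev/urandom stirs the bytes into the pool
without crediting any entropy, as random(4) describes. The function
names parse_mac, stir_salt and stir_mac are made up for
illustration; a real driver would do the mixing inside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes. Returns 0 on success, -1 on failure. */
int parse_mac(const char *text, unsigned char mac[6])
{
    unsigned int b[6];

    assert(text != NULL && mac != NULL);
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) != 6)
        return -1;
    for (int i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/* Write salt bytes to the random device. The data is treated purely as
 * salt: it changes the pool state but is not counted as entropy. */
int stir_salt(const char *dev_path, const unsigned char *salt, size_t len)
{
    FILE *f = fopen(dev_path, "wb");
    if (!f)
        return -1;
    size_t put = fwrite(salt, 1, len, f);
    int rc = fclose(f);
    return (put == len && rc == 0) ? 0 : -1;
}

/* Read one MAC address from a sysfs-style text file and stir it in. */
int stir_mac(const char *sysfs_path, const char *dev_path)
{
    char line[64];
    unsigned char mac[6];
    FILE *f = fopen(sysfs_path, "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok || parse_mac(line, mac) != 0)
        return -1;
    return stir_salt(dev_path, mac, sizeof mac);
}
```

A boot script's helper might call
stir_mac("/sys/class/net/eth0/address", "/dev/urandom") once per
interface; the interface name eth0 is an assumption.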

When a user space process reads the device, the
kernel has access to a struct describing the process.
Mix in the PID and anything else in the struct that looks
interesting. These values depend on system state in ways
that may be quite complex on some systems, and
on different parts of the state than the timing data.
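
A user-space sketch of the PID item. In the kernel the value would
come from current->pid; here getpid() stands in for it. The names
toy_pool, toy_mix, mix_pid_salt and mix_caller_salt are hypothetical
stand-ins, not the real Linux driver code, and the mixing step is
deliberately simple and not cryptographically strong. It only shows
the salt changing the pool state without any entropy being credited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the driver's pool. */
struct toy_pool {
    uint32_t w[4];
    unsigned pos;   /* index of the next word to stir into */
};

void toy_pool_init(struct toy_pool *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Stir bytes into the pool. Placeholder mixing only: XOR the byte in,
 * rotate, multiply by an odd constant, then move to the next word. */
void toy_mix(struct toy_pool *p, const void *data, size_t len)
{
    const unsigned char *in = data;

    assert(p != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t w = p->w[p->pos] ^ in[i];
        w = (w << 7) | (w >> 25);   /* rotate left by 7 bits */
        w *= 0x9E3779B1u;           /* odd, so invertible mod 2^32 */
        p->w[p->pos] = w;
        p->pos = (p->pos + 1) % 4;
    }
}

/* Mix a PID in as salt; no entropy is credited for it. */
void mix_pid_salt(struct toy_pool *p, pid_t pid)
{
    toy_mix(p, &pid, sizeof pid);
}

/* User-space analogue of salting with current->pid on each read. */
void mix_caller_salt(struct toy_pool *p)
{
    mix_pid_salt(p, getpid());
}
```

Two readers with different PIDs leave the toy pool in different
states, which is all a salt is expected to do.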

The driver contains a lot of variables which could
be initialised at compile time. Using data from
the development system's /dev/urandom could
make those different for every compilation. In the
common cases where a single compiled image
gets installed on many devices or put into a
distro, this does not do much good, but at least
it makes every release use somewhat different
device code.
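
A sketch of the compile-time item, assuming the build runs a small
generator on the development system before the driver is compiled.
The generator reads bytes from /dev/urandom and emits a header that
defines an initialiser. The macro name DRIVER_SALT_INIT and the
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Fill buf with n bytes from the build machine's /dev/urandom. */
int read_urandom(unsigned char *buf, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}

/* Format the salt as a one-line C macro.
 * Returns the text length, or -1 if it does not fit in out. */
int format_salt_header(char *out, size_t cap,
                       const unsigned char *salt, size_t n)
{
    assert(out != NULL && cap > 0);
    int w = snprintf(out, cap, "#define DRIVER_SALT_INIT {");
    if (w < 0 || (size_t)w >= cap)
        return -1;
    size_t used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        w = snprintf(out + used, cap - used, "%s 0x%02x",
                     i ? "," : "", (unsigned)salt[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(out + used, cap - used, " }\n");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return (int)(used + (size_t)w);
}

/* Generate a fresh header file with n random bytes (n <= 64). */
int write_salt_header(const char *path, size_t n)
{
    unsigned char salt[64];
    char text[1024];

    if (n > sizeof salt || read_urandom(salt, n) != 0)
        return -1;
    if (format_salt_header(text, sizeof text, salt, n) < 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    size_t len = strlen(text);
    size_t put = fwrite(text, 1, len, f);
    int rc = fclose(f);
    return (put == len && rc == 0) ? 0 : -1;
}
```

The driver source would then include the generated header and use
something like "static unsigned char salt[] = DRIVER_SALT_INIT;".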