[Cryptography] What's a Plausible Attack On Random Number Generation?

Thu Oct 31 11:09:38 EDT 2013

It's become very difficult to know where sound engineering practice ends and tin-hat paranoia begins.  I'd like to propose one way to get at what reasonable boundaries are:  I'll lay out a particular, I hope realistic, proposal, and then ask the group to play attacker.  After all, the best way to understand what's needed for defense is to try to attack.  To that end:

I am the owner of a data center.  I have some hundreds of physical machines, with old ones being decommissioned and new ones being added periodically.  These run thousands of VM's that are created and destroyed much more frequently.  The machines are interconnected by a number of high-speed (say 10GigE or better) networks, further partitioned for security purposes into multiple VLAN's.  None of these networks are visible outside the boundaries of my data center.  Connection to the Internet is via bastion routers.  There's a DMZ, all the usual stuff.  I have excellent physical security, and I regularly "patrol the perimeter", as well as the internal physical plant, so I assume that any attempt to run additional network links into my data center, plant significant hardware, etc., will be noticed fairly quickly.  I'm actually paranoid enough that I've spec'ed conductive mesh throughout the outer structure, so the data center is effectively inside a Faraday cage (i.e., no magic radio connections to the outside), and of course I check this periodically as well.  As a result of all this, I choose not to consider physical threats; nor, because ultimately there's nothing I can do about them, on-going sophisticated insider attacks.

I use a very simple random number generator:  There's a "random pool" of a few thousand bits.  At first boot, it's initially populated using whatever I can get my hands on principally the MAC address, any serial numbers available, etc.  At shutdown, the state is stored locally; at restart, it's restored to the previous state.  The first boot logic will wait for a significant period of time to gather entropy as I'll describe in a moment before allowing any processes that use random numbers to proceed.  (Reboot may also wait a bit, but not nearly as long.)

There is *exactly two* sources of entropy.  The primary one is watching the network:  Each time a network packet arrives, mix into my pool the full contents of the packet, the value of the real-time clock, and the value of the CPU cycle counter.  The secondary source is other machines in the datacenter:  Each machine will periodically get from some fraction of other machines scattered through the network some number of random bits, delivered in an encrypted packet using a key agreed upon between the two (based on their own randomness), and also mix that in.  The mixing function is like that in the Linux RNG - "good enough but fast enough" - but not cryptographically secure as that's too expensive.  To make up for that, periodically (say every millisecond or so) I also scramble the pool using a cryptographically secure 1-way hash.

Whenever I've delivered more than (arbitrary number) 1/3 of the bits in the pool, I won't deliver any more until I've (a) done a cryptographically secure hash; (b) received at least N network packets; (c) received at least K updates from other machines.  If network packets aren't arriving fast enough, I'll start listening promiscuously.  If that doesn't give me enough, I can start sending "please send me a random update" requests to nodes I know about, which generates both those updates, and traffic.  (During the first boot the "wait long enough" logic actually isn't based on time but on number of packets received.)

OK, this is the classic setup in which all my apparently-random data comes from a network that *could* be observed.  The question is:  Can you mount a *plausible* attack?  Keep in mind that (a) there's no way you could send all the data, along with the timing information, from a fast intra-datacenter link out to a WAN without getting noticed.  Hell, you can't even send it *over the intra-datacenter link itself* without being noticed, as you'd be more than doubling the traffic - *and your own traffic would then also be input to the RNG"; (b) Simply *recording* it all does you no good, as you still need a way to exfiltrate it - and the data quantities would be absurdly large.  (As part of my physical security, I make sure that no disks or SSD's ever leave the facility without being physically destroyed - SOP at all good-quality data centers today.)  (c) So it seems you have to analyze the data on the spot - get software running on some machines to maintain models of the state of others.  But that doesn't seem very plausible either - the modeler will have to run continuously on a great deal of data.  How much of the aggregate memory, compute power, and network I/O can you grab without being noticed?  Further, the exchange of randomness among machines means that you can't just focus on any one machine:  In effect, there is a pool consisting of *all* the machines.  If you grab the state of one, it'll soon be mixed with the state of several others, which in turn are mixed with the state of several others, and so on.

Note that attacks which start off assuming you have root on all the machines are out of bounds:  If you have that, you can do much easier things, like take over the RNG call and return data of your own choosing.  Nothing in the RNG design can protect against this, so while it's a plausible attack, it's one that will have to be dealt with in some other way.

So - there's my "data center with secure random number generators".  Let the attacks begin.
                                                        -- Jerry