[Cryptography] SW requirements to block timing side-channel attacks ??

Henry Baker hbaker1 at pipeline.com
Sun Jul 16 23:22:54 EDT 2017


Ok, suppose we take a cue from the design of some microcode engines, which have a "time duration" field specifying how many minor clock cycles a microinstruction should take, so that all the buses can "settle" before moving on.

We can incorporate such a *duration* parameter in each call to a crypto routine; the crypto routine has to process this parameter *first*, so that the contents of no other parameter can affect the overall timing.  The CPU's real-time clock is read, and the duration is added to that clock value to set an alarm indicating when the crypto routine is to return.  The crypto routine is then executed, and a barrier synchronization waits for the previously computed alarm time to arrive before the crypto routine is allowed to return.
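A minimal sketch of such a wrapper, assuming a POSIX monotonic clock and using a caller-supplied function pointer to stand in for whatever crypto routine is being protected (all names here are illustrative, not an existing API):

    #include <stdint.h>
    #include <time.h>

    typedef int (*crypto_fn)(const uint8_t *key, const uint8_t *in, uint8_t *out);

    static uint64_t mono_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    int call_with_fixed_duration(crypto_fn f, uint64_t duration_ns,
                                 const uint8_t *key, const uint8_t *in,
                                 uint8_t *out)
    {
        /* 1. Consume the duration parameter first: compute the return
              deadline before any secret-dependent work begins. */
        uint64_t deadline = mono_ns() + duration_ns;

        /* 2. Do the actual (possibly variable-time) cryptographic work. */
        int rc = f(key, in, out);

        /* 3. Barrier: spin until the deadline, so the caller always observes
              the same elapsed time regardless of the secret inputs. */
        while (mono_ns() < deadline)
            ;

        return rc;
    }

Busy-waiting is used here rather than sleeping so that scheduler wake-up jitter doesn't reintroduce timing noise; either way, the deadline has to include the margin discussed below.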

Obviously, we have to provide sufficient margin in the real-time clock calculation so that the chance of the crypto process not having completed by the deadline is negligible -- certainly negligible compared with any networking collision and/or error probability.

(We could -- if necessary -- also calculate a *power budget* for the crypto process, the idea being to dissipate exactly the same amount of power regardless of the contents of the secret parameters.  The CPU would then need to have a "shivering" NOP instruction whose only purpose is to quickly burn a precise amount of power.)
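No such "shivering" instruction exists on commodity CPUs, but the bookkeeping half of the idea can be sketched in ordinary C: count the expensive operations actually performed and pad with dummy ones up to a fixed budget (burn_op() below is a stand-in, not a real power-equalizing primitive):

    #include <stdint.h>

    /* Stand-in for a hypothetical power-equalizing "shivering" NOP: a dummy
       multiply on a volatile so the compiler can't optimize it away. */
    static inline void burn_op(void)
    {
        volatile uint32_t sink = 0x9e3779b9u;
        sink *= 2654435761u;
        (void)sink;
    }

    /* Pad a computation that performed ops_done expensive operations up to a
       fixed op_budget, so the total work is independent of the secrets. */
    void pad_to_op_budget(uint32_t ops_done, uint32_t op_budget)
    {
        for (uint32_t i = ops_done; i < op_budget; i++)
            burn_op();
    }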

But we have other problems: should the crypto process allow interrupts?  I say no, because any interrupt would flush various registers and caches, which might provide other side-channels.  But what if this crypto code is running inside a virtual machine/container?  Ditto.  The virtual machine/container also needs to provide for non-interruptible code; the crypto code might have to do some "real-time clock logging" to check whether it is being interrupted, and if so, bail on its calculations and tell its caller that it may have been compromised.
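A rough sketch of that "real-time clock logging", again assuming a POSIX monotonic clock: sample the clock at checkpoints inside the crypto code, and treat any gap much larger than the work between checkpoints could explain as evidence of preemption (the threshold below is a placeholder, not a recommended value):

    #include <stdint.h>
    #include <time.h>

    #define PREEMPTION_THRESHOLD_NS 50000ull   /* placeholder: tune per platform */

    struct preempt_monitor { uint64_t last_ns; };

    static uint64_t mono_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    static void monitor_init(struct preempt_monitor *m)
    {
        m->last_ns = mono_ns();
    }

    /* Returns nonzero if the time since the last checkpoint suggests the
       thread was interrupted; the caller should then abandon the computation
       and report possible compromise. */
    static int monitor_checkpoint(struct preempt_monitor *m)
    {
        uint64_t now = mono_ns();
        int suspicious = (now - m->last_ns) > PREEMPTION_THRESHOLD_NS;
        m->last_ns = now;
        return suspicious;
    }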

In days of yore, on single-core, single-threaded machines, the idea of running a program with all interrupts disabled would have been anathema.  However, on today's processors with 4-8 or more cores, it is completely reasonable to pull one or more cores out of interrupt-service duty when there is important business to attend to, and crypto processes are exactly that sort of "more important business".
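On Linux, for instance, the crypto thread can be pinned to a core that has (by assumption) already been removed from general scheduling and IRQ service, e.g. via the isolcpus= boot parameter and the IRQ affinity masks; the core number below is just an example:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Bind the calling thread to a single, reserved core. */
    static int pin_to_isolated_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    /* Example: pin_to_isolated_core(3) before entering the crypto code. */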

But what about shared caches?  Here's where the crypto process has to abide by some rules.  It must grab whatever cache lines it needs and lock them down for the duration of its calculations.  If the crypto process can't lock down its resources, then it cannot assume that it will be able to complete its task on time, and any variation in the timing of re-acquiring those resources can be (and has been) used as a side-channel to extract secret information.  So the crypto code itself must be content with using *only* resources to which it has acquired (and can test for) exclusive access for the duration of its calculation.
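Where cache lines can't literally be locked, one common way to live within that rule is to make the access pattern itself independent of the secrets, e.g. by touching every entry of a lookup table and selecting the wanted one with branch-free masking; a sketch:

    #include <stdint.h>
    #include <stddef.h>

    /* Read table[secret_index] while touching every entry, so the set of
       cache lines filled does not depend on the secret index. */
    uint8_t ct_table_lookup(const uint8_t *table, size_t len, size_t secret_index)
    {
        uint8_t result = 0;
        for (size_t i = 0; i < len; i++) {
            uint64_t diff = (uint64_t)i ^ (uint64_t)secret_index;
            /* mask == 0xFF exactly when i == secret_index, 0x00 otherwise,
               computed without any secret-dependent branch. */
            uint8_t mask = (uint8_t)(((diff | (0ULL - diff)) >> 63) - 1);
            result |= (uint8_t)(table[i] & mask);
        }
        return result;
    }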

---
If all of these issues sound like those that come up in *real-time* control systems, you're absolutely right!

But that's actually *good* news, since many IoT devices must simultaneously manage hundreds or thousands of real-time processes/threads, and their operating systems tend to be organized around real-time demands already.  Just as in real-time systems, where *faster is not necessarily better* (*predictable* timing is better), the same is true of crypto code, where *constant* timing is better.  So such IoT devices may actually be in a better position to foil timing side-channels than traditional desktop/laptop/cellphone operating systems.


