[Cryptography] Faithful emulation preserves timing resistance?

Mon Jan 8 16:38:09 EST 2018

I'd be interested if anyone has done work relating
to this before.  Thx in advance.

--

While the concept of *emulation* is a core fundamental
principle in computer science, and Turing Machine
emulations of other Turing Machines provides the
basis of much of complexity theory and security
proofs, these same theorists have given short shrift
to the possibility of "real-time" emulation, or at
least "constant slow-down" emulation, whereby one
machine emulates another at a constant percentage
-- e.g., 25% -- of its original speed.

When "cloud computing" embraced *virtual machines*,
which allowed a single "server" machine to emulate
a multiplicity of virtual machines, little thought
was given to any attempt at "real-time" or "percentage
real-time" emulations.

Over the past several decades, however, it has
become quite clear that security and privacy of
emulated machines (and their hosts) can be attacked
via *timing side-channels*, so we must finally
address these issues in our cloud server farms.

---

How to compose a timing-resistant server from
timing-resistant target machine emulators.

As is typical in computer science emulation proofs,
we have a basic step and an induction step, which
allows a *tower* of emulated machines to be constructed
whose top element obeys some principle.

We will beg off any solution to the basic step, and
assume that we are given a computer system which
*already* resists timing side-channel attacks.

We assume that a computer system has been built
in such a way that we can't tell by timing this
system anything about what data bits it is working
on.

Our job is now to handle the *inductive step* and
show how -- given such a timing resistant design --
we can emulate it with another computer in such a
way that its timing resistance is conserved.

We now want to *virtualize* this timing-resistant
computer system: i.e., run it as a *virtual machine*
within the confines of a machine which has a larger
memory and considerably more computational
capabilities.

In order to be able to iteratively *compose*
these virtualizations into an emulation "tower",
we must *preserve* the timing resistance of the
original computer system program in our emulation.

It should be possible to do this, but it isn't
going to be easy.

There is a large (mostly older) literature on
so-called "real time systems", in which a large
number of "real time processes" must execute
on different schedules.

https://en.wikipedia.org/wiki/Real-time_computing

For example, one process must wake up every
100msec to process a new bit coming in on a
serial line (I warned you that this literature
was old!).  This is an example of a "hard" real
time constraint, because the consequence of
not waking up and processing the bit when it
arrives is catastrophic: the bit will be lost
forever.

A VM could conceivably execute a process with
"faithful timing emulation" by choosing a virtual
clock cycle which is long enough so that *every*
possible machine cycle of the emulated machine
can be emulated in a single virtual clock cycle,
and further that the clock cycle is chosen long
enough so that even the relative *jitter* from
the real-time scheduling algorithm will not
leak information.

Note that if we're concerned *only* with the
*timing side-channel* and not other side-
channels -- e.g., a *power* or *radiation*
side-channel -- then we don't necessarily
have to emulate every precise detail of the
target computer, but we do have to be able
to precisely calculate the *time duration*
that any/every operation the target computer
can take.  We then must be able to emulate
every such operation with a constant speedup
(or slowdown), for an a priori fixed constant.

[It might be helpful in the development of such
"faithful" emulators if the computer being
emulated has additional circuitry and registers
which compute the actual *duration* of each
operation.  This circuitry -- together with a
high precision clock timer -- can be used to
*check* on the quality and precision of the
duration-calculation algorithm.  If the
computer hardware constantly checks the
duration-calculation circuitry against the
real-time clock and finds a discrepancy,
then the computer manufacturer needs to be
alerted to modify its duration-calculation
algorithm.

A similar check can be made in the emulation
algorithm: if/when the timing of an emulated
step is not faithful to the original machine,
then the emulation must *fault* and stop.]

The problem with attempting to run multiple
such faithful VM's on a single larger processor
is that we need to leave enough time for all
such VM's to meet their timing schedules, and
under certain conditions, the maximum processor
utilization will be at most ln(2) (i.e., 69%).
Of course this calculation presumes that each
emulated cycle takes exactly the same execution
time, which will almost never be true, so the
actual processor utilization may be substantially
lower.

Note that this particular scheduling has the
advantage that *each emulated virtual machine
cycle runs to completion* before moving on to
the next VM.  I.e., one emulated machine cycle
does NOT INTERRUPT ("pre-empt") the execution
of another emulated machine cycle.  This property
may have significant benefits in terms of reducing
other side-channel attacks, as well as achieving
overall greater emulation efficiency.

http://web.cs.wpi.edu/~cs3013/a09/Papers/Liu%20%26%20Layland%2C%20Real-time%20Scheduling.pdf

"Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment",
Liu & Layland, JACM 1973

As Liu & Layland point out, if we allow one
process to be "pre-empted" by another process
through so-called "deadline scheduling", we
can achieve 100% utilization, although it
isn't clear whether the costs of interrupting
one VM to service another will overwhelm the
advantage of higher processor utilization.
Once again, Liu & Layland's result assumes
that emulating each possible machine cycle
can be done in a constant time; to the extent
that these machine cycles vary, the utilization
may have to be reduced.

---

Although implementing "fixed-constant-time
emulation" will considerably complicate the
problem of implementing "virtual machines"
and "containers", such complexity may be
required in order to disable any timing
side-channels.