[Cryptography] Speculation considered harmful?

Nemo nemo at self-evident.org
Tue Jan 16 20:59:01 EST 2018


John Gilmore <gnu at toad.com> writes:

> Why are we discussing random failed computer architectures like VLIW on
> a cryptography mailing list?

Good question.

> Fixing the Meltdown & Spectre attacks doesn't require tearing down all
> of computing and starting over -- no matter how much the cranks and
> losers of the architecture wars would like it to.
>
> It just requires fixing a few implementation bugs (like always stop
> speculating when you get a memory fault).

True for Meltdown; not so true for Spectre.

The core of the problem is not really speculative execution... The core
of the problem is implicit CPU state leading to covert
channels. Speculative execution just made the attacks practical, like
allowing a proof-of-concept JavaScript program to read arbitrary memory
in your browser process.

Meltdown works like this:

Step 1) Access mapped but inaccessible memory in a speculative execution path

Step 2) Use timing to extract information about the contents of the cache


Spectre works like this:

Step 1) Poison the branch prediction buffers

Step 2) Convince privileged code (kernel, JavaScript runtime, etc.) to
execute an indirect jump, speculatively executing code of your choice

Step 3) Use timing to extract information about the contents of the cache


In both cases, the speculative execution leaves "footprints" in the
cache that can be detected via timing, which opens a huge gaping covert
channel across protection domains. For Meltdown, it is just an Intel bug
that speculative accesses can bypass protections. For Spectre,
it's... Well, something weirder than just a "bug". And it affects nearly
every high-performance CPU.

Compiler writers are arranging to emit code avoiding all indirect jumps
(try a search for "retpoline"). Intel is adding instructions to flush
the branch prediction buffers. This is a bit beyond "fixing a few
implementation bugs", and all of it just to shut the door on Spectre.

But here is the problem: Even if you eliminate speculative execution
entirely, the cache still holds "footprints" of the execution of your
privileged code. And it is hard to prove exactly what information that
conveys (or does not convey).

There are two kinds of security. One is where you say "I do not see how
an attacker can do X". The other is where you say "I can prove the
attacker cannot do X, assuming Y and Z". The former leaves you
vulnerable to people smarter and/or more motivated than you. The latter
is what you want.

Given the implicit management of the cache by the CPU hardware, cache
timing attacks mean that I do not know exactly what information I am
leaking from privileged code *no matter how I write that code*.

And it is not just the cache. Consider performance counters, debug
registers, register renaming (i.e. vastly more physical registers than
architectural registers), etc. All of this implicitly-managed state
might carry who-knows-what information across protection domains.

So, yeah, plugging Meltdown is just a bug fix. Plugging Spectre is a
large but tractable task.

So, congratulations, you stopped the proof-of-concept code from
working. But being able to prove that nothing interesting passes across
protection domains, given all those megabytes of implicitly managed
state? That is going to require some serious rethinking of CPU
architecture. Just for starters: stop sharing the cache between
privileged and unprivileged code...

 - Nemo
   https://self-evident.org/

