[Cryptography] Various CPU cache side channels

Mon Mar 26 23:32:34 EDT 2018

On Mon, Mar 26, 2018 at 3:01 PM, Henry Baker <hbaker1 at pipeline.com> wrote:
> I've done a lot of reading re Spectre, and it would
> appear that Spectre-like attacks could be made on
> *any/every* cache found in every modern processor,
> including:

There is a bit of hope.
See Page Attribute Table (PAT) and IOMMU
https://www.kernel.org/doc/Documentation/x86/pat.txt
It is possible to map memory as uncached and so bypass the cache system
on most modern processors.
The performance stinks, but secrets can be kept outside the view of the
cache system.

> enforcing secrecy classifications inside of a
> processor is very complex.
Yes...

<comments>
Other older system hardware needs to be looked at too.
Consider the AMD hypertransport bus.
https://en.wikipedia.org/wiki/HyperTransport
A motherboard can have N processors, each with its own DRAM controller,
interconnected by this HT bus. Each processor can access the RAM attached
to itself and to the other processors, all viewed as a single 40-bit
address space.
There is also the SGI NUMAlink (Dan Lenoski and others at Stanford/SGI),
again with routed data links to all the memory and I/O in the system.
Transputers and that interconnect notion might be reviewed in the world of
a sea of processors, none with a TLB or cache. Purpose-wiring them as
functional blocks in a problem's solution may avoid these attacks.
Tear the blocks apart, then reload and rewire them for the next task.
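
To make the HyperTransport point about a single address space concrete,
here is a minimal Python sketch of how a physical address could be resolved
to the processor whose DRAM controller owns it. The node count, the 4 GiB of
DRAM per node, and the names NODE_RANGES, home_node and is_remote are all
assumptions made up for illustration. A real system programs its own ranges.

```python
ADDRESS_BITS = 40                      # address width quoted above
ADDRESS_SPACE = 1 << ADDRESS_BITS      # 1 TiB of physical address space
GIB = 1 << 30

# Hypothetical layout: 4 nodes, each owning 4 GiB of DRAM, packed from 0.
# Each entry is (base, limit, node) with an inclusive limit.
NODE_RANGES = [(n * 4 * GIB, (n + 1) * 4 * GIB - 1, n) for n in range(4)]


def home_node(phys_addr):
    """Return the node whose DRAM controller owns phys_addr, or None."""
    if not 0 <= phys_addr < ADDRESS_SPACE:
        raise ValueError("address does not fit in %d bits" % ADDRESS_BITS)
    for base, limit, node in NODE_RANGES:
        if base <= phys_addr <= limit:
            return node
    return None                        # hole: no DRAM mapped at this address


def is_remote(cpu_node, phys_addr):
    """True when the access has to cross the interconnect to another node."""
    owner = home_node(phys_addr)
    return owner is not None and owner != cpu_node
```

With this made-up layout, home_node(5 * GIB) is node 1, so a load issued on
node 0 to that address is served by another processor's memory controller.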

As long as the TLB blocks or stalls correctly, an invalid memory read
by speculation should be data safe.
Timing safe is never a given...  Knowing the key length alone sorts
out the easier-to-attack devices.
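
As a small software-level illustration of why timing safety is never a
given, the Python sketch below contrasts an early-exit comparison with the
standard library's hmac.compare_digest. The functions leaky_equal and
steady_equal and the step counter are hypothetical, added only so the leak
is visible without a stopwatch. The step count stands in for run time.

```python
import hmac


def leaky_equal(secret, guess):
    """Early-exit compare. Returns (equal, steps); steps models run time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps            # the length leaks before any byte is read
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps        # stops at the first mismatching byte
    return True, steps


def steady_equal(secret, guess):
    """Compare via hmac.compare_digest, which is designed to resist timing analysis."""
    return hmac.compare_digest(secret, guess)


secret = b"hunter2!"
print(leaky_equal(secret, b"short"))       # wrong length: no byte steps at all
print(leaky_equal(secret, b"hXXXXXXX"))    # one byte right: stops at step 2
print(leaky_equal(secret, b"huntXXXX"))    # four bytes right: stops at step 5
print(steady_equal(secret, b"huntXXXX"))
```

Python's own documentation notes that even compare_digest may reveal the
lengths of its inputs, which fits the remark above about key length.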

The R8000 system at SGI was tightly coupled to RAM, with the processor
memory interface fully pipelined out to DRAM.  There was no R8000+1
because all the chips would have had to be respun and updated for a new
memory subsystem.  This was an interesting way to run through code in a
pipelined vector style.
The SysAD bus on other MIPS/SGI processors was a bottleneck that was
never addressed.

Bigger and more flexible TLB systems with variable page sizes might
help.  Page flipping for networking is one example.

The Z80 had a clever solution for a context switch (an alternate register
set swapped in by EXX and EX AF,AF') but no protection.

The big problem space for solutions is system design.  To correctly solve
these problems and also go fast, there needs to be a massive system
rebuild, from UART chips to system buses for memory, I/O and display.
GPUs and the CUDA model need looking at for the good and bad bits.
Solid state storage memory and a good I/O DMA strategy could roll code
in and out of harm's way, much like SCOPE and old mainframe operating
systems.  This notion can solve service cloud problems that do not need
a *nix system.  Perhaps seL4 style.

The SoC folk do not get a free ride.  The SoC designs are blobs of
hardware description code, often under an NDA hidden behind another NDA.
The invisibility of the internals of these blobs hides another class of
risk.  Some are under NDA because hardware designs are full of something
borrowed and something new.  The borrowed bit is a liability and the new
an asset.
Enumeration of all the parts and credits in a design is a tangle.
Open source hardware is not handy because of cost, but the programmable
parts are getting faster and bigger.

-- 
  T o m    M i t c h e l l