[Cryptography] 33C3: cash :-) attacks !

Henry Baker hbaker1 at pipeline.com
Mon Jan 9 19:56:40 EST 2017


At 04:29 PM 1/9/2017, Kevin W. Wall wrote:
>On Sun, Jan 8, 2017 at 2:27 PM, Henry Baker <hbaker1 at pipeline.com> wrote:
>> FYI --
>>
>> https://media.ccc.de/v/33c3-8044-what_could_possibly_go_wrong_with_insert_x86_instruction_here
>>
>> https://cdn.media.ccc.de/congress/2016/h264-hd/33c3-8044-eng-What_could_possibly_go_wrong_with_insert_x86_instruction_here.mp4
>>
>> (55 mins; 327 MBytes)
>>
>> "What could possibly go wrong with <insert x86 instruction here>?
>> Side effects include side-channel attacks and bypassing kernel ASLR"
>>
>> Clémentine Maurice and Moritz Lipp
>
>So I just finished watching this.  My initial thought is that between
>this, rowhammer, FBI Rule 41, and NSLs, we are all pretty much screwed
>as TLAs and nation states are pretty much always going to be able to
>do this.
>
>It still seems like a pretty esoteric attack that is unlikely to be most
>attackers' _first_ choice, but as I don't see any simple mitigation for this--
>short of disabling cache, which no one is likely to do except in very
>rare cases--it seems like this is always going to be available as a last
>resort and AV is not going to be able to detect it.
>
>So what, if anything, do we do?  Timing attacks are rare IRL, but
>usually that's because there's almost always some easier way in.  Since
>I work in appsec, I'm always more interested in what we can do to
>manage the risk.  Any ideas?

Some HW ideas:

There used to be chips for "real-time" applications, in which a
high-priority process could *lock down* a portion of the cache for its
*exclusive* use.

At some cost in performance and/or chip area, the operating system could
allocate private caches for different processes.

Multiprocessor architectures were once moving towards "shared nothing"
memory hierarchies, so as to maximize the number of distinct HW
processors and minimize contention.

But then the reliance on C-language and "threads" made shared memory
more attractive.

It may be time to move away from threads & shared memory and back to
shared-nothing architectures, now that we have another strong incentive
(minimizing timing attacks) to do so.

WRT the short term, we have to attack "cycle compression" with
strategies similar to those used to blunt the timing leaks of
compression-before-encryption schemes, where timing likewise varies
with the data (due to compression).
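As a small software illustration of equalizing timing, here's a toy
constant-time comparison whose running time depends only on the buffer
length, never on where (or whether) the buffers differ.  A sketch, not
a vetted implementation; the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Constant-time byte-wise comparison: no early exit, no data-dependent
   branch; the loop always runs exactly n iterations. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];   /* accumulate differences without branching */
    return diff == 0;
}
```

Of course a compiler is free to re-introduce branches, so real
implementations have to verify the generated machine code as well.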

There may be more exotic SW techniques more akin to oblivious
RAM (ORAM), but notice that ORAMs & caches don't mix very
well (except to the extent that the cache can speed up the
ORAM shuffling).

I've had an idea at the back of my mind for a long time which
generalizes the ORAM idea by fitting the access statistics of
your particular program into the access statistics of my
program.  In other words, you hide the access statistics of
your program by spoofing the access statistics of another
(my) program.  Yes, performance is degraded to the extent
that the masking statistics aren't ideal for executing your
program, but that seems to be inevitable.
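A toy sketch of this masking idea (not real ORAM, and the function
name is made up): make every lookup touch the whole table, so the
access pattern the cache observes is independent of the secret index:

```c
#include <stddef.h>
#include <stdint.h>

/* Oblivious table lookup: scans every entry regardless of idx, so the
   sequence of memory accesses (and hence the cache footprint) does not
   depend on the secret index. */
uint32_t oblivious_lookup(const uint32_t *table, size_t n, size_t idx)
{
    uint32_t result = 0;
    for (size_t i = 0; i < n; i++) {
        /* mask is all-ones when i == idx, all-zeros otherwise */
        uint32_t mask = (uint32_t)0 - (uint32_t)(i == idx);
        result |= table[i] & mask;
    }
    return result;
}
```

The cost is O(n) per lookup instead of O(1) -- exactly the kind of
performance degradation I'm saying seems inevitable.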

Here's an analogy: a city has a scheduled transportation
system involving buses and subways of given capacity.  A
covert attacker wants to move his people into various
places in the city, but without raising any suspicion.
If they were to bring their own cars, they would disrupt
the traffic in a noticeable way.  However, if they all
used the existing transportation system and stayed within
the capacity constraints, there would be no effect on
overall traffic.  Of course, it is essential that the
timing of the buses not be affected by whether the bus
is empty or full, else the bus will change its schedule.

But this is a good analogy with a computer's datapaths,
which have predetermined timings and capacities, and
most operations in modern clocked logic are not affected
by the detailed content of the data on which they
operate.
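In software the same property can be imitated with branchless
selection -- e.g., this sketch of a constant-time select (hypothetical
name) whose timing doesn't depend on the condition:

```c
#include <stdint.h>

/* Branchless select: returns x when cond is nonzero, else y, with no
   data-dependent branch -- the same instructions execute either way. */
uint32_t ct_select(uint32_t cond, uint32_t x, uint32_t y)
{
    uint32_t mask = (uint32_t)0 - (cond != 0);  /* all-ones or all-zeros */
    return (x & mask) | (y & ~mask);
}
```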


