Optimisation Considered Harmful

Jerrold Leichter jerrold.leichter at smarts.com
Thu Jun 23 07:36:38 EDT 2005


| A brief altercation this evening with CERT over the recent hyperthread caching
| issues has brought something that's been simmering at the back of my brain to
| the forefront.
| 
| The recent hyperthread/cache key recovery trick, followed by DJB's related
| (IMO) symmetric key recovery, and preceded by the RSA timing attacks (Boneh et
| al?) are all examples of attacks on the same thing: optimisation.
| 
| The problem is that if different paths through your code expose different
| availability of optimisation, then there's a timing attack available. I
| predict, for example, that someone will find a way to attack something using
| pipeline breaks on Pentium processors[1].
|
| How do we fix this? Well, it's easy to say: we make everything equally crap -
| we make sure that all operations are as slow as the slowest possible variant
| can be. However, on a modern processor, this is _very_ hard to do.
| 
| Could it be that for safe crypto we should be using Z80s?
This is an excellent point, as it reveals a deep and significant parallel
between cryptography and compiler/hardware optimization.

Consider what it means to create an optimizing compiler (or some kind of
optimization in hardware - the issues are the same, but I'll talk in terms of
a compiler for definiteness).  The input is source code; the output is a bunch
of instructions.  A compiler's transformation is correct if it's semantics- 
preserving:  The source has some meaning, and the object code correctly 
represents that meaning.  There are an infinite number of possible object 
codes that preserve the input semantics.  Some are "better" than others with 
respect to some objective function, say size or speed.  An optimizing compiler 
simply chooses a "better" object code than some reference choice.
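
To make this concrete, here is a small C sketch of my own (the functions and
their names are purely illustrative): two of the infinitely many
input/output-equivalent ways to express "is x a power of two?" - a stand-in
for the choice an optimizing compiler makes among object codes.  Either one
is a correct translation of the meaning; they differ only in how "good" they
are under a speed objective.

    #include <stdbool.h>

    /* Reference translation: count the set bits one at a time. */
    bool is_pow2_reference(unsigned x)
    {
        unsigned bits = 0;
        while (x) {
            bits += x & 1u;
            x >>= 1;
        }
        return bits == 1;
    }

    /* An "optimized" translation with the same input/output behaviour:
     * clearing the lowest set bit leaves zero exactly when one bit was set. */
    bool is_pow2_optimized(unsigned x)
    {
        return x != 0 && (x & (x - 1)) == 0;
    }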

The big issue in compiler optimization - and even more so in some hardware
optimization - is defining exactly what semantics has to be preserved.  For
example, must computations be done even if the compiler can determine that
they cannot affect the output?  Can the rules of algebra be used to rearrange
expressions (possibly breaking carefully crafted error management strategies,
since floating point arithmetic doesn't actually obey the rules of algebra)?
Do writes to two variables written in order in the source code have to occur
in that order in the object code?  If two writes are issued in order to the
hardware, do they have to hit memory in that order?
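
The first of those questions can already bite cryptographic code.  Here is a
hedged C sketch (wipe_key and the workaround are illustrative, not a
portability guarantee): zeroing a key buffer that is never read again has no
effect on the program's input/output behaviour, so a compiler preserving only
that behaviour may legally delete the memset().

    #include <string.h>

    /* Under pure input/output semantics the buffer is dead after this
     * returns, so the memset() below is a "useless" computation and may be
     * optimized away entirely. */
    void wipe_key_naive(unsigned char *key, size_t len)
    {
        memset(key, 0, len);
    }

    /* One common workaround: call through a volatile function pointer so the
     * compiler cannot prove what is being called and must keep the stores. */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    void wipe_key(unsigned char *key, size_t len)
    {
        memset_v(key, 0, len);
    }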

An understanding of what semantics has to be preserved, and what semantics is
"side-effect" and can be tossed to gain performance, has gradually emerged in
the hardware and software communities.  There have been, and continue to be,
missteps along the way.  Some of the weaker memory consistency models might
have helped the hardware guys, but were just too hard for the software guys
to deal with.  Newbies to multi-threaded programming are caught by the not-so-
obvious memory access semantics present even at the language level in all
common programming languages.  (Java tried to pin this down exactly and got it
completely wrong for several tries.)
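
To show what "memory access semantics at the language level" looks like when
it finally does get written down, here is a minimal sketch using the
<stdatomic.h> atomics from C11 (the producer/consumer framing and the names
are mine): with plain int variables, nothing forbids the compiler or the
hardware from reordering the two stores; the release/acquire annotations are
how the required ordering is stated.

    #include <stdatomic.h>

    static int data;
    static atomic_int ready;

    void producer(void)
    {
        data = 42;
        /* release: earlier writes become visible before ready == 1 does */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    int consumer(void)
    {
        /* acquire: if we observe ready == 1, we also observe data == 42 */
        if (atomic_load_explicit(&ready, memory_order_acquire))
            return data;
        return -1;    /* not ready yet */
    }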

Enter cryptographic algorithms.  On their face, these are simple mathematical
transformations.  You have to really abuse the implementations (e.g., having
multiple side-effect-producing operations in one statement) to get into an
area where a programmer's or hardware developer's warning bell would sound
"watch the semantics".  And, in fact, from the point of view of input/output
transformations, there really are no semantic issues.  The problem is that
these side-channel attacks broaden "the meaning of the program" to something
that has never been considered in previous work that I know of.  (The closest
you are likely to find is in complaints by real-time programmers that modern
machines give you no way to determine how long an instruction sequence will
really take:  You might take a page fault, or a cache miss, or who knows what
along the way, and in some real-time code, you have to *know* that that won't
happen.  Such code really is sometimes run only on machines like Z80s!)
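
A toy illustration of that broader meaning (word-sized square-and-multiply in
C, emphatically not a real bignum implementation): its input/output behaviour
is just "compute base^exp mod m", but its running time depends on the bits of
the secret exponent - which is exactly the surface the RSA timing attacks
mentioned above work on.

    /* Square-and-multiply, toy version. */
    unsigned long modexp(unsigned long base, unsigned long exp, unsigned long m)
    {
        unsigned long result = 1;
        base %= m;
        while (exp > 0) {
            if (exp & 1)                       /* secret-dependent branch   */
                result = (result * base) % m;  /* extra work for each 1 bit */
            base = (base * base) % m;
            exp >>= 1;
        }
        return result;
    }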

What can be done?  Well, the first thing that's clearly needed is a more
complete specification of the semantics of cryptographic algorithms.  Just
the input/output transformation - which is all we write down in most analyses
- is insufficient.  We sometimes state things informally, almost in passing -
as in the comments on AES that "table accesses take constant time".  We
certainly assume things like "the time to add two numbers is independent of
the number of carries" - which is probably true on machines today, but may
actually have been false at one time.  Without a way to write down what
matters, we have no way to judge whether a particular compilation/hardware
approach is safe.  There's some work on abstract program interpretation that
might help here (though it's mainly aimed in other directions).
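
As a sketch of the kind of property we would like to be able to state and
have preserved - "the running time is independent of the data" - here is a
comparison written in the usual constant-time style (illustrative only;
nothing in the C language, and no compiler flag I know of, guarantees that
the property survives optimization, which is precisely the specification gap):

    #include <stddef.h>

    /* Touches every byte and takes no data-dependent branch, so at the
     * source level its time depends only on len, not on the contents. */
    int ct_equal(const unsigned char *a, const unsigned char *b, size_t len)
    {
        unsigned char diff = 0;
        size_t i;
        for (i = 0; i < len; i++)
            diff |= a[i] ^ b[i];   /* accumulate differences, no early exit */
        return diff == 0;
    }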

Ultimately, performance is likely to suffer.  Software and hardware
optimizations are essential to get good performance today, and they are all
done under the assumption that it's the *average* case that matters:
Increasing the variance (up to a certain - fairly generous - point) is fine
as long as you drop the mean.  All performance benchmarking is on loops that
are repeated thousands of times.  I can't recall ever seeing a benchmark that
reported the variance of timing across instances of the loop.
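
For what it's worth, reporting the spread is not hard.  A sketch (assuming a
POSIX clock_gettime(); do_operation() is a stand-in for whatever is being
measured):

    #include <math.h>
    #include <stdio.h>
    #include <time.h>

    static volatile unsigned sink;
    static void do_operation(void) { sink++; }  /* placeholder workload */

    static void bench(int iterations)
    {
        double sum = 0.0, sumsq = 0.0;
        for (int i = 0; i < iterations; i++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            do_operation();
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                      + (t1.tv_nsec - t0.tv_nsec);
            sum   += ns;
            sumsq += ns * ns;
        }
        double mean = sum / iterations;
        double var  = sumsq / iterations - mean * mean;
        if (var < 0.0)      /* guard against floating-point rounding */
            var = 0.0;
        /* The variance is the number benchmarks almost never report. */
        printf("mean %.1f ns, std dev %.1f ns\n", mean, sqrt(var));
    }

    int main(void)
    {
        bench(100000);
        return 0;
    }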

There are only a couple of roads forward:

	- Develop algorithms that offer reasonable performance even if
		implemented in "unoptimized" ways.  This will be difficult
		to maintain in the face of ever-increasing hardware
		optimizations that you can't just turn off by "not using -O".

	- Live with less performance and hope that raw hardware speeds will
		catch up.

	- Use specialized hardware, designed not to leak side-channel
		information.

	- ?

							-- Jerry


