AES cache timing attack

Brian Gladman brg at
Fri Jun 17 16:03:05 EDT 2005

Hal Finney wrote:

>> Steven M. Bellovin writes:
>>>> Dan Bernstein has a new cache timing attack on AES:
>> This is a pretty alarming attack.  Bernstein was actually able to recover
>> the AES key using a somewhat artificial server which reported its own
>> timing to do an AES operation.  In principle the same attack would be
>> possible on a typical remote server, using a larger number of requests
>> to average out timing variations.

This is a very nice piece of work by Bernstein but I am not convinced
about the practical significance of the attack.

And I certainly don't see any reason to abandon some of the design
approaches (e.g. table lookup) as he has been suggesting just because
they are exploitable in some situations. In many situations they are not
exploitable at all and in those situations where they might cause
problems it is up to system designers to decide whether or not they need
to deploy countermeasures.

>> He also had some critical things to say about the AES selection
>> process, which apparently dropped the ball on this issue.  Other ciphers
>> got dinged for nonconstant execution time, but no one thought that cache
>> misses would be that significant.

>> Dan has more information at, including
>> graphs showing the variability in timings for various
>> implementations at and
>>, and his own attempt at a (nearly)
>> constant-time AES implementation as part of his poly1305-aes MAC.
>> It would be interesting to see how the speed of his AES implementation
>> compares to that of other widely used versions like Brian Gladman's.
>> How much speed penalty do you pay to get the constant speed?  As Dan
>> notes, you can easily make a slow AES that runs at constant speed, but
>> a fast one is far more difficult.

Nevertheless Bernstein has highlighted one issue that I had not been
conscious of: on modern (Intel) x86 systems there is no longer a
significant speed penalty for unaligned memory accesses to 32-bit
words, a feature that allows AES to be implemented with very much less
table space than is normally the case.

There is almost no penalty in terms of best speed, and the typical
speed is likely to be a lot better in most practical situations because
the load on the cache is greatly reduced.  The timing variability of
this code is also greatly reduced, so it's an all-round win on the x86.
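To make the idea concrete, here is a minimal C sketch of one possible
compressed-table layout (the layout and names are my own assumption, not
Gladman's actual code).  A conventional implementation keeps four 1KB
tables T0..T3, where Ti[x] is T0[x] rotated by 8*i bits; if unaligned
32-bit loads are cheap, one table with each entry's four bytes stored
twice (8 bytes per entry, 2KB in all) yields all four rotations as loads
at byte offsets 0..3 into the entry:

```c
#include <stdint.h>
#include <string.h>

/* One 2KB table replaces the usual four 1KB tables.  Each 8-byte entry
 * holds an entry's four bytes followed by the same four bytes again,
 * so a 4-byte read at offset i (i = 0..3) gives the i-th rotation.   */
static uint8_t t_bytes[256 * 8];

/* Store entry x: bytes b[0..3], then the same four bytes repeated. */
static void set_entry(uint8_t x, const uint8_t b[4])
{
    memcpy(t_bytes + 8 * (size_t)x,     b, 4);
    memcpy(t_bytes + 8 * (size_t)x + 4, b, 4);
}

/* Equivalent of Ti[x]: a 32-bit read at byte offset i into entry x.
 * For i = 1, 2, 3 this is an unaligned access; memcpy expresses it
 * portably, and on x86 compilers emit a single (unaligned) mov.     */
static uint32_t t_lookup(unsigned i, uint8_t x)
{
    uint32_t w;
    memcpy(&w, t_bytes + 8 * (size_t)x + i, 4);
    return w;
}
```

This particular layout halves the table space; a real compressed-table
implementation may well reduce it further.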

The downside is that, although unaligned accesses on x86 are ok, on many
other architectures these cause exceptions and this makes it tedious to
build compressed table operation into portable C code. In fact it is so
tedious that I am not going to offer this; instead I have simply
published x86 assembler code, which I report on here:

For those who can live with x86 only, and with an assembler
implementation, this code matches the maximum speed of my large table
assembler version on the P3 and P4.

Another issue that this raises is that of the crypto API, since in
those situations where the timing attack matters it is necessary to
control the position of the expanded AES key on the stack, and this
requires that key expansion and encryption are done as one integrated
API call, e.g.:

   encrypt(key[], in[], out[], no_of_blocks)
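A minimal sketch of such an integrated call might look as follows (the
names and helper routines are my own placeholders, not Gladman's
library; the stand-in "cipher" below is a trivial XOR so the sketch is
self-contained, where a real implementation would use genuine AES key
expansion and rounds):

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder primitives so the sketch compiles and runs; these are NOT
 * real AES.  A real implementation substitutes the actual key schedule
 * and round function here.                                             */
static void aes128_expand_key(const uint8_t key[16], uint32_t rk[44])
{
    for (int i = 0; i < 44; i++)
        rk[i] = (uint32_t)key[i % 16] + (uint32_t)i;
}

static void aes128_encrypt_block(const uint32_t rk[44],
                                 const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[i] ^ (uint8_t)rk[i];
}

/* One-call API: key expansion and bulk encryption together, so the
 * expanded key lives in a single stack frame whose placement the
 * implementation controls, and can be wiped before returning.       */
void encrypt(const uint8_t key[16], const uint8_t *in, uint8_t *out,
             size_t no_of_blocks)
{
    uint32_t rk[44];                 /* expanded key stays on the stack */

    aes128_expand_key(key, rk);
    for (size_t i = 0; i < no_of_blocks; i++)
        aes128_encrypt_block(rk, in + 16 * i, out + 16 * i);

    /* Wipe the key schedule; the volatile qualifier discourages the
     * compiler from optimising the clearing loop away.              */
    volatile uint8_t *p = (volatile uint8_t *)rk;
    for (size_t i = 0; i < sizeof rk; i++)
        p[i] = 0;
}
```

The point of the single call is that the expanded key never outlives
the frame of `encrypt`, which is what gives the implementer control
over its stack position.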

I hope this helps but if not I will try and answer any other questions.

   Brian Gladman

The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at
