timing attack countermeasures (nonrandom but unpredictable delays)

leichter_jerrold at emc.com
Thu Nov 17 11:07:17 EST 2005


| > In many cases, the observed time depends both on the input and on some
| > other random noise.  In such cases, averaging attacks that use the same
| > input over and over again will continue to work, despite the use of
| > a pseudorandom input-dependent delay.  For instance, think of a timing
| > attack on AES, where the time to compute the map X |--> AES(K,X) depends
| > only
| > on K and X, but where the measured time depends on the computation time
| > (which might be a deterministic function of K and X) plus the network
| > latency (which is random).  Indeed, in this example even the computation
| > time might not be a deterministic function of K and X: it might depend
| > on the state of the cache, which might have some random component.
| 
| I don't follow; averaging allows one to remove random variables from
| the overall time, but you're still left with the real computation time
| plus the the deterministic delay I suggested as a function of the
| input.
| 
| Specifically, time t(k,x) = f(k,x) + d(k,x) + r
| 
| Where r is a random variable modelling all random factors, f is the
| time to compute the function, and d is the deterministic delay I
| suggested that is a function of the inputs.  Averaging with repeated
| evaluations of the same k and x allows one to compute the mean value
| of r, and the sum f+d, but I don't see how that helps one separate f
| from d.  What am I missing?
Why do you need to separate f from f+d?  The attack is based on a timing 
variation that is a function of k and x, that's all.  Think of it this way:
Your implementation with the new d(k,x) added in is indistinguishable, in
externally visible behavior, from a *different* implementation f'(k,x)
which has the undesired property:  That the time is a function of the
inputs.
Any attack that works against such an implementation works against yours.
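
The averaging step in the quoted model can be sketched numerically: even with
substantial random noise r, the sample mean over repeated queries with the
same (k,x) converges to f(k,x) + d(k,x), which is all the attacker needs.  A
minimal simulation, with made-up timing functions purely for illustration:

```python
import random

def f(k, x):       # hypothetical key/input-dependent computation time
    return 10 + (k * x) % 7

def d(k, x):       # hypothetical deterministic compensation delay
    return (k + x) % 5

def observe(k, x):  # one measurement: f + d plus random jitter r
    return f(k, x) + d(k, x) + random.uniform(0, 20)

random.seed(1)
k, x = 3, 4
samples = [observe(k, x) for _ in range(100_000)]
mean = sum(samples) / len(samples)

# The jitter averages out to ~10 (the mean of uniform(0, 20)),
# so mean - 10 recovers the deterministic part f + d:
print(round(mean - 10, 2), f(k, x) + d(k, x))
```

The attacker never needs f and d separately: the sum f+d is exactly the
input-dependent timing signal the attack exploits.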

Now, your model is actually general enough to allow for effective d(k,x)'s.
For example, suppose that d(k,x) = C - f(k,x), for some constant C.  Then
t(k,x) is just C - i.e., the computation is constant-time.

One can generalize this a bit:  f(k,x) in any real application isn't going
to have a unique value for every possible (k,x) pair (or even for every
possible x for fixed k, or k for fixed x).  Even if this were true in a
theoretical sense, you couldn't possibly measure it finely enough.  The
real attack arises because of a combination of things:  f(k,x) is actually
a function of k and x (or can be made so by averaging); the size of f's
range is a significant fraction of the size of the domain of k, x, or
(k,x), depending on what you are attacking; and, finally, the inverse
images of the elements of f's range are fairly even in size.  These all
arise because the nature of the attack is to use f(k,x) to determine that
k (or x or (k,x)) is actually a member of some subset of the range of k
(or ...), namely, the inverse image of the observed value under f.  (The
need for the last one can be seen by considering a function that sends
(0,x) to x and every other pair of values to 1.  Then it's easy to attack
the 0 key by computing the timing, but no information about any other key
can be gained by timing attacks.)
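
The inverse-image step above can be made concrete with a toy f and key space
(both invented here for illustration): observing one timing value narrows
the secret key to exactly the set of keys consistent with that value.

```python
# Toy timing function: the attacker learns f(k, x) for a known x and
# keeps every key whose timing matches -- the inverse image under f(., x).
def f(k, x):
    return (3 * k + x) % 4

keys = range(16)
x = 5
secret_k = 9
observed = f(secret_k, x)

candidates = [k for k in keys if f(k, x) == observed]
print(candidates)   # 4 of the 16 keys survive one observation
assert secret_k in candidates
```

With a small range (here only 4 distinct timing values) and evenly sized
inverse images, each observation cuts the candidate set by the same factor,
which is exactly the combination of properties described above.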
  
If we think of your d() function as a "compensation function", then

	d(k,x) = C - f(k,x)

is an "ideal" compensation function, which it may be impractical to use.
(The ideal compensation function is always available *in principle* because
we can set C = max over k,x of f(k,x), compute naturally, then "compute
d(k,x)" by looking at the time elapsed for the function we just finished
and delaying for C minus that value.)  However, the analysis above shows
that there may be other useful compensation functions which, while they
can't by their nature provide the degree of security of the ideal
compensation function, may still be effective.  For example, suppose I have
several different ways to compute the function to be protected, with
differing timing characteristics, but it's certain that for no input values
do all the calculations take the maximum amount of time.  If I run all the
algorithms in parallel and deliver the first result that is available, I've
reduced the range of f by eliminating some of the largest values.  (Of
course, one has to get the details right!)

							-- Jerry


---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com
