[Cryptography] Implementing constant-time string comparison

Tue Jun 17 23:04:35 EDT 2014

On Jun 17, 2014, at 6:07 PM, Steve Weis <steveweis at gmail.com> wrote:
>> static int vn(const u8 *x,const u8 *y,int n)
>> {
>>  u32 i,d = 0;
>>  for (i = 0,i < n,++i) d |= x[i]^y[i];
>>  return (1 & ((d - 1) >> 8)) - 1;
>> }
>> 
>> Here the expression in the return statement evaluates to (d == 0)-1
>> in a way that makes it impossible for the CPU to produce a timing
>> side-channel by short-circuiting a comparison of (half)words.
> 
> I had not considered the (d == 0) comparison as potential timing leak,
> but think there is probably a good reason NaCl is using it.
> 
> I can see it would be a problem if there weren't a comparison
> instruction of sufficient width, or if the compiler/interpreter did
> not use it, or if that instruction itself might not be constant time.
> My guess is that the first case is the issue motivating this. I'll
> find out though.
If you're going to go this far, you'd better understand everything about the code generator.  Peephole optimizers have long been capable of determining the actual semantics of the short sequences of machine code that expressions like this generate and then finding "optimal" (according a different set of measures than motivated this code) if sometimes quite surprising equivalent sequences.  For example, C and C++ lack explicit notations for bit rotation, but their compilers have long turned expressions like (x << 4) | ((x >> 28) & 0xF) into a "rotate left 4 four bits" instruction.  (This one is probably a built-in sequence - and compilers can be curiously sensitive to minor nuances in the way the expression is written - but more general optimizers have been written and fielded in the past.)

The technology to recognize that the loop computing the XOR can be done "better" using comparisons and a quick exit when a difference is found certainly exists, though it probably isn't used in fielded compilers (because the cost/benefit ratio would be too small).

Even C is really too high-level to allow you to be sure of the instructions compilers will emit.  If you want instruction-level control, you still need to use an assembler - and, yes, it's non-portable.  How can "instruction-level control" be anything but non-portable?  (And of course you'd better understand how the instructions themselves are implemented in the hardware.  Successful attacks have been mounted against various machine-level caches.)

                                                        -- Jerry