[Cryptography] GCC bug 30475 (was Re: bounded pointers in C)

Jerry Leichter leichter at lrw.com
Thu May 1 06:33:47 EDT 2014


On Apr 30, 2014, at 5:01 PM, Stephan Neuhaus <stephan.neuhaus at tik.ee.ethz.ch> wrote:
>> Imagine an architecture where the compiler does its 32 bit
>> integer maths in 64 bit registers and only worries about whether the
>> result fits into a 32 bit int when assigning the result to a variable
>> or passing it as a parameter.  Your pet assert would fail with such
>> an implementation without an optimiser in sight.  I believe such an
>> implementation would still be correct according to the standard as it
>> should produce the same result in all cases where the output is not
>> left as undefined by the standard.
> 
> While I disagree with most of what you said about what the compiler
> ought and ought not to do with security checks and undefined behaviour,
> there is precedent of a kind for this.
> 
> Some CPU/FPU implementations have 80 bit FPU registers (equivalent to
> C's long double type when on an IEEE-conforming implementation), but
> 64-bit double representations.  Computations are done by loading the
> (64-bit) double operands from memory, extending them to (80 bit) long
> double, doing the computation in long double, then rounding the result
> to double and finally storing the result. This can invalidate many
> carefully crafted optimisations or error-curtailing tricks.
"Some" implementations?  This is the original x86 FPU semantics, the ones present on every 32-bit x86 CPU ever built.  I haven't had any reason to keep up, and I don't know if the most recent FP instruction sets still do the same thing; I suspect they might very well.

This behavior caused incredible tumult in the early days of Java.  Java defined FP arithmetic "to the bit level", insisting that every implementation produce exactly the same bit-level results as the reference implementation on SPARC hardware - independently definable in terms of the IEEE standard arithmetic for float and double.  x86s used in a reasonable way produced "wrong" results - they're "too accurate" in some cases, and can avoid overflow and produce an exact result in other cases where an implementation that sticks to float and double cannot.  The best way anyone ever found to avoid that was to store results into memory after every FP operation, then load them back.  Needless to say, this completely killed performance.  Eventually - under pressure from those who actually understood the numerics - the Java spec was modified to allow "more accurate" results.  (You can still get the original semantics if you use the strictfp modifier.)

Another example of the same phenomenon is "fused multiply-add" - a single three-operand instruction that computes A*B+C with a single rounding.  (This is an extremely common operation in many FP programs - it's the basic computation step of an inner product, i.e. the projection of one vector onto another, and hence also of a matrix multiplication.)  It gives both better accuracy and higher performance than a separate multiply followed by an add.  IBM's Power processors have this instruction, and their compilers recognize when it can be used.

If you get into questions of parallelism and memory models, all modern machines have the potential for unexpected behavior.  One classic example:

Initially both x and y are 0
Thread 1	Thread 2
x = 1;          print y;
y = 1;          print x;

What can thread 2 print?  There are three obvious possibilities:

0 0     - Thread 2 completes before thread 1
1 1     - Thread 1 completes before thread 2
0 1     - Thread 1 executes its first statement, then thread 2 completes

But 1 0 - seeing the second store (y = 1) without the first (x = 1) - is impossible.  Except ... due to everything from compiler optimizations (OK, not in *this* trivial example, but in more complex examples that come down to the same pattern) to memory buffering effects, it *is* possible!

We long, long ago got to the point where performance and "obvious" semantics parted ways.  C deliberately makes the results of certain operations undefined since otherwise, as with the bit-by-bit equivalence of FP operations in early Java, its compilers would be forced to produce unacceptably bad performance in certain real-life situations.  That's just the way things are, and they are not likely to change in the foreseeable future.

As I've said before, what's wrong with C is *not* that certain operations have undefined results, and *not* that compilers do things you don't expect if you run into those corner cases (though I will agree that some actions, like GCC's, violate the old C notion of "acting as someone with knowledge of the machine architecture would expect").  What's really missing is a standard way to detect some important conditions, like overflow.  Every architecture *can* do this, even if on some it might be more expensive than on others.  What's needed is a standardized API like:

sum_overflows(a,b)  - true if computing a+b overflows
diff_overflows(a,b) ...

There might even be an optional (presence checkable by a macro)

expr_overflows(expression)

These would come from a standard header, but could be implemented in all kinds of ways: as simple range tests, for compilers that promise not to optimize them away; via pragmas that disable inappropriate optimizations around the computation; or as compiler built-ins.  It's actually a bit of a mystery to me why this has never made it into C - it's an obvious thing to have, matches the C model of providing direct access to machine capabilities, and is probably fairly easy on implementers:  "Dumb" compiler implementations will leave the standard test expressions alone so won't have to do much work; "smart" optimizers should have enough mechanism to add this stuff cheaply.

                                                        -- Jerry



