[Cryptography] GCC bug 30475 (was Re: bounded pointers in C)

Arnold Reinhold agr at me.com
Thu May 1 17:36:41 EDT 2014


On Apr 30, 2014, at 8:59 PM, Benjamin Kreuter <brk7bx at virginia.edu> wrote:

> On Wed, 2014-04-30 at 08:41 -0400, Arnold Reinhold wrote:
>> On Apr 29, 2014, at 11:21 PM, Benjamin Kreuter <brk7bx at virginia.edu> wrote:
>> 
>>> On Tue, 2014-04-29 at 19:56 -0400, Arnold Reinhold wrote:
>>>> The C standards committee, as far as I know, only said that signed
>>>> integer overflow was undefined. They never required that code which
>>>> could create such an overflow, and any code dependent on such an
>>>> operation, be removed.
>>> 
>>> Maybe so, but it is also not prohibited to remove it.  
>> 
>> As I pointed out there are a lot of evil thing the compiler could do
>> with undefined code that are not prohibited by the C standards. That
>> does not mean it is ok to do them. There are other standards for human
>> behavior besides those written by programming language committees.
> 
> Right, and my point is that it is not evil for a compiler to optimize
> code in ways that could cause a program to behave differently when
> undefined behavior is triggered.

After the developers have been told that a specific optimization is potentially causing widespread security and safety problems, I think it is evil. 

> 
>>> I would want this
>>> to be optimized by my compiler:
>>> 
>>> if(x > x + 1) { /* ... */ }
>>> 
>>> Yet someone might have been using "x > x + 1" as a way to check "x ==
>>> INT_MAX", and such an optimization would cause a problem for them.
>> 
>> Why would you want the compiler to silently remove that code? If it
>> wasn't an overflow check, maybe you made a typo--when you wrote that
>> line you had something in mind.
> 
> I might not have written that line; that line might have been a result
> of other optimizations.  Here is a simple example:
> 
> #include <stdio.h>
> 
> void foo(int x, int y) {
>  if (x > y) {
>    printf("Greater\n");
>  } else {
>    printf("Lesser\n");
>  }
> }
> 
> int main() {
>  int x;
>  scanf("%d", &x);
>  foo(x, x+1);
>  return 0;
> }
> 
> Copy propagation (a textbook optimization) would cause "if (x > x+1)" to
> appear.  If the next step is algebraic simplification (also textbook),
> then the behavior of this code when x is INT_MAX will change.  In fact,
> on my system GCC goes even further -- it replaces the call to "foo" with
> a call to "puts:" ...

Same issue applies. If you know what foo does, why would you write "foo(x, x+1);"? If you weren't checking for overflow, it makes no sense. Maybe you really meant to write "foo(x, y+1);". The compiler should flag this, not optimize it away. If you were running a code review that included this fragment and the programmer wasn't present, would you simply replace foo(x, x+1); with printf("Lesser\n");, or would you get hold of the programmer and ask him what the heck he was thinking?
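
Just to be concrete, the test that fragment was presumably aiming at can be written without any undefined operation at all, so no conforming optimizer may remove it. A rough sketch (the helper name is mine):

#include <limits.h>
#include <stdio.h>

/* What "x > x + 1" was presumably trying to ask, expressed without
   ever evaluating an overflowing expression. */
static int increment_would_overflow(int x) {
    return x == INT_MAX;
}

int main(void) {
    int x = INT_MAX;
    if (increment_would_overflow(x))
        printf("x + 1 would overflow\n");
    return 0;
}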

> 
> 
>> That horse has left the barn long ago. Thousands of security sensitive
>> programs written in C are in use today.
> 
> Yeah, and that is a *problem* that we should be trying to solve.

And I'm trying, in my own feeble way, to address it. Which course of action makes more sense: rewriting all the security-sensitive software written in C, or getting the compiler to stop removing signed integer overflow checks?

> 
>> Isn't it more reasonable to expect the developers of C to respond to
>> the safety needs of their users and the public?
> 
> The way C programmers do that is by not writing programs with undefined
> behavior.  Writing programs with undefined behavior is writing programs
> with bugs.

If the undefined behavior is a bug, the compiler should treat it as a compile-time error, not accept it silently during ordinary compiles and then optimize it away later.
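
As far as I know GCC already has knobs pointing in this direction, though I haven't checked how complete their coverage is. Illustrative compile lines (flag behavior varies by GCC version; prog.c is a placeholder):

# report, and with -Werror reject, optimizations that assume
# signed overflow cannot happen:
gcc -O2 -Wstrict-overflow=5 -Werror=strict-overflow prog.c

# trap at run time when a signed overflow actually occurs:
gcc -O2 -ftrapv prog.c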

> 
>> Well, to begin with Bug 30475 was about an assert statement and these
>> are pretty much always safety checks.
> 
> assert is a way for programmers to find bugs, not a substitute for
> actually performing safety checks.  Using assert instead of an explicit
> check means that you think an explicit check is not needed -- that your
> program would run fine without it.  There is a standard way to globally
> disable asserts at compile time without touching the code at all (define
> NDEBUG) and it is not at all uncommon to do so before releasing a
> binary.

I pointed out that asserts may not be appropriate in production code, but if they are only there for debugging, then there is even less of a case for optimizing them out. We don't need maximum execution speed during testing, and they might catch other problems caused by misunderstandings between the programmer and the optimizer. Assert-style checks are also useful in hardening programs against deliberate attacks, say where the attacker figures out a way to create an overflow with malformed inputs that should have been checked but weren't. If anything, we need programmers to put more sanity checks in their programs, not put hurdles in their way.
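
The kind of hardening check I mean survives NDEBUG and contains nothing the optimizer is entitled to discard. A sketch (the function name is mine; abort() stands in for whatever failure policy fits):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Checked addition: unlike assert(), this stays in release builds,
   and unlike "a + b < a" it never performs an undefined operation. */
static int checked_add(int a, int b) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        abort();
    return a + b;
}

int main(void) {
    printf("%d\n", checked_add(INT_MAX - 1, 1));  /* fine: prints INT_MAX */
    return 0;
}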

> 
>> An optimizer is supposed to generate code that produces the same output
>> as the original program, but more efficiently.
> 
> That is only true when the program's behavior is well-defined.  If the
> behavior is undefined there are no guarantees at all.  Even a minor
> change to the code generation strategy (e.g. what order function
> arguments will be evaluated in) might affect undefined behavior.  There
> is no reason to believe that the optimizer will not change undefined
> behavior.
> 
> Something important to keep in mind is that undefined behavior can be
> affected by changes to other sections of code, even if those changes do
> not affect any well-defined behavior.  For example:
> 
> char c[] = "abcdefghij";
> int x, y, z;
> x = 45;
> y = 10;
> z = x + y;
> printf("%s %d\n", c+20, z);
> return 0;
> 
> Note that we could change this so that we have "z = 55" without
> intermediate variables, but the program's behavior changes on my system
> even with optimization disabled (actually, it changes *only* when
> optimization is disabled):
> 
> char c[] = "abcdefghij";
> int z = 55;
> printf("%s %d\n", c+20, z);
> return 0;
> 
> That kind of basic change to a program is commonly performed by
> optimizers.  Would you suggest that compilers should avoid this, or that
> programmers should be warned about changes optimization stages make to
> stack frames?

OK, you got me here. "...produces the same output as the original program" overstates my intent. A simpler example (in lazy pseudocode) would be:

starttime=currenttime();
BigUglyFunction();
print (currenttime() - starttime);

Obviously using the optimizer should produce a different output here.  
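
Spelled out in real C (my rendering, with a dummy loop standing in for BigUglyFunction), the number printed is exactly what -O2 is supposed to change:

#include <stdio.h>
#include <time.h>

/* Dummy stand-in for BigUglyFunction(); the volatile store keeps the
   loop from being deleted outright, but optimization still speeds it up. */
static volatile long sink;

static void BigUglyFunction(void) {
    for (long i = 0; i < 100000000L; i++)
        sink = i;
}

int main(void) {
    clock_t start = clock();
    BigUglyFunction();
    printf("%.3f seconds\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}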

> 
>> If a behavior is undefined, how can the optimizer conclude that the
>> output will be the same? I would think undefined behavior is the last
>> thing I would want an optimizer to touch. 
> 
> Again, look at the above examples.  Basic textbook optimizations would
> be crippled if the optimizer could not change undefined behavior.  I am
> not sure that anything beyond peephole optimization would even be
> possible.
> 
>>> 
>>> If optimizers were truly *forbidden* from relying on anything that might
>>> be undefined behavior in C, almost no good optimization would be
>>> possible.  It is reasonable and necessary for the optimizer to leave it
>>> up to the programmer to prevent undefined behavior.
>> 
>> Do you have any evidence for this claim? How many statements with
>> undefined behavior do you think are in a typical program? How many
>> nanoseconds would we waste if the compiler just left them alone?
> 
> Note that I said "anything that *might* be undefined behavior," which is
> exceedingly common in C programs.  Integer arithmetic can trigger
> undefined behavior.  Accessing arrays, dereferencing pointers, passing
> arguments to functions, etc. all have the potential to trigger undefined
> behavior.  Again, look at the examples above for evidence.

Does what you said above "Writing programs with undefined behavior is writing programs with bugs" apply equally to "accessing arrays, dereferencing pointers, passing arguments to functions, etc."?

In your example:

char c[] = "abcdefghij";
int z = 55;
printf("%s %d\n", c+20, z);
return 0;

Would your optimizer be justified in silently deleting everything but the return, since dereferencing c+20 is clearly undefined behavior? Can it remove all pointer dereferences whenever it can't verify that proper bounds checks have taken place? The "undefined behavior" rubric is getting a little squishy. Maybe we should just focus on signed integer overflow.
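
For what it's worth, the defined-behavior version of that access has to carry an explicit bounds check, which is exactly the kind of code I don't want optimizers second-guessing. A sketch:

#include <stdio.h>

int main(void) {
    char c[] = "abcdefghij";   /* 11 bytes, counting the terminating NUL */
    size_t i = 20;
    int z = 55;

    /* The explicit check that makes the access well defined. */
    if (i < sizeof c)
        printf("%s %d\n", c + i, z);
    else
        printf("(index %zu out of range) %d\n", i, z);
    return 0;
}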

> 
> You keep referring to optimizers as saving "nanoseconds" as if we
> should all believe that optimizers are just toys.  Modern optimizers do
> more than shave off a few nanoseconds.  In the general case optimizers
> are better at producing fast code than human beings.  Optimizers can and
> do analyze code across entire programs, finding simplifications and
> eliminating unnecessary code.  The difference can be very big in both
> speed and program size, and it can be measured in dollars saved on
> equipment, electricity, cooling, etc.
> 

If your bank account gets raided, will you be comforted by the knowledge that the bank got by with fewer servers? Again, let's stick to signed integer overflow. How much of this performance gain would be lost if signed integer overflow operations and their sequelae were not optimized away?
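
That last question can at least be answered empirically: build the same program with and without wrapping-overflow semantics and time both. Illustrative commands (prog.c and workload are placeholders; -fwrapv defines signed overflow to wrap, so overflow checks survive):

gcc -O2         -o prog-fast prog.c
gcc -O2 -fwrapv -o prog-wrap prog.c
time ./prog-fast < workload
time ./prog-wrap < workload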

Arnold Reinhold


