[Cryptography] GCC bug 30475 (was Re: bounded pointers in C)

Benjamin Kreuter brk7bx at virginia.edu
Wed Apr 30 20:59:44 EDT 2014


On Wed, 2014-04-30 at 08:41 -0400, Arnold Reinhold wrote:
> On Apr 29, 2014, at 11:21 PM, Benjamin Kreuter <brk7bx at virginia.edu> wrote:
> 
> > On Tue, 2014-04-29 at 19:56 -0400, Arnold Reinhold wrote:
> >> The C standards committee, as far as I know, only said that signed
> >> integer overflow was undefined. They never required that code which
> >> could create such an overflow, and any code dependent on such an
> >> operation, be removed.
> > 
> > Maybe so, but it is also not prohibited to remove it.  
> 
> As I pointed out, there are a lot of evil things the compiler could do
> with undefined code that are not prohibited by the C standards. That
> does not mean it is ok to do them. There are other standards for human
> behavior besides those written by programming language committees.

Right, and my point is that it is not evil for a compiler to optimize
code in ways that could cause a program to behave differently when
undefined behavior is triggered.

> > I would want this
> > to be optimized by my compiler:
> > 
> > if(x > x + 1) { /* ... */ }
> > 
> > Yet someone might have been using "x > x + 1" as a way to check "x ==
> > INT_MAX", and such an optimization would cause a problem for them.
> 
> Why would you want the compiler to silently remove that code? If it
> wasn't an overflow check, maybe you made a typo--when you wrote that
> line you had something in mind.

I might not have written that line; that line might have been a result
of other optimizations.  Here is a simple example:

#include <stdio.h>

void foo(int x, int y) {
  if (x > y) {
    printf("Greater\n");
  } else {
    printf("Lesser\n");
  }
}

int main(void) {
  int x;
  scanf("%d", &x);   /* "%d\n" here would block waiting for more input */
  foo(x, x + 1);     /* x + 1 is undefined when x == INT_MAX */
  return 0;
}

Once "foo" is inlined into "main", copy propagation (a textbook
optimization) causes "if (x > x+1)" to appear.  If the next step is
algebraic simplification (also textbook), the comparison folds to false
and the behavior of this code when x is INT_MAX changes.  In fact, on
my system GCC goes even further -- it replaces the entire call to "foo"
with a call to "puts":

main:
.LFB12:
	.cfi_startproc
	subq	$24, %rsp
	.cfi_def_cfa_offset 32
	movl	$.LC2, %edi
	xorl	%eax, %eax
	leaq	12(%rsp), %rsi
	call	__isoc99_scanf
	movl	$.LC1, %edi
	call	puts
	xorl	%eax, %eax
	addq	$24, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
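
In source terms, the optimizer's reasoning amounts to something like
this (a sketch, not actual GCC output; "x > x+1" folds to false under
the no-overflow assumption, so only the "Lesser" branch survives):

#include <stdio.h>

int main(void) {
  int x;
  scanf("%d", &x);
  /* foo(x, x+1) inlined; "x > x+1" folded to 0 assuming no overflow */
  puts("Lesser");
  return 0;
}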

> That horse has left the barn long ago. Thousands of security sensitive
> programs written in C are in use today.

Yeah, and that is a *problem* that we should be trying to solve.

> Isn't it more reasonable to expect the developers of C to respond to
> the safety needs of their users and the public?

The way C programmers do that is by not writing programs with undefined
behavior.  Writing programs with undefined behavior is writing programs
with bugs.

> Well, to begin with Bug 30475 was about an assert statement and these
> are pretty much always safety checks.

assert is a way for programmers to find bugs, not a substitute for
actually performing safety checks.  Using assert instead of an explicit
check means that you think an explicit check is not needed -- that your
program would run fine without it.  There is a standard way to globally
disable asserts at compile time without touching the code at all (define
NDEBUG) and it is not at all uncommon to do so before releasing a
binary.
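
To make that concrete, here is a minimal sketch (the function and its
names are mine, purely illustrative) of why an assert is not a safety
check:

#include <assert.h>
#include <stddef.h>

/* With -DNDEBUG the assert expands to nothing, so the out-of-bounds
   read below is completely unguarded in a release build. */
int get(const int *a, size_t n, size_t i) {
  assert(i < n);   /* compiled out when NDEBUG is defined */
  return a[i];
}

An explicit "if (i >= n)" test, by contrast, survives every build
configuration.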

> An optimizer is supposed to generate code that produces the same output
> as the original program, but more efficiently.

That is only true when the program's behavior is well-defined.  If the
behavior is undefined, there are no guarantees at all.  Even a minor
change to the code generation strategy (e.g. the order in which
function arguments are evaluated) might change what an undefined
program does.  There is no reason to expect the optimizer to preserve
undefined behavior.
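
A classic illustration (hypothetical code, not from the bug report):
the two modifications of "i" below are unsequenced, so two compilers --
or two optimization levels of one compiler -- may legitimately print
different values:

#include <stdio.h>

int main(void) {
  int i = 0;
  /* Unsequenced side effects on i: undefined behavior.  Argument
     evaluation order alone can change the output. */
  printf("%d %d\n", i++, i++);
  return 0;
}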

Something important to keep in mind is that undefined behavior can be
affected by changes to other sections of code, even if those changes do
not affect any well-defined behavior.  For example:

#include <stdio.h>

int main(void) {
  char c[] = "abcdefghij";
  int x, y, z;
  x = 45;
  y = 10;
  z = x + y;
  printf("%s %d\n", c + 20, z);  /* c+20 is past the end of c: undefined */
  return 0;
}

Note that we could change this so that we have "z = 55" without
intermediate variables, but the program's behavior changes on my system
even with optimization disabled (actually, it changes *only* when
optimization is disabled):

#include <stdio.h>

int main(void) {
  char c[] = "abcdefghij";
  int z = 55;
  printf("%s %d\n", c + 20, z);  /* same out-of-bounds read */
  return 0;
}

That kind of basic change to a program is commonly performed by
optimizers.  (The output changes because "c + 20" reads whatever
happens to sit past the array on the stack, and removing "x" and "y"
rearranges the stack frame.)  Would you suggest that compilers should
avoid this, or that programmers should be warned about every change an
optimization stage makes to a stack frame?

> If a behavior is undefined, how can the optimizer conclude that the
> output will be the same? I would think undefined behavior is the last
> thing I would want an optimizer to touch. 

Again, look at the above examples.  Basic textbook optimizations would
be crippled if the optimizer could not change undefined behavior.  I am
not sure that anything beyond peephole optimization would even be
possible.
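
A concrete case (hypothetical code): because signed overflow is
undefined, the compiler may assume "i" never wraps and conclude that
this loop runs exactly n+1 times, which is what licenses unrolling and
vectorization.  If overflow had defined wrapping semantics, the
compiler would also have to handle n == INT_MAX, where the loop never
terminates:

void fill(int *a, int n) {
  /* The no-overflow assumption gives a known trip count of n + 1. */
  for (int i = 0; i <= n; i++)
    a[i] = i;
}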

> > 
> > If optimizers were truly *forbidden* from relying on anything that might
> > be undefined behavior in C, almost no good optimization would be
> > possible.  It is reasonable and necessary for the optimizer to leave it
> > up to the programmer to prevent undefined behavior.
> 
> Do you have any evidence for this claim? How many statements with
> undefined behavior do you think are in a typical program? How many
> nanoseconds would we waste if the compiler just left them alone?

Note that I said "anything that *might* be undefined behavior," which is
exceedingly common in C programs.  Integer arithmetic can trigger
undefined behavior.  Accessing arrays, dereferencing pointers, passing
arguments to functions, etc. all have the potential to trigger undefined
behavior.  Again, look at the examples above for evidence.
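
For what it is worth, the defined way to write the overflow check from
earlier in this thread is to test against the limit before doing the
arithmetic -- "x == INT_MAX" instead of "x > x + 1".  A minimal sketch
(the name "safe_add" is mine, just for illustration):

#include <limits.h>

/* Stores a+b in *sum and returns 1 if the addition cannot overflow;
   returns 0 otherwise.  No signed overflow is ever performed. */
int safe_add(int a, int b, int *sum) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return 0;
  *sum = a + b;
  return 1;
}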

You keep referring to optimizers as saving "nanoseconds" as if we
should all believe that optimizers are just toys.  Modern optimizers do
more than shave off a few nanoseconds.  In the general case optimizers
are better at producing fast code than human beings.  Optimizers can and
do analyze code across entire programs, finding simplifications and
eliminating unnecessary code.  The difference can be very big in both
speed and program size, and it can be measured in dollars saved on
equipment, electricity, cooling, etc.

-- Ben