[Cryptography] how reliably do audits spot backdoors?

Tue Dec 31 07:48:54 EST 2013

On Dec 29, 2013, at 11:31 AM, andrew cooke wrote:
> Go is better - http://golang.org/ref/spec#Integer_overflow - for unsigned
> values (recent code from Google / Adam Langley seems to use Go).
> 
> Julia is similar -
> http://docs.julialang.org/en/release-0.2/manual/integers-and-floating-point-numbers/#overflow-behavior
These specs are "better" in the sense that they require specific behavior rather than giving implementations choices.  That's good from the point of view of program analysis and proving, but there are two other things to consider:

1.  The effect on program performance.  Java originally specified the exact behavior of FP arithmetic.  Any two implementations of the original Java specs were guaranteed to get the bit-for-bit identical results in any FP computation  The behavior exactly matched the SPARC implementation of FP, but to get the same results on the x86's of the day required immense overhead (you pretty much had to save each intermediate result to memory and then read it back into a register).  Ironically, the Java-specified results were *less* precise than the x86 could achieve.  (This isn't new:  There was a patch to the microcode of some model of VAX - I think the 750, the second VAX built - which *reduced the precision of certain FP instructions* to match the architectural definitions.)

In the case of Java, the costs were so high - and the payoff, as far as those who used FP seriously, i.e., the experienced numerical analysts, so non-existent - that the Java spec was changed to *allow* more variation.

2.  Whether the fully defined behavior is a *good* behavior from a human point of view.  Most programmers, most of the time, think of computer integers as models for mathematical integers.  Signed arithmetic with at least 32 bits is a good match for this intuition except for very large or very small values, so people rarely get in trouble with it.  Unsigned arithmetic, on the other hand, fails to match for values near 0, exactly where many common computations "live".  Such an "obvious" fact as that

	u - 1 < u

is false when u is 0 - which makes it all too easy to write non-terminating loops - or at least loops that won't terminate until x wraps all the way around.  (Been there, done that, sigh.)

Also, the C idea that unsigned is a *type*, rather than unsigned operations being alternative operations on the bit patterns of a pre-existing signed type, while nice in many ways, combines badly with default arithmetic promotions.  You have to consider what happens when signed and unsigned values meet "across an operator".  This used to be ambiguous.  It no longer is - the spec is now explicit that "unsigned wins" - but it's still all too easy to write expressions whose meaning is not at all what a human being would expect from a quick glance.

C++ actually enshrines the problem in its libraries.  Lengths of strings are unsigned, but string::npos - the constant used to represent "not a legal string offset" - is -1.  Kind of - it's really the bit pattern for -1 converted into an unsigned.  People do pretty much understand that you must not use "< 0" to test whether you got a a string::npos - and compilers these days will typically warn you if they see an u < 0, telling you it's always false - did you really mean that?  But that's only part of the story.

If you ask for the first (or last) position at which a string occurs within another string, you get an offset (unsigned, since it could be as large as one less than the length of the string) into the string, or string::npos if it doesn't occur at all.  So to check that the first occurrence of s1 precedes the first occurrence of s2, if any, you can do:

	s.index(s1) < s.index(s2)

But if you try:

	s.rindex(s1) > s.rindex(s2)

to ask the inverted question about the *last* occurrence - does the last occurrence of s1 occur after the last occurrence of s2, if any - you get the wrong answer when s2 doesn't occur in the string.  (You can construct many similar examples, and I've caught such failures in code reviews.)

Google C++ standards actually forbid the use of unsigned values for anything other than bit masks or similar purposes - it's just too tricky for ordinary humans to get consistently right, and the original purpose - to double the range of counts and object sizes and such when int was 16 bits - is long obsolete.  No pun intended :-)
                                                        -- Jerry