[Cryptography] letter versus spirit of the law ... UB delenda est

Jerry Leichter leichter at lrw.com
Sun Oct 25 08:41:43 EDT 2015


>> Unfortunately, that description covers only a tiny fraction of users of C
>> today.
> 
> I would say it's actually minuscule, not just tiny.  Some data points:
> 
> * Measurements of deployed code
> 
> The SOSP'13 paper found that a minimum of 40% of all Debian packages have UB
> issues (the figure could be much higher since they may not detect everything
> that's there)....
> Only problem is that even with security experts creating the code, they still
> trigger UB:
> 
>  http://blog.regehr.org/archives/593
> 
> If even experienced security conscious programmers can't write code that's
> safe from gcc's breakage, how is J.Random programmer supposed to do it?

Way back when, when pcc was the only C compiler, "portable C" meant "compiles with pcc".  Since the "portable" in pcc extended to very few machines and exactly one operating system, it also meant "runs on PDP-11s" and maybe a few other machines under Unix.  This was the era when it was "portable" to assume that *(char*)0 == 0, because "all machines" mapped the bottom of memory and had a 0 word as the first word of memory.  (Yes, programs really were written on that assumption.  It allowed you to do things like pass NULL when you wanted to pass an empty string:  code would look for the '\0' and find it right there at location 0.  This habit has cursed C, and even C++, ever since.)
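A minimal sketch of what the NULL-as-empty-string habit has to become once location 0 can no longer be read.  The helper names `str_or_empty` and `len_or_zero` are hypothetical; the point is only that the "NULL means empty" convention must be written out as an explicit test, since dereferencing a null pointer is undefined behavior and on current systems the bottom page is normally unmapped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map the old "NULL means empty string" convention to a
 * real empty string instead of reading through a null pointer (undefined
 * behavior, and typically a crash on modern systems). */
static const char *str_or_empty(const char *s)
{
    return s != NULL ? s : "";
}

/* Hypothetical helper: length of s, treating NULL as "" explicitly. */
static size_t len_or_zero(const char *s)
{
    return strlen(str_or_empty(s));
}
```

Callers that used to pass NULL keep working, but the assumption now lives in one visible place instead of in the memory map.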

Later we went through the "all the world's a VAX" era, when "portable" code was anything that compiled and ran on a VAX.  This was followed by the "all the world's a Sun" era (initially 68K, later SPARC).  When, quite a bit later, SunOS had been renamed Solaris and ported to x86 (I'm talking about the original port, done some time in the 1990s, years before Sun started building x86-based hardware), code sometimes required work to move from Solaris SPARC to Solaris x86, even though the two were nominally "the same".

The first C Standard really served to write down minimal assumptions about C compilers.  It was common knowledge among C programmers that you couldn't write any non-toy code entirely in the language defined by the Standard.  You referred to the Standard as a base; you referred to the documentation of your compiler and OS to get real work done.

I've been pretty much out of the C programming world for, hmm, close to two decades now.  I'll still write a small C tool now and then, and I have a bunch of older tools that I modify as new needs arise, but my heavy coding has alternated between C++ and Java.  I haven't even looked at the more recent versions of the C standards.  It seems there's a feeling out there now that C is supposed to give you a way to write truly portable code, and that the way to get there is to write to the C Standard.

It never was this way, and it was never intended to be this way.  C was a low-level language, intended to get you close to the hardware.  The original definition of "int" - I don't know if the current Standard retains it - was something along the lines of "a signed integer that has the best performance on the targeted hardware".
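For comparison, C99 and later describe a plain `int` as having "the natural size suggested by the architecture of the execution environment", and guarantee only that `INT_MAX` is at least 32767.  A short sketch of what that means in practice (`_Static_assert` needs C11).  `counter_t` and `int_holds_32_bits` are hypothetical names; `int_least32_t` is chosen over `int32_t` because the exact-width types are optional in the Standard while the "least" types are required.

```c
#include <limits.h>
#include <stdint.h>

/* The only range every conforming implementation must give plain int. */
_Static_assert(INT_MAX >= 32767, "guaranteed by the C Standard");

/* Hypothetical type name: "at least 32 bits" is always available in C99 and
 * later; an exact-width int32_t might not be. */
typedef int_least32_t counter_t;

/* Returns nonzero if this implementation's int can hold 32-bit values.
 * Portable code should test this rather than assume it. */
static int int_holds_32_bits(void)
{
    return INT_MAX >= 2147483647L;
}
```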

*If* you can manage to stay away from the undefined edges of data representations (which is where most "undefined" behavior lives), you can actually write useful code that's portable to all reasonable implementations of the language.  Back in the day when you ran code in your own name on your own data, this was a reasonable assumption.  In an era where every piece of code seems to get exposed to the Internet eventually, and where the data you work on is handed to you by attackers, things are not nearly as simple.  Note that developers of packages like SafeInt are *deliberately venturing out into the dangerous areas*:  you don't really need such libraries if you're writing code for the 1990s usage model.  You need them when not only is someone out to make your program fail, but they are out to make it fail in a way that lets them attack you.
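The dangerous area in question is mostly integer overflow.  A minimal sketch in C, and explicitly not SafeInt's API (SafeInt is a C++ template library): the naive after-the-fact test `a + b < a` depends on signed overflow, which is undefined, so an optimizing compiler may legally delete it.  Testing the operands against the limits *before* adding never overflows.  `checked_add_int` is a hypothetical name; GCC and Clang also offer `__builtin_add_overflow`, and C23 standardizes `ckd_add` in `<stdckdint.h>`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out only if the sum fits in int.
 * Do NOT write `if (a + b < a)`: that overflow is itself undefined behavior,
 * so the compiler may assume it cannot happen and remove the check. */
static bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    /* INT_MAX - b (b > 0) and INT_MIN - b (b < 0) cannot overflow. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;        /* would overflow; *out is left untouched */
    }
    *out = a + b;
    return true;
}
```

Returning a status instead of a wrapped value forces the caller to decide what an out-of-range input means, which is exactly the decision an attacker would otherwise make for you.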

Back in the early 1990s, I taught a compiler-writing class.  I implemented a simple Modula-2 compiler with parts left out; students had to find the (deliberately introduced) bugs (extra points for finding bugs I didn't know about!) and fill in the parts I hadn't done.  The compiler was written in C.  I would work on it on my VMS VAX at home, bring it to campus and compile and run it on my SunOS (Solaris?) SPARC workstation, then give it to students who would often compile and run it on their MS-DOS x86 desktops.  It took discipline, but, yes, I could and did write code that was portable and correct across these very different environments.

Would it have been safe against code deliberately trying to attack it?  Almost certainly not.  That was simply not an issue, and I certainly didn't even consider the possibility.  (I still have the code around somewhere.  Peter, if you want I can dig it out and see how it holds up against modern analysis tools.)

Anyway ... as an old-world C programmer, I find the discovery that you can't, based on the C Standard, write completely portable code *even in a non-adversarial environment* kind of amusing.  My main response is - so what else is new?

C is what it is.  It promises certain things, which it delivers pretty well.  It doesn't promise others.  Don't be upset that it doesn't deliver them.

Now, as to what an alternative might be ... that's a very, very interesting question.  The most portable language I know of also dates to Bell Labs in the mid-1960s:  SNOBOL4.  SNOBOL4 was implemented in two levels:  there's SIL (SNOBOL4 Implementation Language), a pseudo-assembler for a memory-to-memory fake machine, originally implemented as a set of assembler macros on a particular target; and the compiler/interpreter, written entirely in SIL.  The result is really, really the same whether you ran the original IBM 360 implementation or the CDC 6000 port back in the day, or run the current version, which is based on implementing the SIL operations in C.  But ... after all these generations of progress, it's *still* slow.  (That doesn't mean it isn't useful.  There have also been other, more efficient, implementations - all but one highly specific to particular machines and OSes.)
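To make the two-level idea concrete, here is a toy memory-to-memory machine in C, loosely in the spirit of the description above.  The opcodes and the `struct insn` layout are invented for illustration and are NOT actual SIL operations.  What it shows is the shape of the design:  every instruction names memory cells directly, with no registers, so moving everything written on top of it to a new machine means re-implementing only this small dispatch loop.

```c
#include <assert.h>
#include <stddef.h>

/* Invented opcodes for a toy machine; not real SIL. */
enum op { OP_MOVE, OP_ADD, OP_DEC, OP_JNZ, OP_HALT };

struct insn {
    enum op op;
    size_t a, b;   /* memory cell indices; for OP_JNZ, b is a jump target */
};

/* Interprets prog against mem until OP_HALT (or an unknown opcode).
 * Returns the number of instructions executed, including the halt. */
static size_t run(const struct insn *prog, long *mem)
{
    size_t pc = 0, steps = 0;
    assert(prog != NULL && mem != NULL);
    for (;;) {
        const struct insn *i = &prog[pc++];
        steps++;
        switch (i->op) {
        case OP_MOVE: mem[i->a] = mem[i->b]; break;          /* a := b          */
        case OP_ADD:  mem[i->a] += mem[i->b]; break;         /* a := a + b      */
        case OP_DEC:  mem[i->a] -= 1; break;                 /* a := a - 1      */
        case OP_JNZ:  if (mem[i->a] != 0) pc = i->b; break;  /* if a != 0 goto b */
        case OP_HALT:
        default:      return steps;
        }
    }
}
```

For example, with cells `{3, 4, 0}` the program `ADD 2,1; DEC 0; JNZ 0,0; HALT` multiplies 3 by 4 through repeated addition, leaving 12 in cell 2.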

Languages can have multiple desirable attributes:  Portability, safety, capable of extreme performance, capable of getting "down to the hardware" for OS-like functions, ease of analysis (formal or otherwise), tight control of memory allocation, expressiveness (whatever that means to you - it's almost certainly not exactly what it means to me) ... you can add to the list.  There is not, and almost certainly never will be, a single language that can cover all the bases.

In the case of a language like C, the definition in the Standard is *still* not enough to write secure code.  (Whether, today, it's enough to write *useful* code, I don't know.)  The combination of C *with a good compiler and run time environment*, on the other hand, *can* be used to write secure code - though it remains a very hard job, because (by design!) C leaves most of the necessary work (array/pointer bounds checks, overflow checks, and memory management being the largest portion) to the programmer.
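As an illustration of the bounds-checking work C leaves to the programmer, a minimal sketch of appending into a fixed-size buffer.  `struct bounded_buf` and `buf_append` are hypothetical names.  Nothing in the language stops `memcpy` from writing past `data[63]`, so the capacity test has to be written by hand, and it is phrased as `n > capacity - len` rather than `len + n > capacity` so that a huge, attacker-supplied `n` cannot wrap the unsigned sum around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer. Invariant: len <= sizeof data. */
struct bounded_buf {
    unsigned char data[64];
    size_t len;
};

/* Appends n bytes from src, or returns false (changing nothing) if they
 * would not fit. Rejecting rather than silently truncating is a choice. */
static bool buf_append(struct bounded_buf *b, const void *src, size_t n)
{
    assert(b != NULL && src != NULL && b->len <= sizeof b->data);
    /* sizeof b->data - b->len cannot underflow given the invariant. */
    if (n > sizeof b->data - b->len) {
        return false;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return true;
}
```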

What this discussion makes clear is that C plus gcc is not a suitable environment for this kind of code.

If you think the issue is C - suggest a good alternative.  I still have a soft spot in my heart for Modula-3, but it's long dead.  (Since I never actually did any significant programming in Modula-3, my affection may be misplaced anyway.  All kinds of things look great when you read the documentation; day-to-day usage gives you a different viewpoint.)

As an area of research and even commercial development, "safe languages suitable for systems programming" died about the same time Modula-3 did.  After all, we knew what the OSes of the present and future were/would be (Windows and Unix), and we knew what they were/would be written in (C, some C++).

Today, exposure to the Internet has brought the issues of "systems programming" to almost all code anyone writes; but the area still appears moribund.

                                                        -- Jerry


