[Cryptography] Ada vs Rust vs safer C

Jerry Leichter leichter at lrw.com
Tue Sep 20 06:51:43 EDT 2016


>> This was based on a very expressive annotation language and a fairly powerful
>> proof engine, so with proper annotations you could verify some quite
>> meaningful properties of C (or other language) programs.  Unfortunately, the
>> annotation language was highly mathematical in form, well beyond the comfort
>> zone (or even competence) of the vast majority of programmers.  So the effort
>> never had any significant real-world impact.
> 
> There have been several projects like this, e.g. Larch and LCLint/splint,
> which required near-incomprehensible annotations and produced near-
> incomprehensible diagnostics.  It requires experts to use, and another set of
> experts to evaluate the results.
Yup.  That was it exactly.

> The problem with all of these is that the designers chose to create a
> metalanguage/system for specifying the precise semantics of a program, which
> means you need to create a second copy of your code using mathematical
> notation, and the proof assistant then checks whether your mathematical
> description matches what your code does....
This is a criticism I've made myself.  While it's valid in some cases, I think one can overstate it.  An annotation system - or even a specification system like Z - inherently provides some redundant description of the code.  The questions to ask are:  How much of the code has to be redundantly described?  Is the redundant description simpler, significantly shorter, and easier to understand and reason about than the original code?
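
To make the trade-off concrete, here's a toy sketch - the annotation syntax is invented for illustration, it isn't Larch's or splint's or any real tool's - of a redundant description sitting next to the code it describes:

    #include <stddef.h>

    /* Hypothetical contract, in the spirit of such annotation systems:
     *   requires: buf != NULL && len > 0
     *   ensures:  result points into buf[0..len-1] and *result is the
     *             maximum element of buf[0..len-1]
     * It restates what the loop does, but in terms of the result rather
     * than the mechanism.
     */
    static const int *max_element(const int *buf, size_t len)
    {
        const int *best = buf;
        for (size_t i = 1; i < len; i++)
            if (buf[i] > *best)
                best = &buf[i];
        return best;
    }

The two questions above amount to asking whether the contract or the loop is the thing you would rather convince yourself is right.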

Consider that type systems are themselves to a large degree redundant descriptions of the code - even in a language with as simple a type system as C.  Yes, some of the type information goes into memory allocation and into disambiguating overloaded operations - whether "+" is integer or floating-point addition.  But some of it goes into making the program easier to understand.

Consider that very early versions of C allowed you to write I.x where I was an integer variable and there happened to be a struct {float x; float y;} in scope.  The meaning was:  Implicitly cast I to a pointer to such a struct and pull out the x reference.  This would only be an error if there were two structs in scope with x fields whose types or offsets were different.  If this seems odd ... keep in mind that C's struct was inspired by an assembler concept of a (dummy) section, known as a DSECT at least in IBM 360 assembler parlance.  All you got from the assembler construct was an offset, represented a bit more cleanly than a bunch of constant declarations.
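
A modern echo of that heritage survives in standard C: offsetof() makes it explicit that a member name is, at bottom, just an offset from the start of the object.  A small sketch:

    #include <stdio.h>
    #include <stddef.h>

    struct point { float x; float y; };

    int main(void)
    {
        struct point p = { 1.0f, 2.0f };
        /* The member name and the raw base-plus-offset arithmetic name
         * the same storage - the DSECT view of the world, dressed up. */
        float *via_name   = &p.y;
        float *via_offset = (float *)((char *)&p + offsetof(struct point, y));
        printf("%d\n", via_name == via_offset);   /* prints 1 */
        return 0;
    }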

In any case, the argument for preferring static over dynamic typing *in program correctness contexts* is precisely that the programmer has to specify the assumed properties of certain names.  Yes, dynamically typed languages check this at run time - but understanding what they are doing is harder.  Yes, there are now pretty powerful type inference systems out there that let you elide the types in many cases.  But even in languages with such inference, people often add "unnecessary" type annotations anyway - because a human being wants to know what the code is supposed to mean without having to re-derive all the types in their head.

In a language like C, the expressivity of types is rather limited.  Much of the type system is based on primitive machine types, some of whose semantics aren't even fully specified (the behavior of 2's complement arithmetic in the presence of overflow, for instance).  But even there, well-written C code will use a variety of struct types, and pointers to them, to make the intended semantics of objects clearer.  In C++, which tightens down C's free and easy conversion of pointer types into one another, this can be used to even better effect.  The same goes for enums versus a set of #define'd integer constants.
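
A small sketch of the kind of thing I mean (the names are made up): distinct wrapper structs and an enum turn "which integer is this?" confusions into something the compiler can at least complain about:

    #include <stdio.h>

    /* Both are ints underneath, but the reader - and, for pointers, the
     * compiler - now knows a user id is not a group id. */
    struct user_id  { int value; };
    struct group_id { int value; };

    /* The legal values live in one place, unlike a scatter of #define'd
     * integer constants. */
    enum access { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE };

    static enum access lookup_access(const struct user_id *u)
    {
        return (u->value == 0) ? ACCESS_WRITE : ACCESS_READ;
    }

    int main(void)
    {
        struct user_id  u = { 0 };
        struct group_id g = { 42 };
        printf("%d\n", lookup_access(&u));
        /* lookup_access(&g);  - a diagnostic in C, a hard error in C++ */
        (void)g;
        return 0;
    }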

One can take this further - and languages have.  Consider subrange types in Ada, for example.  How far is too far is difficult to pin down.  You know when you've gotten there.  You also know when the notation you've chosen has stopped serving its intended purpose:  Making the code easier to understand.  The original Larch work was probably reasonable, at least within the rather sophisticated community in which it was developed.  But then the curse of generalization set in, and what had been a few clear pre-defined concepts became a whole bunch of complex mathematics.
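
For those who haven't used Ada: a subrange type lets you declare, say, an Integer restricted to the range 1 through 31, and the compiler and runtime enforce it.  The closest plain C gets is something hand-rolled like the sketch below (the names are mine), which shows both the appeal and the cost of pushing that kind of intent into types:

    #include <assert.h>
    #include <stdio.h>

    /* A stand-in for an Ada-style subrange: the type name documents the
     * intent, and the constructor enforces the range at run time.  Ada
     * gives you both from a one-line declaration. */
    typedef struct { int value; } day_of_month;

    static day_of_month make_day(int v)
    {
        assert(v >= 1 && v <= 31);
        return (day_of_month){ v };
    }

    int main(void)
    {
        day_of_month d = make_day(17);
        printf("%d\n", d.value);
        return 0;
    }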

>> This is the side of annotations and program verification systems that's
>> proved a stumbling block pretty much forever.  No one has really solved the
>> problem of working either with un-annotated code in languages that are in
>> actual use; or designing an annotation system that real-world programmers can
>> use effectively.
> 
> PREfast does a pretty good job.  What made the PREfast work unique is that
> they didn't start with some abstract mathematical goal (well, not once they
> got the tool into production use beyond the research stage) but looked at what
> real-world programmers were doing and created what was needed to deal with
> that.  It's evolved continuously over the years to match real-world
> experience, so the current SAL and PREfast is very different from what it was
> five and ten years ago.
I'll have to have a look at it.
                                                        -- Jerry


