[Cryptography] letter versus spirit of the law ... Eventus incertus delendus est

Michael Kjörling michael at kjorling.se
Fri Oct 30 06:38:03 EDT 2015


On 30 Oct 2015 03:44 +0000, from pgut001 at cs.auckland.ac.nz (Peter Gutmann):
>> Bugs should be found and fixed. However, "crashing" is not the right word.
>> Crashing is not "generally" good policy.
> 
> Exactly.  There have been cases where the compiler/language policy was to
> crash on numeric overflow, resulting in a rocket that cost $7 billion to
> produce (with the rocket itself costing around $100M) exploding on launch.
> Saying that destroying a $100M rocket is "not good policy" is an
> understatement.

This appears to me to oversimplify what actually happened during the
Ariane 5 launch failure in 1996.

According to [1], the main cause of the failure seems to have been a
combination of two things: one type of data being returned where
another was expected (an error code where flight data was expected),
and the two redundant units running exactly the same software, so
that both exhibited exactly the same problem.

The error that caused an error code to be returned, in turn, could
only happen because the physical properties of the Ariane 5 and its
trajectory were different from those of the Ariane 4, but the software
running on its on-board computers had not been adjusted accordingly.

In other words, the software that ultimately was the trigger for the
breakup worked as intended (but obviously not as required): a problem
was detected where certain values were not within the expected ranges,
because the range of valid values had not been adjusted to match the
changes in hardware. This led to an error condition being returned.
Downstream, this _error code_ was interpreted as flight data, which it
should never have been. _This incorrect interpretation_ of the value
that was returned led to the chain of events that eventually broke up
the launch vehicle. Had the error code been treated as an error code,
rather than as flight data, it seems likely to me that things would
have happened quite differently.
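
To make that failure mode concrete, here is a minimal Python sketch.
It is purely illustrative and is not taken from the Ariane software or
from [1]: the names to_int16_in_band, DIAGNOSTIC_WORD and
steering_command, as well as the scaling, are assumptions made up for
the example. It only shows how an error marker returned in the same
channel as the data can be consumed as if it were a measurement.

```python
INT16_MIN, INT16_MAX = -32768, 32767

# Hypothetical in-band error marker (NOT the real diagnostic pattern).
# It shares the value space with legitimate readings, which is the problem.
DIAGNOSTIC_WORD = -32768


def to_int16_in_band(value: float) -> int:
    """Convert to a 16-bit signed integer; signal overflow in-band."""
    n = int(value)
    if n < INT16_MIN or n > INT16_MAX:
        return DIAGNOSTIC_WORD  # error code travels in the data channel
    return n


def steering_command(word: int) -> float:
    """Hypothetical consumer that treats every word as flight data."""
    return word / INT16_MAX  # normalized deflection, nominally in [-1, 1]


in_range = steering_command(to_int16_in_band(12000.0))
out_of_range = steering_command(to_int16_in_band(50000.0))
```

Nothing in steering_command can tell the second call apart from a
genuine extreme reading, so the error marker turns into a hard-over
command instead of being handled as an error.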

**This type of error cannot be solved by the programming language.**
They are remedied by ensuring that redundant systems cannot reasonably
exhibit the same problems (for example by requiring at least two
different, fully independent development efforts targeting different
architectures), by software review at various stages, by requirements
review, and by ensuring that any software that is re-used is properly
adjusted to match _all_ changed requirements. And of course, making
absolutely certain at every call site that an error return cannot
_possibly_ be confused for a valid data value, for example by
returning status and results separately. "Exceptions" versus "error
codes" is simply a question of the mechanics used to ensure that last
point; at a technical level, "exceptions" (like those in .NET, Ada,
Java, C++, etc.) are just a special, glorified kind of return value
that is handled automatically rather than manually by the programmer.
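
As a rough illustration of "returning status and results separately",
here is a minimal Python sketch. Everything in it is an assumption
made for the example: the Status and Reading types, the function
names, and in particular the "hold the last good value" fallback,
which is just one possible policy and not a recommendation for flight
software. The exception variant at the end shows the same information
being carried by a raise instead of by a status field.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    OUT_OF_RANGE = "out_of_range"


@dataclass(frozen=True)
class Reading:
    status: Status
    value: Optional[int]  # None unless status is Status.OK


def to_int16(value: float) -> Reading:
    """Convert to a 16-bit signed integer, reporting status out-of-band."""
    n = int(value)
    if -32768 <= n <= 32767:
        return Reading(Status.OK, n)
    return Reading(Status.OUT_OF_RANGE, None)


def steering_command(reading: Reading, last_good: float) -> float:
    """Hypothetical consumer: must check the status before using the value."""
    if reading.status is not Status.OK:
        return last_good  # assumed fallback policy, for illustration only
    return reading.value / 32767


def to_int16_or_raise(value: float) -> int:
    """Same check, but the error travels as an exception."""
    reading = to_int16(value)
    if reading.status is not Status.OK:
        raise OverflowError(f"{value!r} does not fit in 16 bits")
    return reading.value
```

With this shape, an out-of-range input yields a Reading whose value is
None, so a caller that forgets the status check fails immediately
rather than silently steering on an error code.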

And frankly, as expensive as even blowing up a spacecraft is, if the
alternative is continuing powered flight on an uncontrollable
trajectory, controlled destruction is almost always the less bad
choice. (The initial Ariane 5 breakup, however, was not controlled; it
was caused by excessive aerodynamic forces.) Compare the space shuttle
Challenger disaster [2], where the SRBs were destroyed by the range
safety officer after the orbiter broke up, to ensure that they did not
come crashing down under power.

 [1]: http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html

 [2]: https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)

