[Cryptography] letter versus spirit of the law ... Eventus incertus delendus est

Jerry Leichter leichter at lrw.com
Fri Oct 30 06:51:13 EDT 2015


> So: Deliberately crashing in release code is pretty much always wrong.  OTOH
> continuing anyway, even with slightly incorrect values, is often right, but in
> any case still better than crashing.
It depends.  It's possible to structure your entire architecture around failure and recovery by restart.

In fact, Google does this.  At Google's scale, the assumption that your hardware won't just randomly flake out as you're running is no longer tenable.  Given that, Google in turn saves money by deliberately accepting lower reliability in exchange for greater numbers and performance of processors and other elements.  So programmers have to develop code on the assumption that the hardware could die between any two instructions.  Having decided that ... you might as well apply the same approach to software faults.  It's not that you ignore them; in fact, you try to detect them - e.g., check that parameters that are not supposed to be null are, in fact, not null.  But the response - provided directly by the checking primitives in the standard libraries - is to simply crash the program when the check fails.  The libraries are full of primitives with names like "openFileForReadOrFail()" - the "OrFail" means "crash the program if you can't do it".
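
To make the "OrFail" idea concrete, here is a minimal sketch in Python.  It is not Google's library; the names die(), check_not_none() and open_file_for_read_or_fail() are made up for illustration, and sys.exit() stands in for whatever hard abort a real implementation would use.

```python
import sys


def die(message):
    """Report the problem and terminate the program (hypothetical helper).

    sys.exit() raises SystemExit; nothing in this sketch catches it, so the
    process ends with exit status 1.
    """
    sys.stderr.write("FATAL: " + message + "\n")
    sys.exit(1)


def check_not_none(value, name):
    """Return value unchanged, or end the program if it is None."""
    if value is None:
        die(name + " must not be None")
    return value


def open_file_for_read_or_fail(path):
    """Open path for binary reading, or end the program if that fails."""
    check_not_none(path, "path")
    try:
        return open(path, "rb")
    except OSError as exc:
        die("cannot open %r for reading: %s" % (path, exc))
```

Note what is missing: the caller has no error code to inspect and no exception it is expected to handle.  Either it gets a usable file object or the program is gone.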

Writing repair/recovery code within functional elements is discouraged.  Recovery is left to higher-level mechanisms.  As a simple example, if one of the mappers in a map/reduce dies, the map/reduce framework will eventually notice it hasn't delivered its results and pass its work off to someone else.
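
A toy illustration of recovery living one level up, in Python.  This is a single-process simulation, not a real map/reduce framework: the WorkerDied exception stands in for a worker process crashing, and run_map(), healthy_worker() and flaky_worker() are hypothetical names.  The point is only that the mapper itself contains no recovery code at all.

```python
class WorkerDied(Exception):
    """Stands in for a worker process crashing (simulation only)."""


def healthy_worker(mapper, shard):
    """Apply mapper to every item in the shard and deliver the results."""
    return [mapper(item) for item in shard]


def flaky_worker(mapper, shard):
    """A worker that always dies before delivering anything."""
    raise WorkerDied()


def run_map(shards, mapper, workers, max_attempts=3):
    """Run mapper over each shard, reassigning a shard if its worker dies."""
    results = []
    for shard_id, shard in enumerate(shards):
        for attempt in range(max_attempts):
            # Pick a different worker on each attempt (round robin).
            worker = workers[(shard_id + attempt) % len(workers)]
            try:
                results.append(worker(mapper, shard))
                break
            except WorkerDied:
                continue  # no repair; just hand the shard to someone else
        else:
            raise RuntimeError("shard %d failed %d times" % (shard_id, max_attempts))
    return results
```

For example, run_map([[1, 2], [3]], lambda x: x * x, [flaky_worker, healthy_worker]) first hands shard 0 to the flaky worker, notices it died, and passes the shard to the healthy one.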

In some ways, you can compare this to the difference between the early designs of database systems and file systems - which put enormous effort into pre-validating requests, planning locking strategies to prevent deadlocks, running elaborate repair and recovery strategies (fsck, in the file system case) - and the strategy that most such systems use today:  Do work in transactions that, when things go wrong, just bail and try again from the top.
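
The "bail and try again from the top" pattern can be sketched in a few lines of Python.  This is a deliberately simplified illustration, not how any particular database does it: Store, Abort and run_transaction() are hypothetical names, and the "transaction" is just a private copy of a dictionary that is either published whole or thrown away.

```python
class Abort(Exception):
    """Raised inside a transaction body to abandon the current attempt."""


class Store:
    """A trivial in-memory key/value store (illustration only)."""

    def __init__(self, data):
        self.data = dict(data)


def run_transaction(store, body, max_attempts=5):
    """Run body against a private copy of the store; retry on Abort."""
    for attempt in range(1, max_attempts + 1):
        # Shallow copy: adequate here because the values are plain numbers.
        working = dict(store.data)
        try:
            result = body(working, attempt)
        except Abort:
            continue  # discard the copy and start again from the top
        store.data = working  # commit: publish all changes at once
        return result
    raise RuntimeError("transaction gave up after %d attempts" % max_attempts)
```

A body that changes one account, hits trouble, and raises Abort leaves the store untouched; there is nothing to repair, because the half-finished work only ever existed in the discarded copy.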

I'm not recommending this approach as the right one *in general*; it has its own set of advantages and disadvantages, appropriate and inappropriate areas of application.  If you come to it from a more traditional programming environment, it takes time to internalize the design approach necessary to make it work.  But you should be aware that the alternative is viable in many cases.

                                                        -- Jerry


