[Cryptography] defaults, black boxes, APIs, and other engineering thoughts

Jerry Leichter leichter at lrw.com
Sun Jan 5 17:59:54 EST 2014


On Jan 5, 2014, at 3:25 PM, Jonathan Thornburg wrote:
> But this raises some genuine questions:
> * Is there a secure web browser?  My trust level in any of the biggies
>  (Microsoft, Apple, Google, Mozilla) is low...
I'm with you.  For what it's worth, I think Chrome is probably, across time, the most secure, because Google puts a huge amount of effort involving a really experienced team into making it so.  I place some amount of trust in Safari, but that's a matter of statistics, not anything special about the code:  People aren't attacking it as much.  (Apple seems to have been getting ever more serious, but how far they've come is hard to judge.)

But a modern browser is the ultimate example of immense complexity in commonly used software today.  What people like in browsers is their universality:  They get judged on their ability to accept any trash that comes from a website, in any bizarre file format anyone's invented over the years, and "work".  I just don't see how they can possibly be made secure.

I do find fascinating the reaction to the never-ending series of security issues in Flash and Java.  What people have learned from this is:  Plugins are bad; Flash itself is bad.  They looked to the promised land of HTML5 as a way to do Flash-like stuff without a plugin and without Flash.  But just what reason does anyone have to think that HTML5 implementations will be any better than Flash?  If anything, HTML5 is even more complex than Flash ever was.  And the demands for high performance - in an area that's immensely complicated from top to bottom, from the inputs at one end to the graphics devices at the other, plus the complex input models - are there for HTML5, just as they were for Flash, and will tend to make secure coding an immense challenge.

> * I've just booked a hotel room in <distant city>; the hotel sent me a
>  .docx file which claims to be a confirmation.  Is there an "office suite"
>  in which it's safe for me to look at that .docx file?
docx - maybe.  There's no hope for the old .doc files.

As far as I know, no one other than Microsoft has ever managed to write a Word document processor that works 100% of the time.  There are some that work most of the time, but Word has so many odd corners - none of them documented in any reasonable way; as I recall, the European courts made Microsoft publish documentation, but Microsoft managed to make it so obfuscated that it didn't help - that it's proven impossible.  If you can't even get the *intended* functionality right, getting the security right is rather challenging.

> * Same question, but for pdf files?
I think we have the makings of an excellent contest here:  Pick one of these - PDF is probably the best choice - and ask for a secure implementation.  The implementor may omit parts of the spec if he doesn't believe they can be implemented securely.  Points are charged for such an omission only if someone else in the contest manages to produce a secure implementation of the same features.

Code to be judged in a way that's similar to the contest that chose AES:  Public review by experts in the field, by other contestants, and by anyone else who chooses to comment.

The AES contest drew people in because there was a chance for an academic or other expert to have his work become a national standard.  I don't think that would work here - you'd need someone to kick in serious money to offer, both to the chosen developer, and to anyone who came up with worthwhile attacks.

The real purpose of the contest would be to examine, and push, the state of the art in secure code development.  If done correctly, this could teach us a great deal about how to go about this.

What I find most disturbing is that many of these bugs are trivial to avoid using techniques that have been known forever.

Many, many years ago, I took over an internal hack project that someone at DEC had written.  (For those who remember, it was a LAT server that ran on VMS:  It played the role of a LAT terminal server, connected to other systems using LAT rather than CTERM.  On a LAN, LAT did much better than CTERM.)  The guy who wrote the original piece of code wrote it in a style that is all too familiar to anyone who's looked at most commercial code today:  Just assume everything you receive is valid, because checking takes too long in programmer time and is "too inefficient".  The result was that the program, when it received Ethernet frames that didn't quite match what it was expecting, would crash in various horrible ways.  For reasons no one could ever really explain, such frames were surprisingly common.

I set out to eliminate the crashes.  It turned out that almost all of them came down to one root cause:  The LAT protocol was defined in recursive components and subcomponents, each of them encoded in TLV (Type Length Value) format.  They were parsed and handled in what you might, in a language parser, call recursive-descent style:  You used the T field of a subcomponent to select the right function; it pulled off the leading L field and dealt with the subcomponent and returned a pointer to where the next T field should start.  Unfortunately, damaged components often had garbage length fields.  (Well, the other fields might be garbage, too, but they didn't cause as much *immediate* havoc.)  Given that this was C, with no array bounds or other memory object checking, the result was that a subcomponent parser would happily walk off the end of the buffer it was handed, based on the bogus length field.  The simple fix:  Walk the path of the data, from where it was pulled off the wire, through every subcomponent parser, making sure that each function received a pointer to the end of the *containing* component, and have it check that the subcomponent didn't go beyond the containing component.  A boring couple of days of code cleanup - but, miraculously, the crashes ... stopped happening.  Who would have thought....  :-)

This would have been, oh, 1986, give or take.  But somehow C programmers at large never learned - or were even taught - the lesson.  The only thing that's gotten us away from the never-ending stream of bad C code that scribbles memory is the fading of C from most commercial products:  C++ can be somewhat resistant (if you use the built-in types, *carefully*), and the newer languages all check array and string bounds.
                                                        -- Jerry


