<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, Jan 5, 2014 at 5:59 PM, Jerry Leichter <span dir="ltr"><<a href="mailto:leichter@lrw.com" target="_blank">leichter@lrw.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><br></div>

What I find most disturbing is that many of these bugs are trivial to avoid using techniques that have been known forever.<br>

<br>

Many, many years ago, I took over an internal hack project that someone at DEC had written.  (For those who remember, it was a LAT server that ran on VMS:  It played the role of a LAT terminal server, connected to other systems using LAT rather than CTERM.  On a LAN, LAT did much better than CTERM.)  The guy who wrote the original piece of code wrote it in a style that is all too familiar to anyone who's looked at most commercial code today:  Just assume everything you receive is valid because checking takes too long in programmer time and is "too inefficient".  The result was that the program, when it received Ethernet frames that didn't quite match what it was expecting, would crash in various horrible ways.  For reasons no one could ever really explain, such frames were surprisingly common.<br>


<br>

I set out to eliminate the crashes.  It turned out that almost all of them came down to one root cause:  The LAT protocol was defined in recursive components and subcomponents, each of them encoded in TLV (Type Length Value) format.  They were parsed and handled in what you might, in a language parser, call recursive-descent style:  You used the T field of a subcomponent to select the right function; it pulled off the leading L field and dealt with the subcomponent and returned a pointer to where the next T field should start.  Unfortunately, damaged components often had garbage length fields.  (Well, the other fields might be garbage, too, but they didn't cause as much *immediate* havoc.)  Given that this was C, with no array bounds or other memory object checking, the result was that a subcomponent parser would happily walk off the end of the buffer it was handed based on the bogus length field.  The simple fix:  Walk the path of the data, from where it was pulled off the wire, through every subcomponent parser, making sure that each function received a pointer to the end of the *containing* component, and have it check that the subcomponent didn't go beyond the containing component.  A boring couple of days of code cleanup - but, miraculously, the crashes ... stopped happening.  Who would have thought....  :-)<br>
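<br>
[Illustration, not the original DEC code.]  The fix described above might be sketched roughly as follows.  This assumes a made-up wire format with a one-byte T field and a one-byte L field, which is not the real LAT encoding; the names tlv_parse, TLV_CONTAINER and TLV_MAX_DEPTH are hypothetical.  Each call receives a pointer to the end of its *containing* component and rejects any subcomponent whose length would run past it.<br>

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_CONTAINER 0x01 /* hypothetical type code: value holds nested TLVs */
#define TLV_MAX_DEPTH 8    /* hypothetical limit on nesting depth */

/* Parse a run of TLV components occupying [p, end).
 * Returns the number of leaf components found, or -1 if any component
 * is truncated or claims to extend past its containing component. */
int tlv_parse(const uint8_t *p, const uint8_t *end, int depth)
{
    int leaves = 0;

    if (depth > TLV_MAX_DEPTH)
        return -1;

    while (p < end) {
        /* Both the T and the L byte must lie inside the container. */
        if ((size_t)(end - p) < 2)
            return -1;

        uint8_t type = p[0];
        size_t len = p[1];
        const uint8_t *value = p + 2;

        /* The check the crashing code lacked: the declared length must
         * fit in what remains of the containing component. */
        if (len > (size_t)(end - value))
            return -1;

        if (type == TLV_CONTAINER) {
            /* The child's bound is the end of this component,
             * not the end of the whole receive buffer. */
            int n = tlv_parse(value, value + len, depth + 1);
            if (n < 0)
                return -1;
            leaves += n;
        } else {
            leaves += 1; /* a real parser would dispatch on type here */
        }

        p = value + len; /* where the next T field should start */
    }
    return leaves;
}
```

Called as tlv_parse(frame, frame + frame_len, 0), this returns -1 for a frame with a garbage length field instead of walking off the end of the buffer.  Passing value + len down as the new end, rather than the end of the frame, is what keeps a damaged inner component from reading into its siblings.<br>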


<br>

This would have been, oh, 1986, give or take.  But somehow C programmers at large never learned - or were even taught - the lesson.  The only thing that's gotten us away from the never-ending stream of bad C code that scribbles memory is the fading of C from most commercial products:  C++ can be somewhat resistant (if you use the built-in types, *carefully*), and the newer languages all check array and string bounds.<br>

</blockquote><div><br></div><div>You mentioned Tony Hoare earlier. He didn't use his Turing Award lecture on a whim to point out that the lack of array bounds checking was going to bite. He knew that it was going to be a disaster.</div>

<div><br></div><div>The CERNLIB code for the Web was actually pretty robust, as all the string handling was performed by macros with built-in bounds checking.</div><div> </div></div><div><br></div>-- <br>Website: <a href="http://hallambaker.com/">http://hallambaker.com/</a><br>


</div></div>