<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">On Mar 1, 2015, at 8:47 AM, Phillip Hallam-Baker &lt;<a href="mailto:phill@hallambaker.com">phill@hallambaker.com</a>&gt; wrote:<br><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><font color="#000000">...</font>In particular, a decoder can verify the syntactic correctness of each token in the stream in a single pass using only the data previously read. Checking correctness of an ASN.1 file is a real horrorshow because an inner length encoding can be inconsistent with either an outer or an inner one.</div></div></div></div></blockquote>Not to disagree that this is a good feature, but ... having written (actually, fixed) a parser for an encoding (not ASN.1, which has its own special complexities) that used a nested TLV (Type/Length/Value) encoding, I'd say it's not particularly hard to get the bounds checking right.  But you have to design for it from the beginning and follow the design consistently.  I used something very much like a recursive descent parser, but the rule was that every call took a begin pointer and an end pointer (this was C).  Internally, you maintain a "current position" that starts at the begin pointer and may never move past the end pointer; when you've read the length of a subelement, you compute its end pointer, check that it doesn't lie beyond your own end pointer, pass it to the subelement parser, and make it your new current position on return.</div><div><br></div><div>Note that C programmers will all too often ignore the "pass the end pointer" part (as an "optimization" since, e.g., if the field has a known length, the caller would have checked, right?).
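</div><div><br></div><div>A minimal sketch of the begin/end rule, in Python rather than C for brevity. The encoding here (a one-byte tag, a one-byte length, and tag 0x30 marking a container) and the names parse_elements and CONTAINER are assumptions made up for illustration, not the format discussed above.</div>
<pre>
```python
CONTAINER = 0x30  # hypothetical tag: the value is itself a sequence of TLVs


def parse_elements(buf, begin, end):
    """Parse consecutive TLV elements lying entirely inside buf[begin:end]."""
    pos = begin  # current position; may reach end but never pass it
    elements = []
    while pos != end:
        # The 2-byte header (tag, length) must fit inside our own bounds.
        if pos + 2 > end:
            raise ValueError("truncated header at offset %d" % pos)
        tag, length = buf[pos], buf[pos + 1]
        pos += 2
        # End of the sub-element, checked against *our* end, not len(buf).
        sub_end = pos + length
        if sub_end > end:
            raise ValueError("element at offset %d overruns its parent" % (pos - 2))
        if tag == CONTAINER:
            # The child parser gets its own, narrower bounds.
            value = parse_elements(buf, pos, sub_end)
        else:
            value = bytes(buf[pos:sub_end])
        elements.append((tag, value))
        pos = sub_end  # the child's end becomes our new current position
    return elements
```
</pre>
<div>Called as parse_elements(data, 0, len(data)), this rejects an input such as bytes([0x30, 0x03, 0x01, 0x02, 0xAA, 0xBB]): the inner element claims two bytes, which exist in the buffer but lie outside the three-byte container, so the check against the container's end catches it.</div><div><br></div><div>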
Programmers in languages with native strings will all too often pass just the starting point, which guarantees that a sub-element parser can't read past the end of the top-most string, but doesn't prevent it from reading past the sub-element's own boundaries.</div><div><br></div><div>Perhaps the easiest language to get this right in is Java, when the substring operation is essentially free:  It points into the parent string but has a different length.  (That is how String.substring worked before Java 7u6; it has copied the characters since then, but ByteBuffer.slice() still gives a view that shares the parent's data.)  Of course, you do have to get your substring arithmetic right consistently, and having a helper function that pulls out the sub-element is the way to go.  In C++, a substring operation such as std::string::substr will copy the data, so you generally need to compute and pass the end pointer yourself.</div><div><br></div><div>Still, I'll agree that people get this seemingly trivial bit of coding wrong all the time.  A parser generator is really the way to go:  Get it right once and for all.</div><div><div>                                                        -- Jerry</div><div><br></div></div></body></html>