[Cryptography] DIME // Pending Questions // Seeking Your Input
Jerry Leichter
leichter at lrw.com
Sun Mar 1 10:28:46 EST 2015
On Mar 1, 2015, at 8:47 AM, Phillip Hallam-Baker <phill at hallambaker.com> wrote:
> ...In particular, a decoder can verify the syntactic correctness of each token in the stream in a single pass using only the data previously read. Checking correctness of an ASN.1 file is a real horrorshow because an inner length encoding can be inconsistent with either an outer or an inner one.
Not to disagree that this is a good feature, but ... having written (actually, fixed) a parser for an encoding built from nested TLV (Type/Length/Value) elements (not ASN.1, which has its own special complexities), I'd say it's not particularly hard to get the bounds checking right. But you have to design for it from the beginning and follow the design consistently. I used something very much like a recursive descent parser, but the rule was that every call took a begin pointer and an end pointer (this was C). Internally, you maintain a "current position" that starts at the begin pointer and may never move past the end pointer; when you've read the length of a subelement, you check that it fits in what remains and compute its end pointer, which you'll pass to the subelement parser and which will become your new current position on return.
Note that C programmers will all too often skip the "pass the end pointer" part (as an "optimization": if the field has a known length, the caller would have checked, right?). Programmers in languages with native strings will all too often pass just the starting index, which guarantees that a subelement parser can't read past the end of the top-most string, but doesn't prevent it from reading past the subelement's own boundaries.
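A minimal sketch of the begin/end discipline in C, under a made-up format (not the encoding from the original parser): a 1-byte type, a 1-byte length, then the value, with the high bit of the type marking a container of nested elements. The names tlv_parse, TLV_CONTAINER and TLV_MAX_DEPTH are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed format: 1-byte type, 1-byte length, then `length` value bytes.
   Types with the high bit set are containers holding nested TLVs. */
#define TLV_CONTAINER 0x80u
#define TLV_MAX_DEPTH 16

/* Parse every element in [begin, end). Returns the number of leaf
   elements found, or -1 if anything is malformed. */
static int tlv_parse(const uint8_t *begin, const uint8_t *end, int depth)
{
    assert(begin <= end);
    if (depth > TLV_MAX_DEPTH)
        return -1;

    const uint8_t *cur = begin;   /* current position, never past end */
    int leaves = 0;

    while (cur < end) {
        /* The two header bytes must lie inside OUR bounds. */
        if ((size_t)(end - cur) < 2)
            return -1;
        uint8_t type = cur[0];
        size_t len = cur[1];
        cur += 2;

        /* Compare lengths rather than pointers, so an out-of-range
           pointer is never formed. */
        if (len > (size_t)(end - cur))
            return -1;
        const uint8_t *sub_end = cur + len;

        if (type & TLV_CONTAINER) {
            /* The child sees only its own [cur, sub_end) range. */
            int n = tlv_parse(cur, sub_end, depth + 1);
            if (n < 0)
                return -1;
            leaves += n;
        } else {
            leaves += 1;
        }
        cur = sub_end;            /* new current position on return */
    }
    return leaves;
}
```

With this shape, an inner element whose declared length runs past its container is rejected even when the extra bytes exist in the underlying buffer, because the child never learns where the outer buffer ends.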
Perhaps the easiest language to get this right in is Java, where a substring-style view can be essentially free: it points into the parent data but has its own length. (That was true of String.substring until Java 7u6, which switched to copying; ByteBuffer.slice() still shares the parent's storage.) Of course, you do have to get your substring arithmetic right consistently - having a helper function that pulls the subelement out is the way to go. In C++, a substring operation will copy the data, so you generally need to compute and pass the end pointer yourself.
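The helper-function idea carries over to C as well. A rough sketch, assuming a hypothetical view type (slice) and helper (slice_take) that do all the arithmetic in one place, so callers never touch raw offsets:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view type: a pointer into someone else's buffer plus a
   length. Nothing is copied. */
typedef struct {
    const uint8_t *data;
    size_t len;
} slice;

/* Split the first n bytes of *s off into *head and advance *s past them.
   Fails, leaving *s and *head untouched, if fewer than n bytes remain. */
static bool slice_take(slice *s, size_t n, slice *head)
{
    if (n > s->len)
        return false;
    head->data = s->data;
    head->len = n;
    s->data += n;
    s->len -= n;
    return true;
}
```

A subelement parser that receives only the head slice has no way to express a read beyond it, which is the property the end pointer was providing.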
Still, I'll agree that people get this on-its-face trivial bit of coding wrong all the time. A parser generator is really the way to go: Get it right once and for all.
-- Jerry