Why self describing data formats:

Nicolas Williams Nicolas.Williams at sun.com
Thu Jun 21 13:03:15 EDT 2007

On Fri, Jun 01, 2007 at 08:59:55PM +1000, James A. Donald wrote:
> Many protocols use some form of self describing data format, for example 
> ASN.1, XML, S expressions, and bencoding.

ASN.1 is not an encoding, and not all its encodings are self-describing.

Specifically, PER is a compact encoding such that a PER encoding of some
data cannot be decoded without access to the ASN.1 module(s) that
describes the data types in question.

Yes, it's a nit.

Then there's XDR -- which can be thought of as a subset of ASN.1 and a
four-octet aligned version of PER (XDR being both, a syntax and an

> Why?

Supposedly it is (or was thought to be) easier to write encoders/
decoders for TLV encodings (BER, DER, CER) and S-expressions, but I
don't believe it (though I certainly believe that it was thought to be
easier): rpcgen is a simple enough program, for example.

TLV encodings tend to quite redundant, in a way that seems dangerous: a
lazy programmer can (and many have) write code that fails to validate
parts of an encoding and mostly get away with it (until the then
inevitable subsequent buffer overflow, of course).

Of course, code generators and libraries for self-describing and non-
self-describing encodings alike are not necessarily bug free (have any
been?) but at least they have the virtue that they are automatic tools
that consume a formal language, thus limiting the number of lazy
programmers involved and the number of different ways in which they can
screw up (and they leave their consumers off the hook, to a point).

> Presumably both ends of the conversation have negotiated what protocol 
> version they are using (and if they have not, you have big problems) and 
> when they receive data, they need to get the data they expect.  If they 
> are looking for list of integer pairs, and they get a integer string 
> pairs, then having them correctly identified as strings is not going to 
> help much.

I agree.  The redundancy of TLV encodings, XML, etcetera, is
unnecessary.  Note though that I'm only talking about serialization
formats for data in protocols; XML, I understand, was intended for
_documents_, and it does seem quite appropriate for that, and so it can
be expected that there should be a place for it in Internet protocols in
transferring pieces of documents.


The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com

More information about the cryptography mailing list