[Cryptography] Encodings for crypto

Mon Feb 17 13:42:30 EST 2014

Responding to Ted on ASN.1 PER...

Yes, ASN.1 is horribly complex, and I say this as someone who just
implemented a JSON encoder and an ASN.1 encoder in the same month. The JSON
encoder and decoder took me only a few hours in C#, less than the time it
would take me to learn someone else's API. The ASN.1 encoder took three
weeks, and that was just for DER.

My view is that we should try to converge on one data model for encoding
data, and the model that the industry has chosen is JSON. JSON can serve as
an interchange format for any data that we can encode in ASN.1 or XML, and
it is a lot simpler than either.

There are, however, some real problems with using a text-based format for
crypto. In particular:

* BASE64 encoding of binary data increases the size by 33% per pass, which
is a pain because the overhead compounds when encoded objects are nested.

* Escaped encoding of text strings is inefficient.

* Decimal encoding of binary floating point values introduces conversion
errors.

* Parsing of text-based tags is inefficient.
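
To put rough numbers on the first three points, here is a small Python
sketch. The function names are made up for this illustration, and the float
check assumes an encoder that emits a fixed number of significant digits
(an encoder that always emits 17 digits for a double would round-trip).

```python
import base64
import json


def base64_sizes(data: bytes, passes: int) -> list:
    """Return the size before encoding and after each Base64 pass."""
    sizes = [len(data)]
    for _ in range(passes):
        data = base64.b64encode(data)
        sizes.append(len(data))
    return sizes


def escaped_size(text: str) -> int:
    """Length of the JSON-escaped form of text, without the quotes."""
    return len(json.dumps(text)) - 2


def survives_decimal(value: float, digits: int) -> bool:
    """True if value round-trips through a decimal string of that precision."""
    return float(format(value, f".{digits}g")) == value


if __name__ == "__main__":
    # Two passes, as when a Base64 blob is wrapped in another Base64 blob.
    print(base64_sizes(bytes(3072), 2))
    # Ten control characters each become a six-character \u escape.
    print(escaped_size("\x01" * 10))
    # 15 significant digits are not enough for this double; 17 are.
    print(survives_decimal(0.1 + 0.2, 15), survives_decimal(0.1 + 0.2, 17))
```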

There was some effort to rectify this, but it was a private effort, not an
IETF effort, so the principals felt free to consider only their own
requirements and to tell other people to shove off. An opportunity was
missed.

I think we can do better:

https://tools.ietf.org/html/draft-hallambaker-jsonbcd-01

There are three encodings:

JSON-B 'Binary' Adds binary encodings for numbers and length-delimited
encodings for strings and binary data.

JSON-C 'Compression' Adds tag compression to JSON-B.

JSON-D 'Data' Adds additional data formats (e.g. 128 bit and 80 bit
floats) that are used in scientific calculations.

The basic idea here is that instead of defining an alternative encoding to
JSON, we make use of code points that are unused in JSON (i.e. byte values
above 127, which never start a JSON token) to introduce binary sequences.
This has a number of important consequences:

1) It is only necessary to implement one decoder. JSON is a strict subset
of JSON-B, which is a strict subset of JSON-C, which is a strict subset of
JSON-D.

2) It is possible to insert JSON encoded data into a JSON-B sequence
without needing to re-encode.
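
A toy Python sketch of the dispatch idea follows. The tag value 0x80 and
its one-byte length field are invented for this illustration and are NOT
the values assigned in the draft; the sketch also assumes the JSON text
itself is ASCII only. It shows one decoder handling plain JSON and tagged
binary items in the same stream, with the JSON passed through untouched.

```python
import json

# Hypothetical tag, chosen only for this sketch (not taken from the draft):
# 0x80, then a 1-byte length, then that many raw bytes.
TAG_BYTES = 0x80

_json = json.JSONDecoder()


def decode_items(buf: bytes) -> list:
    """Decode a concatenation of JSON values and tagged binary items."""
    # latin-1 maps every byte to one character, so string indexes are
    # also byte offsets. This is why the sketch assumes ASCII-only JSON.
    text = buf.decode("latin-1")
    items = []
    pos = 0
    while pos < len(buf):
        first = buf[pos]
        if first in b" \t\r\n":
            pos += 1  # skip whitespace between items
        elif first >= 0x80:
            # A byte above 127 cannot start a JSON token, so it must be a tag.
            if first != TAG_BYTES:
                raise ValueError(f"unknown tag 0x{first:02x}")
            if pos + 2 > len(buf):
                raise ValueError("truncated length field")
            start = pos + 2
            end = start + buf[pos + 1]
            # Never trust a length field: check it against the buffer.
            if end > len(buf):
                raise ValueError("length runs past end of input")
            items.append(buf[start:end])
            pos = end
        else:
            # Ordinary JSON: hand it to the standard decoder unchanged.
            value, pos = _json.raw_decode(text, pos)
            items.append(value)
    return items


if __name__ == "__main__":
    stream = b'{"alg": "x"} ' + bytes([TAG_BYTES, 3]) + b"\x00\xff\x10" + b' "tail"'
    print(decode_items(stream))
```

The explicit bounds check on the length field is the part that matters for
safety: a length that points past the end of the buffer is rejected rather
than followed.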

Encoding formats can be a security nightmare, especially when there are
nested structures specified by length. I don't consider ASN.1 BER to be
safe: a poor implementation leads to smashing the stack.

Unfortunately, JSON supporters tend to be folk with a scripting mentality
rather than a structured type-checking mentality, so at this point we don't
even have a schema language for JSON (although to be fair this is in part
because ASN.1 schema and XML Schema are so very awful).

As I see it, there is only one way to parse a security sensitive data
structure:

1) Read in the bytes, abort if there are too many

2) Authenticate the bytes (Nope, I don't believe in canonicalization.)

3) Parse the data structure to internal representation, discarding all data
that is not understood.

4) Validate the data for consistency (if possible)

5) Pass to the security sensitive module to process.
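
The five steps above can be sketched in Python roughly as follows. This is
an illustration under stated assumptions, not a reference implementation:
the size limit, the field names, the allowed commands and the choice of
HMAC-SHA256 as the authenticator are all hypothetical.

```python
import hashlib
import hmac
import io
import json

MAX_MESSAGE_BYTES = 4096               # assumed limit for this sketch
KNOWN_FIELDS = {"command", "amount"}   # hypothetical message schema
KNOWN_COMMANDS = {"pay", "refund"}     # hypothetical command set


def read_bounded(stream: io.BufferedIOBase, limit: int = MAX_MESSAGE_BYTES) -> bytes:
    """Step 1: read the bytes, abort if there are too many."""
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError("message too large")
    return data


def accept_message(raw: bytes, tag: bytes, key: bytes) -> dict:
    """Steps 2 to 4; the returned dict is what step 5 hands onward."""
    # 2) Authenticate the raw bytes exactly as received, before parsing.
    expected = hmac.new(key, raw, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")

    # 3) Parse to an internal representation, discarding unknown fields.
    parsed = json.loads(raw.decode("utf-8"))
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    message = {k: v for k, v in parsed.items() if k in KNOWN_FIELDS}

    # 4) Validate the data for consistency.
    command = message.get("command")
    if not isinstance(command, str) or command not in KNOWN_COMMANDS:
        raise ValueError("unknown command")
    amount = message.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("invalid amount")

    # 5) Only now does the security sensitive module see the data.
    return message


if __name__ == "__main__":
    key = b"k" * 32
    wire = b'{"command": "pay", "amount": 5, "memo": "ignored"}'
    tag = hmac.new(key, wire, hashlib.sha256).digest()
    raw = read_bounded(io.BytesIO(wire))
    print(accept_message(raw, tag, key))
```

Because the tag is computed over the bytes as received, nothing here
depends on the sender and receiver agreeing on a canonical form.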

The Bitcoin bug looks to me to be caused by misplaced faith in the power of
canonicalization, and by assuming a structure to be in canonical form when
it isn't, rather than by a lack of c14n.

Doing authentication at the byte level before parsing is robust and
insulates the code from parser errors. It is also a good defense against
injection errors if we enforce a rule that every command is independently
authenticated and validated.