<div dir="ltr"><div class="gmail_extra">Responding to Ted on ASN.1 PER...

</div><div class="gmail_extra"><br></div><div class="gmail_extra">Yes, ASN.1 is horribly complex, and I say this as someone who just implemented a JSON encoder and an ASN.1 encoder in the same month. The JSON encoder and decoder took me only a few hours in C#, less than the time it would take me to learn someone else's API. The ASN.1 encoder took three weeks, and that was just for DER.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">My view is that we should try to converge on one data model for encoding data, and the model that the industry has chosen is JSON. JSON can serve as an interchange format for any data that we can encode in ASN.1 or XML, and it is a lot simpler.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">There are, however, some real problems with using a text-based format for crypto. In particular:</div><div class="gmail_extra"><br></div><div class="gmail_extra">* BASE64 encoding of binary data increases the size by 33% per pass, which is a pain.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">* Escaped encoding of text strings is inefficient.</div><div class="gmail_extra"><br></div><div class="gmail_extra">* Decimal encoding of binary floating point values introduces conversion errors.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">* Parsing of text-based tags is inefficient.</div><div class="gmail_extra"><br></div><div class="gmail_extra">There was some effort to rectify this, but it was a private effort, not an IETF effort, so the principals felt free to consider only their own requirements and to tell other people to shove off. An opportunity was missed.</div>
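<div class="gmail_extra"><br></div><div class="gmail_extra">To put rough numbers on two of the bullet points above (Base64 growth and decimal floats), here is a small Python sketch. The 3000-byte payload and the 15-digit format are arbitrary choices for illustration. A writer that emits 17 significant digits would round-trip a double exactly, at the cost of longer text.</div>

```python
import base64

# Each Base64 pass emits 4 output characters for every 3 input bytes,
# so nesting one encoded blob inside another compounds the growth.
payload = bytes(3000)  # arbitrary example size (a multiple of 3)
sizes = [len(payload)]
data = payload
for _ in range(2):
    data = base64.b64encode(data)
    sizes.append(len(data))

# A binary double written as decimal text with too few digits
# does not read back as the same value.
x = 0.1 + 0.2
text = "%.15g" % x          # 15 significant digits (an assumption)
lossy = float(text) != x    # True when the round trip changed the value

print(sizes, text, lossy)
```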

<div class="gmail_extra"><br></div><div class="gmail_extra">I think we can do better:</div><div class="gmail_extra"><br></div><div class="gmail_extra"><a href="https://tools.ietf.org/html/draft-hallambaker-jsonbcd-01">https://tools.ietf.org/html/draft-hallambaker-jsonbcd-01</a><br>

</div><div class="gmail_extra"><br></div><div class="gmail_extra">There are three encodings:</div><div class="gmail_extra"><br></div><div class="gmail_extra">JSON-B 'Binary': adds binary encodings for numbers and length-delimited encodings for strings and binary data.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">JSON-C 'Compression': adds tag compression to JSON-B.</div><div class="gmail_extra"><br></div><div class="gmail_extra">JSON-D 'Data': adds further data formats (e.g. 128-bit and 80-bit floats) that are used in scientific calculations.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">The basic idea here is that instead of defining an alternative encoding to JSON, we make use of byte values that JSON never uses outside a string (i.e. all the values above 127) to introduce binary sequences. This has a number of important consequences:</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">1) It is only necessary to implement one decoder. JSON is a strict subset of JSON-B, which is a strict subset of JSON-C, which is a strict subset of JSON-D.</div><div class="gmail_extra">

<br></div><div class="gmail_extra">2) It is possible to insert JSON encoded data into a JSON-B sequence without needing to re-encode.</div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">
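<div class="gmail_extra">A minimal Python sketch of the "one decoder" point: the first byte of a value decides whether it is plain JSON or a binary item. The tag values and layouts below (TAG_BYTES, TAG_UINT8, 1-byte lengths) are made up for illustration and are NOT the ones defined in the draft. A real decoder would make this same choice at every value position, not just at the top level.</div>

```python
import json

# Hypothetical tag bytes, for illustration only; NOT the draft's values.
TAG_BYTES = 0x88   # followed by a 1-byte length, then that many raw bytes
TAG_UINT8 = 0xA0   # followed by a single unsigned byte


def decode_value(buf: bytes):
    """Decode one value: ordinary JSON text or a (hypothetical) binary item."""
    if not buf:
        raise ValueError("empty input")
    first = buf[0]
    if first >= 0x80:
        # JSON never starts a value with a byte above 127,
        # so this must be one of the binary extensions.
        if first == TAG_UINT8:
            if len(buf) != 2:
                raise ValueError("bad length for uint8 item")
            return buf[1]
        if first == TAG_BYTES:
            if len(buf) == 1 or len(buf) != 2 + buf[1]:
                raise ValueError("declared length does not match input")
            return buf[2:]
        raise ValueError("unknown binary tag")
    # Anything else is plain JSON, handled by the stock parser unchanged.
    return json.loads(buf.decode("utf-8"))
```

<div class="gmail_extra">Because the JSON branch is untouched, existing JSON text passes through this decoder as is, which is what makes point 2 possible.</div><div class="gmail_extra"><br></div>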

Encoding formats can be a security nightmare, especially when there are nested structures specified by length. I don't consider ASN.1 BER to be safe: a poor implementation leads to smashing the stack.</div><div class="gmail_extra">
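<div class="gmail_extra">As a rough illustration of the checks a length-based nested format needs, here is a Python sketch over a made-up type/length/value layout (1-byte type, 1-byte length, hypothetical types 0x01 for a leaf and 0x02 for a container). It is not BER. In Python a missed check ends in an exception rather than a smashed stack, but a C parser needs the same tests.</div>

```python
MAX_DEPTH = 8  # assumed nesting limit


def parse_tlv(buf: bytes, depth: int = 0):
    """Parse hypothetical items: 1-byte type, 1-byte length, then the value."""
    if depth > MAX_DEPTH:
        raise ValueError("nesting too deep")
    items = []
    pos = 0
    while pos != len(buf):
        if len(buf) - pos == 1:
            raise ValueError("truncated header")
        kind, length = buf[pos], buf[pos + 1]
        end = pos + 2 + length
        # The inner length must never reach past the enclosing structure.
        if end > len(buf):
            raise ValueError("declared length overruns enclosing structure")
        body = buf[pos + 2:end]
        if kind == 0x02:
            items.append(parse_tlv(body, depth + 1))  # container: recurse
        elif kind == 0x01:
            items.append(body)                        # leaf: keep raw bytes
        # Unknown types are skipped; the length says where they end.
        pos = end
    return items
```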

<br></div><div class="gmail_extra">Unfortunately, JSON supporters tend to be folk with a scripting mentality rather than a structured, type-checking mentality, so at this point we don't even have a schema language for JSON (although, to be fair, this is in part because ASN.1 schema and XML Schema are so very awful).</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">As I see it, there is only one way to parse a security-sensitive data structure:</div><div class="gmail_extra"><br></div><div class="gmail_extra">1) Read in the bytes; abort if there are too many.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">2) Authenticate the bytes. (Nope, I don't believe in canonicalization.)</div><div class="gmail_extra"><br></div><div class="gmail_extra">3) Parse the data structure to an internal representation, discarding all data that is not understood.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">4) Validate the data for consistency (if possible).</div><div class="gmail_extra"><br></div><div class="gmail_extra">5) Pass it to the security-sensitive module to process.</div>
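<div class="gmail_extra"><br></div><div class="gmail_extra">The five steps above can be sketched in Python as follows. Everything specific here is an assumption made for the example: HMAC-SHA256 as the authentication mechanism, the 4096-byte limit, the field names in KNOWN_FIELDS, and the process callback standing in for the security-sensitive module.</div>

```python
import hashlib
import hmac
import json

MAX_MESSAGE_BYTES = 4096               # assumed size limit
KNOWN_FIELDS = {"command", "amount"}   # assumed schema


def handle_message(raw: bytes, tag: bytes, key: bytes, process):
    # 1) Read in the bytes; abort if there are too many.
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    # 2) Authenticate the exact bytes received; nothing is canonicalized.
    expected = hmac.new(key, raw, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    # 3) Parse to an internal representation, dropping unknown fields.
    parsed = json.loads(raw.decode("utf-8"))
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    message = {k: v for k, v in parsed.items() if k in KNOWN_FIELDS}
    # 4) Validate the data for consistency.
    if not isinstance(message.get("command"), str):
        raise ValueError("missing or invalid command")
    amount = message.get("amount", 0)
    if isinstance(amount, bool) or not isinstance(amount, int) or 0 > amount:
        raise ValueError("invalid amount")
    # 5) Only now pass it to the security-sensitive module.
    return process(message)
```

<div class="gmail_extra">Note that the parser in step 3 never sees a byte that failed the check in step 2.</div>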

<div class="gmail_extra"><br></div><div class="gmail_extra">The Bitcoin bug looks to me to be caused by misplaced faith in the power of canonicalization, that is, by assuming a structure to be in canonical form when it isn't, rather than by a lack of c14n.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra">Doing authentication at the byte level before parsing is robust and insulates the code from parser errors. It is also a good defense against injection errors if we enforce a rule that every command is independently authenticated and validated.</div>

</div>