[Cryptography] Encodings for crypto

ianG iang at iang.org
Tue Feb 18 07:54:01 EST 2014


I think we can do a lot better.  I have a document somewhere on
this, but in brief:

There are too many primitives.  I see 11 for numbers alone!  In
practice, in network protocols, we do not need bignums, we do not need
floats, and we do not need negatives.  As for different-sized numbers,
remember that we are after simplification and higher-level concepts.
Which should tell us we need a number: not the remaining four of 8, 16,
32 and 64 bits, which are hangovers from the hardware days, but one
number that simply expands to fill the need.  This is done with the
7-bit trick, where a set high bit says a further byte follows.
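
In Java, a minimal sketch of that trick might look like this (the
class and method names are mine, and I've put the low-order seven bits
first, which is one common layout):

    import java.io.*;

    // One expanding number: 7 bits per byte, a set high bit means
    // another byte follows.
    public class VarNum {

        public static void write(OutputStream out, long n) throws IOException {
            while ((n & ~0x7FL) != 0) {
                out.write((int) (n & 0x7F) | 0x80);   // more to come
                n >>>= 7;
            }
            out.write((int) n);                        // final byte, high bit clear
        }

        public static long read(InputStream in) throws IOException {
            long n = 0;
            int shift = 0;
            int b;
            do {
                b = in.read();
                if (b < 0) throw new EOFException("truncated number");
                n |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return n;
        }
    }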

Next, we should really be thinking in OO terms.  In OO, we have a
single object that 'knows' its output and its input intimately.  Which
is to say, the object knows, whereas your spec does not.  The object
can enforce semantics such as range checking, and handle small
compositions such as converting byte arrays to strings.

We do, however, need a sequence of bytes, and a byte array is
constructed simply as a length (a number, as above) followed by that
many bytes.  That's two primitives so far.
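
Sketched in the same style (this leans on the VarNum sketch above; the
cap on the length is just a sanity check, not part of the format):

    import java.io.*;

    // The second primitive: a length, then that many raw bytes.
    public class VarBytes {

        public static void write(OutputStream out, byte[] data) throws IOException {
            VarNum.write(out, data.length);
            out.write(data);
        }

        public static byte[] read(InputStream in) throws IOException {
            long len = VarNum.read(in);
            if (len > 1_000_000)                   // arbitrary sanity cap
                throw new IOException("implausible length: " + len);
            byte[] data = new byte[(int) len];
            int off = 0;
            while (off < data.length) {
                int got = in.read(data, off, data.length - off);
                if (got < 0) throw new EOFException("truncated byte array");
                off += got;
            }
            return data;
        }
    }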

Once we are thinking in OO terms, we can create anything we need
within the context of the class.  Need a version?  Add a number.  Need
a boolean?  Add a flags number.  Need a negative?  Add another flag.
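
For instance, with made-up bit assignments:

    // Illustrative flags number: one bit per boolean.
    public class Flags {
        public static final long COMPRESSED = 1L << 0;  // need a boolean? add a bit
        public static final long NEGATIVE   = 1L << 1;  // need a negative? add a flag

        public static void main(String[] args) {
            long flags = COMPRESSED | NEGATIVE;          // written as one VarNum
            boolean negative = (flags & NEGATIVE) != 0;  // read back with a mask
            System.out.println(negative);                // true
        }
    }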

Next: when things get more complicated, we simply encapsulate the
complications in a new class/object.  This would be ideal for floats,
for example.  Which leads to the second observation: composition of
objects is a far more natural way to build up protocol elements.
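
For example, a float could be encapsulated as a small composite of the
two primitives.  This layout of flags, mantissa and exponent is just
my sketch, not a worked-out format:

    import java.io.*;

    // A float as a composed object: flags + mantissa + exponent,
    // each carried as an expanding number.
    public class WireFloat {
        long flags;      // bit 0: mantissa negative, bit 1: exponent negative
        long mantissa;
        long exponent;

        public void write(OutputStream out) throws IOException {
            VarNum.write(out, flags);
            VarNum.write(out, mantissa);
            VarNum.write(out, exponent);
        }

        public static WireFloat read(InputStream in) throws IOException {
            WireFloat f = new WireFloat();
            f.flags    = VarNum.read(in);
            f.mantissa = VarNum.read(in);
            f.exponent = VarNum.read(in);
            return f;
        }
    }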

One more thing.  Without a seriously good testing and debugging
system, this whole area breaks down under complexity, which is why
people eschew binary and go for text.  There is a loopback technique I
use to solve this issue, which I call the Ouroboros pattern.  In
short, it is this:

1.  each object has an example() method which produces a correct
object with each field constructed randomly.
2.  write that out through the stream process.
3.  read it back into a new object, using the converse stream process.
4.  compare the two objects with the standard equals() method.

Run it 2^5 times for each class.  Due to composition, and the tests
repeating at every level of it, this solves the complexity issue.
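
In code, the loop looks like this (Ping is a toy class standing in for
any object with the four methods; it uses the VarNum sketch above):

    import java.io.*;
    import java.util.Random;

    public class OuroborosTest {

        // Toy object under test.
        static class Ping {
            long version, flags;

            static Ping example() {               // step 1: valid, random fields
                Random r = new Random();
                Ping p = new Ping();
                p.version = r.nextInt(100);
                p.flags = r.nextInt(4);
                return p;
            }
            void write(OutputStream out) throws IOException {      // step 2
                VarNum.write(out, version);
                VarNum.write(out, flags);
            }
            static Ping read(InputStream in) throws IOException {  // step 3
                Ping p = new Ping();
                p.version = VarNum.read(in);
                p.flags = VarNum.read(in);
                return p;
            }
            @Override public boolean equals(Object o) {            // step 4
                return o instanceof Ping
                    && version == ((Ping) o).version
                    && flags == ((Ping) o).flags;
            }
            @Override public int hashCode() {
                return Long.hashCode(version * 31 + flags);
            }
        }

        public static void main(String[] args) throws IOException {
            for (int i = 0; i < 32; i++) {                         // 2^5 runs
                Ping a = Ping.example();
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                a.write(buf);
                Ping b = Ping.read(new ByteArrayInputStream(buf.toByteArray()));
                if (!a.equals(b))
                    throw new AssertionError("round trip failed");
            }
            System.out.println("ok: 32 round trips");
        }
    }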

It should be noted that another way to do this is to use a high-level
meta language like XML or ProtoBufs.  But that drags in all the
complexity of another language / system / library / parser madness.
This technique does not; it's just simple OO programming.

Richard Farrell said:
> Bottom line: I fear that *only* lampooning 1980's data structure
> design risks repeating rather than learning from the errors made.
>
> And it seems a bit odd to be so fixated on essentially the 1988
> blue-book, from which most of the rest of the design followed.

Probably the key point here is that if you are still thinking about
protocols in the old bits & bytes way that Richard highlighted, you're
missing out.  Doing protocols with OO thinking is so much easier that
you never go back.  Once you do, all of the formats, ideas, layouts
and MLs start to look a little ... 20th century, steampunk, historical.



iang


ps: I have a paper that describes this in more depth; bug me for a
preview to review (yes, feedback :) ) or bug me to release it.




On 17/02/2014 21:42, Phillip Hallam-Baker wrote:
> Responding to Ted on ASN.1 PER...
> 
> Yes, ASN.1 is horribly complex and I say this as someone who just
> implemented a JSON encoder and an ASN.1 encoder in the same month. The JSON
> encoder and decoder took me only a few hours in C#, less than the time it
> would take me to learn someone's API. The ASN.1 encoder took three weeks.
> And that was just for DER.
> 
> My view is that we should try to converge on one data model for encoding
> data and the model that the industry has chosen is JSON. We can use JSON as
> an interchange format for any data that we can encode in ASN.1 or XML but
> it is a lot simpler.
> 
> There are however some real problems with using a text based format for
> crypto. In particular:
> 
> * BASE64 encoding of binary data increases the size by 33% per pass, which
> is a pain.
> 
> * Escaped encoding of text strings is inefficient.
> 
> * Decimal encoding of binary floating point values introduces conversion
> errors.
> 
> * Parsing of text based tags is inefficient.
> 
> There was some effort to rectify this but it was a private effort, not an
> IETF effort. So the principals felt free to consider only their own
> requirements, telling other people to shove off. So an opportunity was
> missed.
> 
> I think we can do better:
> 
> https://tools.ietf.org/html/draft-hallambaker-jsonbcd-01
> 
> There are three encodings:
> 
> JSON-B 'Binary': adds binary encodings for numbers and length-delineated
> encodings for strings and binary data.
> 
> JSON-C 'Compression': adds tag compression to JSON-B.
> 
> JSON-D 'Data': adds additional data formats (e.g. 128-bit and 80-bit
> floats) that are used in scientific calculations.
> 
> The basic idea here is that instead of an alternative encoding to JSON, we
> make use of code points that are unused in JSON (i.e. all the code points
> above 127) to introduce binary sequences. This has a number of important
> consequences:
> 
> 1) It is only necessary to implement one decoder. JSON is a strict subset
> of JSON-B, which is a strict subset of JSON-C, which is a strict subset of
> JSON-D.
> 
> 2) It is possible to insert JSON encoded data into a JSON-B sequence
> without needing to re-encode.
> 
> 
> Encoding formats can be a security nightmare, especially when there are
> nested structures specified by length. I don't consider ASN.1 BER to be
> safe. Poor implementation leads to smashing the stack.
> 
> Unfortunately JSON supporters tend to be folk with a scripting mentality
> rather than a structured type-checking mentality, so at this point we don't
> even have a schema language for JSON (although to be fair this is in part
> because ASN.1 schema and XML Schema are so very awful).
> 
> As I see it, there is only one way to parse a security sensitive data
> structure:
> 
> 1) Read in the bytes, abort if there are too many
> 
> 2) Authenticate the bytes (Nope, I don't believe in canonicalization.)
> 
> 3) Parse the data structure to internal representation, discarding all data
> that is not understood.
> 
> 4) Validate the data for consistency (if possible)
> 
> 5) Pass to the security sensitive module to process.
> 
> The Bitcoin bug looks to me to be a bug caused by misplaced faith in the
> power of canonicalization, i.e. assuming a structure to be in canonical
> form when it isn't, rather than by a lack of c18n.
> 
> Doing authentication at the byte level before parsing is robust and
> insulates the code from parser errors. It is also a good defense against
> injection errors if we enforce a rule that every command is independently
> authenticated and validated.