[Cryptography] Heartbleed and fundamental crypto programming practices

Sat Apr 26 08:10:34 EDT 2014

On Apr 25, 2014, at 8:34 AM, Phillip Hallam-Baker wrote:
> 1) ASN.1 BER has an inherently unsafe, difficult to implement encoding option
ASN.1 was designed at a time when networks were slow and every bit counted.  It's an *extremely* tight encoding.

Ironically, today's networks are immensely faster - but CPUs have sped up even more.  Because CPU is so cheap relative to networks (not to mention disks and even SSDs), it's worth throwing CPU at data to render it into a highly compressed form.  Take a look at Google's protobuf format for an example.

> What I mean by unsafe is the following, X.509 DER requires the use of
> definite length encodings so that if I have a sequence nested inside a
> sequence the bytes on the wire will be something like:
> 
> <tag1> <Length1> <value1>
> 
> where <value1> = <tag2> <Length2> <value2>
> 
> result: <tag1> <Length1> <tag2> <Length2> <value2>
> 
> 
> Which all looks very sensible
It's an absolutely standard TLV (Type or Tag/Length/Value) encoding.  The advantage of a format like this is that a reader can easily skip over fields it doesn't understand - or even forward them on to another reader who might be able to understand them.  Google's protobufs use something similar, for exactly this reason:  You can add new fields to an existing protobuf, and old code will have no trouble with it - and can even modify the fields it does understand while passing the new ones off to someone else unchanged.
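
As a minimal sketch of that skipping behavior, assume a toy format with a 1-byte tag and a 1-byte length (this is neither ASN.1 nor the protobuf wire format, and tlv_find is a hypothetical name).  The reader steps over any field it doesn't want using only the length:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy format: 1-byte tag, 1-byte length, then `length` value bytes. */

/* Find the first field with tag `want` in buf[0..len). Fields with other
 * tags are skipped using only their length. Returns 0 on success, -1 if the
 * tag is absent or a field overruns the buffer. */
int tlv_find(const uint8_t *buf, size_t len, uint8_t want,
             const uint8_t **val, size_t *val_len)
{
    size_t pos = 0;
    while (pos < len) {
        if (len - pos < 2) return -1;      /* truncated header */
        uint8_t tag = buf[pos];
        size_t  n   = buf[pos + 1];
        pos += 2;
        if (n > len - pos) return -1;      /* value overruns buffer */
        if (tag == want) {
            *val = buf + pos;
            *val_len = n;
            return 0;
        }
        pos += n;                          /* unknown or unwanted field: skip it */
    }
    return -1;
}
```

A field with a tag the reader has never heard of costs it nothing beyond the bounds check.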

> until we start on the fact that the
> length encodings in Assanine One are themselves variable length, so
> you can't calculate the <length1> without having first finalized
> <value2>.
You couldn't do that *anyway*!  <Length1> covers all of <value1>, which contains <value2>.  Once you know that length, you can encode it - variable-length encoding or not.  Until then, you can do nothing.

> It is not possible to emit ASN.1 using a simple recursive
> descent scheme unless you construct the output values in reverse, from
> the last item to the first.
I don't understand this comment.  Any TLV coding has to use the style:

1.  Decide on T
2.  Compute (the encoding of) a T instance as V
3.  Return encoding as T Encode(Length(V)) V

This is the same no matter what Encode() is.
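
The three steps above can be sketched in C as follows, taking Encode() to be the DER definite-length form (one byte below 128, otherwise 0x80|k followed by k big-endian bytes).  The names der_put_len and tlv_encode are made up for illustration, and single-byte tags are assumed:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the DER definite-length encoding of n into out, which must have room
 * for 1 + sizeof(size_t) bytes. Returns the number of bytes written. */
size_t der_put_len(uint8_t *out, size_t n)
{
    if (n < 0x80) {                        /* short form: one byte */
        out[0] = (uint8_t)n;
        return 1;
    }
    size_t k = 0;                          /* long form: count significant bytes */
    for (size_t t = n; t != 0; t >>= 8) k++;
    out[0] = (uint8_t)(0x80 | k);
    for (size_t i = 0; i < k; i++)
        out[1 + i] = (uint8_t)(n >> (8 * (k - 1 - i)));
    return 1 + k;
}

/* Emit T, Encode(Length(V)), V into out[0..cap). V must already be encoded.
 * Returns total bytes written, or 0 if out is too small. */
size_t tlv_encode(uint8_t tag, const uint8_t *v, size_t v_len,
                  uint8_t *out, size_t cap)
{
    uint8_t lenbuf[1 + sizeof(size_t)];
    size_t l = der_put_len(lenbuf, v_len);
    if (cap < 1 + l || cap - 1 - l < v_len) return 0;
    out[0] = tag;
    memcpy(out + 1, lenbuf, l);
    if (v_len != 0) memcpy(out + 1 + l, v, v_len);
    return 1 + l + v_len;
}
```

Nesting is just calling tlv_encode on the inner value first and passing its output as V to the outer call; swapping der_put_len for a fixed-width length changes nothing about that order.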

> There is also a subtle opportunity here for an interesting type of
> bug, just like the heartbleed bug:
> 
> What if Length2 is given as greater than the length of value2?
Again, this is true of any TLV-like encoding.  I've implemented many such things, and very long ago realized that the parser can be recursive descent, but it must have the following form:

	v = Decode_T(TLV-encoded-field, Length(Container));

where Container is the V field of the surrounding entity.  Then every Decode_T() starts off as:

	Decode_T(TLV tlv, int len) {
		T t = Tag(tlv);
		if (t != T) abort(WrongType);
		int l = Decode(Length(tlv), len);  // Length field must fit in len!
		if (l > len) abort(FieldOverrunsContainer);
		...extract V and process...
	}
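
For concreteness, here is one way that skeleton might look in real C.  It's only a sketch: the names tlv_open and decode_seq_of_byte are invented, tags are one byte, and only short-form (single-byte) lengths are handled.  Unlike the outline above, it also subtracts the two header bytes before comparing the length against the container:

```c
#include <stddef.h>
#include <stdint.h>

enum { TLV_OK = 0, TLV_WRONG_TYPE = -1, TLV_OVERRUN = -2, TLV_UNSUPPORTED = -3 };

/* Parse one TLV with the expected tag from p[0..len), where len is the
 * number of bytes left in the enclosing container. On success, *val and
 * *val_len describe V; a nested decoder must be given exactly that bound. */
int tlv_open(const uint8_t *p, size_t len, uint8_t expect,
             const uint8_t **val, size_t *val_len)
{
    if (len < 2) return TLV_OVERRUN;            /* no room for T and L */
    if (p[0] != expect) return TLV_WRONG_TYPE;
    if (p[1] >= 0x80) return TLV_UNSUPPORTED;   /* long form: not in this sketch */
    size_t l = p[1];
    if (l > len - 2) return TLV_OVERRUN;        /* V would overrun the container */
    *val = p + 2;
    *val_len = l;
    return TLV_OK;
}

/* Decode SEQUENCE { INTEGER } where the INTEGER has one content byte;
 * that byte is returned raw in *out. */
int decode_seq_of_byte(const uint8_t *p, size_t len, uint8_t *out)
{
    const uint8_t *seq, *num;
    size_t seq_len, num_len;
    int rc = tlv_open(p, len, 0x30, &seq, &seq_len);
    if (rc != TLV_OK) return rc;
    rc = tlv_open(seq, seq_len, 0x02, &num, &num_len);  /* bound = parent's V */
    if (rc != TLV_OK) return rc;
    if (num_len != 1) return TLV_OVERRUN;
    *out = num[0];
    return TLV_OK;
}
```

The inner call is bounded by seq_len, not by the size of the whole buffer, so an inner length that lies is rejected even when there happen to be more bytes sitting in memory after the SEQUENCE.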

Yes, writing that out by hand every time is boring and leads to shortcuts and errors.  You want a generator to do this for you; otherwise you, or some successor, will eventually get it wrong.

> Unless the decoding logic is just right, the decoder can end up with a
> buffer overrun. And getting the logic right requires a lot of
> discipline. A lot of decoders will blindly accept the input as valid
> and the world is lost...
The problem isn't with TLV encodings - ASN.1 or otherwise.  The problem is that after all these years, we're still writing this stuff by hand.  On top of that, when it's written in C, it's pretty much guaranteed that what's being passed around is a raw buffer pointer - and often no one bothers to pass the length along, at least when they "know" it doesn't matter (e.g., the decoder for the L field will probably get just a raw pointer, because how long can a length field be?)
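
To make that last point concrete, here is a sketch of an L-field decoder that does get the bound passed in.  The span type and der_get_len are hypothetical names; the length format assumed is the DER definite form:

```c
#include <stddef.h>
#include <stdint.h>

/* A pointer that always travels with its bound (hypothetical helper type). */
typedef struct { const uint8_t *p; size_t n; } span;

/* Decode a DER definite length from the front of *s, advancing *s past it.
 * Returns 0 on success, -1 if the length field itself runs past s->n, uses
 * the indefinite form, is non-minimal, or does not fit in size_t. */
int der_get_len(span *s, size_t *out)
{
    if (s->n < 1) return -1;
    uint8_t b = s->p[0];
    if (b < 0x80) {                               /* short form */
        *out = b;
        s->p += 1; s->n -= 1;
        return 0;
    }
    size_t k = b & 0x7F;                          /* number of length bytes */
    if (k == 0 || k > sizeof(size_t)) return -1;  /* indefinite or too big */
    if (k > s->n - 1) return -1;                  /* length field overruns buffer */
    if (s->p[1] == 0) return -1;                  /* leading zero: not minimal */
    size_t v = 0;
    for (size_t i = 0; i < k; i++)
        v = (v << 8) | s->p[1 + i];
    if (v < 0x80) return -1;                      /* should have used short form */
    *out = v;
    s->p += 1 + k; s->n -= 1 + k;
    return 0;
}
```

With a bare pointer, the first byte alone can claim up to 127 further length bytes and the function has no way to know whether they exist.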
							-- Jerry