[Cryptography] Heartbleed and fundamental crypto programming practices

Sat Apr 26 11:31:34 EDT 2014

On Sat, Apr 26, 2014 at 8:10 AM, Jerry Leichter <leichter at lrw.com> wrote:
> On Apr 25, 2014, at 8:34 AM, Phillip Hallam-Baker wrote:
>> 1) ASN.1 BER has an inherently unsafe, difficult to implement encoding option
> ASN.1 was designed at a time when networks were slow and every bit counted.  It's an *extremely* tight encoding.

That is what I thought was the reason, but its actually subtly
different. JSON-B is almost as dense as ASN.1 BER. JSON-C is actually
better and unlike ASN.1 it is a regular encoding that does not depend
on the schema.

The problem with ASN.1 is that the schema is designed to describe the
encoding of the bits on the wire rather than the bits on the wire
being a representation of the schema.

That is why they have the weird stuff like implicit and explicit
tagging, they make sense only if you are thinking about how to lay out
bits on the wire.

>> What I mean by unsafe is the following, X.509 DER requires the use of
>> definite length encodings so that if I have a sequence nested inside a
>> sequence the bytes on the wire will be something like:
>>
>> <tag1> <Length1> <value1>
>>
>> where <value1> = <tag2> <Length2> <value2>
>>
>> result: <tag1> <Length1> <tag2> <Length2> <value2>
>>
>>
>> Which all looks very sensible
> It's an absolutely standard TLV (Type or Tag/Length/Value) encoding.  The advantage of a format like this is that a reader can easily skip over fields it doesn't understand - or even forward them on to another reader who might be able to understand them.  Google protobuf's use something similar, for exactly this reason:  You can add new fields to an existing protobuf, and old code will have no trouble with it - and can even modify the fields it does understand while passing the new ones off to someone else unchanged.

So does IP and that is where we had a whole slew of issues for IPv4
and will no doubt have a whole series of issues for IPv6 which has a
huge number of options

ASN.1 does have a safe way to represent structures, the indefinite
length encoding. Instead of giving the length of a nested structure in
bytes, the end of the structure is specified as a terminal marker.

As for skipping ahead in a structure using those length markers, I
have seen lots of code like that. I generally scrub it as soon as
possible because I know that it is never going to be correct and even
if it is correct it is going to be vulnerable to illegal inputs
because the act of skipping means that the data isn't being checked.

After a lot of bad experiences, the only approach to parsing data I
accept is to take the input data, perform authentication checks,
reduce the data set to exclude all the data that has not been
authenticated, parse the input stream in its entirety and return the
parsed structure to the dispatcher. As you point out, CPU is cheap.

>> until we start on the fact that the
>> length encodings in Assanine One are themselves variable length, so
>> you can't calculate the <length1> without having first finalized
>> <value2>.
> You couldn't do that *anyway*!  <Length1> covers all of <value2>.  Once you know it, you can encode it - variable length encoding or not.  Until then, you can do nothing.

It means I have to pretty much encode the data just to find out the
length. I can't take shortcuts.

>> It is not possible to emit ASN.1 using a simple recursive
>> descent scheme unless you construct the output values in reverse, from
>> the last item to the first.
> I don't understand this comment.  Any TLV coding has to use the style:
>
> 1.  Decide on T
> 2.  Compute (the encoding of) a T instance as V
> 3.  Return encoding as T Encode(Length(V)) V
>
> This is the same no matter what Encode() is.

Looks like you are going to spend all your time copying data from one
buffer to another.

The simple way to encode ASN.1 is to perform a simple recursive
descent but emit the data values on the way out rather than the way
in.

So if you have A ( B (C, D))), the first data that you emit is D, then
C, then B, then A. The advantage of this is that you can traverse the
data in one pass with a simple stack and the data length you need is
always on the top of the stack.

[This actually reminds me of another good practice for coding
encoders: don't use recursion. Or rather code the recursion explicitly
rather than using subroutine calls. The reason for this being that use
of subroutine calls lays you open to attacks. Though this is something
I haven't applied in my own code yet.]

> Yes, writing that out by hand every time is boring and leads to shortcuts and errors.  You want a generator to do this for you or you, or some successor, will eventually get it wrong.

The number of programmers who understand the value of generators is
very small. Maybe 5% and the number who would think to write one, very
small.

One of the things a lot of people don't get about UNIX is that the
fundamental approach was toolbuilding, see Software Tools by Kernighan
and Plaugher to see the real heart of the UNIX architecture.

>> Unless the decoding logic is just right, the decoder can end up with a
>> buffer overrun. And getting the logic right requires a lot of
>> discipline. A lot of decoders will blindly accept the input as valid
>> and the world is lost...
> The problem isn't with TLV encodings - ASN.1 or otherwise.  The problem is that after all these years, we're still writing this stuff by hand - and on top of it, when it's written in C, it's pretty much guaranteed that things what's being passed around is a raw buffer pointer - and often no one bothers to pass the length along, at least when the "know" it doesn't matter (e.g., the decoder for the L field will probably get just a raw pointer, because how long can a length field be?)

Funny you should mention that...

http://sourceforge.net/projects/jsonschema/

-- 
Website: http://hallambaker.com/