[Cryptography] Is ASN.1 still the thing?

Nico Williams nico at cryptonector.com
Sat Nov 25 22:57:54 EST 2017


On Sat, Nov 25, 2017 at 12:32:59PM +0100, Florian Weimer wrote:
> * Peter Gutmann:
> > ASN.1 has a lot of design-by-committee junk in it (the date format, for
> > example), but BER and DER are pretty clean.

ASN.1 is pretty clean, but BER/DER/CER are all crap (mainly the TLV
thing, and the poor choices made for canonicalization in DER and CER).

> What I find very hard, as someone who has never been formally trained
> in the ASN.1 arts, is going from a specification like this:

I've never been formally trained in ASN.1 either.

>    Certificate  ::=  SEQUENCE  {
>         tbsCertificate       TBSCertificate,
>         signatureAlgorithm   AlgorithmIdentifier,
>         signatureValue       BIT STRING  }
> 
>    TBSCertificate  ::=  SEQUENCE  {
>         version         [0]  EXPLICIT Version DEFAULT v1,
>         serialNumber         CertificateSerialNumber,
>         signature            AlgorithmIdentifier,
>         issuer               Name,
>         validity             Validity,
>         subject              Name,
>         subjectPublicKeyInfo SubjectPublicKeyInfo,
>         issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
>                              -- If present, version MUST be v2 or v3
>         subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
>                              -- If present, version MUST be v2 or v3
>         extensions      [3]  EXPLICIT Extensions OPTIONAL
>                              -- If present, version MUST be v3
>         }
> 
> to the BER/DER encoding.  The problem is this:
> 
>         version         [0]  EXPLICIT Version DEFAULT v1,
> 
> which has a funny impact on the encoding, which turns out rather
> irregular at this point.  The rest is pretty boring TLV stuff and
> easy to implement, but I never found a specification of what
> *actually* happens here.  If it's described in X.690 (07/2002), I
> really don't see it.

Are you referring to the EXPLICIT keyword?

Explicit tagging -> TLV nesting.  I.e., TLV' where V' is the underlying
TLV, so: TLTLV.

That's right: redundant, and ridiculously wasteful.

Here you get:

  Tag(CONTEXT, 0) || Length(Tag(UNIVERSAL, INTEGER) ||
                            Length(<encoding of INTEGER 0, 1, or 2>) ||
                            <encoding of INTEGER 0, 1, or 2>)
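
Concretely, for a v3 certificate (Version is the INTEGER 2), the DER
encoding of this field comes out as:

  A0 03 02 01 02

i.e., a constructed [0] context tag and length wrapped around the
complete INTEGER TLV: five octets to carry one octet of payload.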

What was the point of using EXPLICIT tagging for that field and IMPLICIT
for the rest?  I don't know, and so far I can't think of an obvious
reason.

(While we're on the subject of mistakes in X.509, it's very unnatural to
 apply a signature to a subset of a PDU (in this case, tbsCertificate).
 tbsCertificate should have been an OCTET STRING containing the encoded
 TBSCertificate; see the sketch below.  But whatever.)
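
For illustration only (this is not what X.509 actually says), that
would have looked something like:

   Certificate  ::=  SEQUENCE  {
        tbsCertificate       OCTET STRING,  -- holds the DER encoding
                                            -- of a TBSCertificate
        signatureAlgorithm   AlgorithmIdentifier,
        signatureValue       BIT STRING  }

so a verifier can hash the exact octets as received instead of having
to locate (or worse, re-encode) the to-be-signed portion inside the
outer TLV.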

As for where this is described...  It's in X.690, section 8.14, and
it's also referred to in a few places in X.680.

 - X.690, 8.14:

   8.14 Encoding of a tagged value
   8.14.1 The encoding of a tagged value shall be derived from the complete
          encoding of the corresponding data value of the type appearing
          in the "TaggedType" notation (called the base encoding) as
          specified in 8.14.2 and 8.14.3.
   8.14.2 If implicit tagging (see ITU-T Rec. X.680 | ISO/IEC 8824-1,
          30.6) was not used in the definition of the type, the encoding
          shall be constructed and the contents octets shall be the
          complete base encoding.
   8.14.3 If implicit tagging was used in the definition of the type, then:
          a)     the encoding shall be constructed if the base encoding
                 is constructed, and shall be primitive otherwise; and
          b)     the contents octets shall be the same as the contents
                 octets of the base encoding.

This actually makes sense if you read the rest of the spec, but it's
also made more obvious by already knowing that EXPLICIT tagging means
one more layer of TLV while IMPLICIT tagging means replacing the tag
(but not the constructed/primitive bit) in the TLV encoding :(
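
To make those two rules concrete, here's a minimal sketch in Python
(no ASN.1 library; toy helpers of my own naming; short-form lengths
and low tag numbers only):

  def der_len(n):
      # Short-form DER length octet; fine for n < 128.
      assert n < 128
      return bytes([n])

  def der_integer(value):
      # Minimal content octets (non-negative values only here).
      content = value.to_bytes(value.bit_length() // 8 + 1, "big")
      return bytes([0x02]) + der_len(len(content)) + content

  def implicit_tag(n, tlv):
      # IMPLICIT: replace the tag octet with [n] (context class),
      # preserving the constructed/primitive bit; L and V unchanged.
      return bytes([0x80 | (tlv[0] & 0x20) | n]) + tlv[1:]

  def explicit_tag(n, tlv):
      # EXPLICIT: wrap the complete TLV in a new constructed [n] TLV.
      return bytes([0xA0 | n]) + der_len(len(tlv)) + tlv

  v3 = der_integer(2)                   # 02 01 02
  print(implicit_tag(0, v3).hex())      # 800102
  print(explicit_tag(0, v3).hex())      # a003020105 is wrong; a003020102

Note how the implicit form drops the UNIVERSAL INTEGER tag entirely
(you need the module to know the underlying type), while the explicit
form keeps it at the cost of an extra tag and length, which is exactly
the trade-off the X.680 NOTE quoted below describes.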

(I'm probably coming across as defending ASN.1, but I want to make
 absolutely clear that BER/DER/CER are horrible.  My defense of ASN.1 is
 mostly about dissuading people from reinventing that wheel _badly_.  If
 you want to reinvent it, please don't make the TLV mistake again.)

And here are a couple of places where X.680 talks about tagging:

 - X.680, 30.5:

   All application of tags is either implicit tagging or explicit
   tagging.  Implicit tagging indicates, for those encoding rules which
   provide the option, that explicit identification of the original tag
   of the "Type" in the "TaggedType" is not needed during transfer.

     NOTE -- It can be useful to retain the old tag where this was
     universal class, and hence unambiguously identifies the old type
     without knowledge of the ASN.1 definition of the new type.  Minimum
     transfer octets is, however, normally achieved by the use of
     IMPLICIT.  An example of an encoding using IMPLICIT is given in
     ITU-T Rec. X.690 | ISO/IEC 8825-1.

 - X.680, E.2.12.5:

   Textual use of IMPLICIT with every tag is generally found only in
   older specifications.  BER produces a less compact representation
   when explicit tagging is used than when implicit tagging is used.
   PER produces the same compact encoding in both cases.  With BER and
   explicit tagging, there is more visibility of the underlying type
   (INTEGER, REAL, BOOLEAN, etc.) in the encoded data.  These guidelines
   use implicit tagging in the examples whenever it is legal to do so.
   This may, depending on the encoding rules, result in a compact
   representation, which is highly desirable in some applications.  In
   other applications, compactness may be less important than, for
   example, the ability to carry out strong type-checking.  In the
   latter case, explicit tagging can be used.

Nico