[Cryptography] Uniform Data Fingerprint

Phillip Hallam-Baker phill at hallambaker.com
Sat May 30 11:02:15 EDT 2015

On Fri, May 29, 2015 at 2:36 PM, ianG <iang at iang.org> wrote:

> Some comments - on the whole this is a good start!
> On 27/05/2015 22:28 pm, Phillip Hallam-Baker wrote:
>> We use message digests as data fingerprints in lots of places. OpenPGP
>> being the most visible of course but fingerprints are also used in
>> Bitcoin, for software distribution and even in S/MIME.
>> The OpenPGP group was discussing approaches to a new fingerprint format
>> based on Base32 so that we can squeeze more bits out of the data on a
>> business card. So generalizing a bit, I came up with this:
>> https://tools.ietf.org/html/draft-hallambaker-udf-00
> 2.1.  Last para seems to conflate two issues being the age/replacement and
> the weak/substitution.  Either way we arrive at the same conclusion, that
> the fingerprint mechanism should include some degree of signal that
> indicates which one it is.
> I think I'd write it somewhat differently, words to effect:
> Fingerprint formats have had several problems in the past.  There has been
> a proliferation of formats which has led to a potential confusion between
> the algorithm to be used on a particular format.  In particular, where an
> algorithm has also become weak, such as MD5, it is possible to do a
> substitution attack.
> Therefore, representations MUST reserve the first 5 bits as an algorithm
> identifier (Section 3.1.1).

Yes, had difficulty drafting that bit. At this point I am keeping the
architecture requirements and the implementation completely separate since
other folk may disagree with either (and have reason). If I have left out
an important requirement it might require fundamental changes to the

At this point the consensus on the OpenPGP list seemed to be for reserving
a byte for version ID.

>> Putting the MIME content type in the scope of the digest means that if
>> the same data string has meaning in two different contexts, an attacker
>> can't perform a substitution attack. It also means that whoever is
>> interpreting the hash has to know the context in which the data is being
>> used.
> I'm a bit disturbed by the MIME content type but I can't quite put my
> finger on what's the difficulty.  One thing that might help is to define a
> default type in the words of the text that means "no information/context is
> implied."
> Eg,
>     An empty Content-ID can be used if no MIME content is to
>     be delivered, but the colon ':' must always be present.
> It would also be helpful (to me?) to specify some basic MIME types. E.g.,
> things like:
>     The following MIME types are reserved:
>        text/plain
>        openpgp-v5-key
>        SMIME-v1
>        openpgp-v4-cleartext-signed
> etc (just making it up as I go).

Yes, was just being lazy with the content-type declarations :)
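
To make the content-type point concrete, here is a minimal sketch of
binding a type string into the digest. The input layout (type, then ':',
then the raw data) and the helper name typed_digest are assumptions for
illustration; the draft's exact construction may differ.

```python
import hashlib


def typed_digest(content_type: str, data: bytes) -> bytes:
    """Hash the content type, a ':' separator and the data together.

    Assumed layout for illustration only: <content-type> ':' <data>.
    """
    return hashlib.sha512(content_type.encode("utf-8") + b":" + data).digest()


data = b"example payload"

# The same bytes interpreted in two contexts yield unrelated digests, so a
# fingerprint issued for one context cannot be replayed in the other.
as_text = typed_digest("text/plain", data)
as_key = typed_digest("application/pgp-keys", data)

# "No context implied": the type is empty but the colon is still hashed.
untyped = typed_digest("", data)
```

Whoever checks the fingerprint has to supply the same type string, which
is the "has to know the context" property mentioned above.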

>> The fingerprint is base32 encoded and set in chunks of 5 characters for
>> easier reading/verification. The precision is always a multiple of 25
>> bits using simple truncation:
>> 100 bits - MB2GK-6DUF5-YGYYL-JNY5E
>> 150 bits - MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J
>> The version/algorithm identifier also defines the algorithm used. The
>> predefined identifiers are 96 for SHA-2-512 and 144 for SHA-3-512. These
>> produce mnemonics for 'Merkle' and 'Spongeworthy'
> Can I suggest that M and S be output as m and s?  In this way we signal to
> the eye more easily what is going on:
>     mB2GK-6DUF5
> (and a caveat that it is a typographical convention only, it is case
> independent, and implementations must accept leading upper case).

Interesting idea. I don't think there is going to be much need for this
however as I don't expect to add a third value for a decade at least,
probably longer. I don't think the typical user needs to know which one is
used. The only exception would be trying to use an Sxxxx- fingerprint on a
machine that does not implement SHA-3-512 which is pretty common right now
as the FIPS isn't published.
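
For reference, a rough sketch of the presentation format described above:
a version byte (96 or 144) in front of the digest, Base32 encoding,
truncation to a multiple of 25 bits, and groups of 5 characters. The
function names and the digest input layout (type, ':', data) are
assumptions made for the sketch, not text from the draft.

```python
import base64
import hashlib

VERSION_SHA2_512 = 96   # 0x60: the first Base32 character comes out as 'M'
VERSION_SHA3_512 = 144  # 0x90: the first Base32 character comes out as 'S'

_HASHES = {VERSION_SHA2_512: hashlib.sha512, VERSION_SHA3_512: hashlib.sha3_512}


def fingerprint(version: int, content_type: str, data: bytes, bits: int = 100) -> str:
    """Return a Base32 fingerprint truncated to `bits` bits of precision."""
    if bits <= 0 or bits % 25 != 0 or bits > 500:
        raise ValueError("precision must be a multiple of 25 bits, at most 500")
    # Assumed digest input layout: <content-type> ':' <data>.
    digest = _HASHES[version](content_type.encode("utf-8") + b":" + data).digest()
    # 1 version byte + 64 digest bytes = 65 bytes, so Base32 adds no padding.
    encoded = base64.b32encode(bytes([version]) + digest).decode("ascii")
    chars = encoded[: bits // 5]  # simple truncation, 5 bits per character
    return "-".join(chars[i:i + 5] for i in range(0, len(chars), 5))


def normalize(text: str) -> str:
    """Accept any letter case on input, including a lowercase leading m or s."""
    return text.upper()


fp100 = fingerprint(VERSION_SHA2_512, "text/plain", b"hello", bits=100)
fp150 = fingerprint(VERSION_SHA2_512, "text/plain", b"hello", bits=150)
fp_sha3 = fingerprint(VERSION_SHA3_512, "text/plain", b"hello", bits=100)
```

Because truncation is a plain prefix, the 150 bit form always starts with
the 100 bit form, and a reader can compare as many groups as they care to.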

> I'm unsure about section 4.  What's the point in just talking about it?
> E.g., if we want a word list, why not introduce it, just copy the PGP word
> list into an appendix and provide some text as to how it works.

PGP Word List is one possible choice, but it was designed almost 20 years
ago when memory etc. were very scarce, and it was designed to be read out
over a VoIP connection. Those constraints limit it to an 8-bit encoding.
The 16-bit encoding I have someone working on would require half the number
of words to give the same strength.
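
The arithmetic behind the "half the number of words" claim, as a small
sketch. The helper names are made up for illustration; the only inputs
are the fingerprint precision and the number of bits each word carries.

```python
def words_needed(bits: int, bits_per_word: int) -> int:
    """Number of words required to carry `bits` bits (rounded up)."""
    return -(-bits // bits_per_word)  # integer ceiling division


def list_size(bits_per_word: int) -> int:
    """Number of distinct words the list must contain."""
    return 2 ** bits_per_word


# Words required at 8 and 16 bits per word for a few precisions.
table = {bits: (words_needed(bits, 8), words_needed(bits, 16))
         for bits in (100, 128, 150)}
```

The saving is paid for in list size: an 8-bit encoding needs 256 words,
while a 16-bit encoding needs 65,536.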

At this point I don't want to make a fixed choice. It would have to be
internationalizable in any case.

Another similar issue is curating an image alphabet for the same purpose.
Back in 1995 this approach would have to use a very limited number of
images because we didn't have ubiquitous networking and we didn't have the
option of looking images up.

Another very important change is that problems that require a lot of
unspecialized effort may be more practical than problems requiring a small
amount of highly specialist knowledge. Finding five people to write (good)
crypto code is hard. Finding a thousand people to crowdsource an image
library might well be easier.

If the image library is stored on a bunch of servers round the world and
authenticated under a Merkle tree, a connected device can convert a
fingerprint to images very quickly.
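
A minimal sketch of how a device could check one image against a
published Merkle root without downloading the whole library. The choice
of SHA-512, the leaf/node prefixes, and duplicating the last node on odd
levels are assumptions made to keep the example short, not a proposal
for the actual format.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha512(data).digest()


def leaf_hash(item: bytes) -> bytes:
    return _h(b"\x00" + item)          # prefix separates leaves from nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def _next_level(level):
    if len(level) % 2:                 # odd count: pair the last node with itself
        level = level + [level[-1]]
    return [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(items):
    if not items:
        raise ValueError("need at least one item")
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(items, index):
    """Sibling hashes from leaf to root, each flagged if the sibling is on the left."""
    level = [leaf_hash(item) for item in items]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(item: bytes, proof, root: bytes) -> bool:
    current = leaf_hash(item)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root


# Placeholder byte strings stand in for image files.
images = [b"img0", b"img1", b"img2", b"img3", b"img4"]
root = merkle_root(images)
proof_for_2 = merkle_proof(images, 2)
```

A server only has to return the image plus a proof of about log2(N)
hashes, so the check stays cheap even for a very large library.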