<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 29, 2015 at 2:36 PM, ianG <span dir="ltr">&lt;<a href="mailto:iang@iang.org" target="_blank">iang@iang.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Some comments - on the whole this is a good start!<span class=""><br>

<br>

<br>

On 27/05/2015 22:28 pm, Phillip Hallam-Baker wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

We use message digests as data fingerprints in lots of places, OpenPGP<br>

being the most visible of course, but fingerprints are also used in<br>

Bitcoin, for software distribution and even in S/MIME.<br>

<br>

The OpenPGP group was discussing approaches to a new fingerprint format<br>

based on Base32 so that we can squeeze more bits out of the data on a<br>

business card. So generalizing a bit, I came up with this:<br>

<br>

<a href="https://tools.ietf.org/html/draft-hallambaker-udf-00" target="_blank">https://tools.ietf.org/html/draft-hallambaker-udf-00</a><br>

</blockquote>

<br>

<br></span>

2.1.  Last para seems to conflate two issues: age/replacement and weak/substitution.  Either way we arrive at the same conclusion, that the fingerprint mechanism should include some degree of signal that indicates which one it is.<br>

<br>

I think I'd write it somewhat differently, words to this effect:<br>

<br>

Fingerprint formats have had several problems in the past.  A proliferation of formats has led to potential confusion over which algorithm is used with a particular format.  In particular, where an algorithm has also become weak, such as MD5, a substitution attack becomes possible.<br>

<br>

Therefore, representations MUST reserve the first 5 bits as an algorithm identifier (Section 3.1.1).</blockquote><div><br></div><div>Yes, I had difficulty drafting that bit. At this point I am keeping the architecture requirements and the implementation completely separate since other folk may disagree with either (and with reason). If I have left out an important requirement it might require fundamental changes to the implementation.</div><div><br></div><div>At this point the consensus on the OpenPGP list seemed to be for reserving a byte for version ID.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Putting the MIME content type in the scope of the digest means that if<br>

the same data string has meaning in two different contexts, an attacker<br>

can't perform a substitution attack. It also means that whoever is<br>

interpreting the hash has to know the context in which the data is being<br>

used.<br>

</blockquote>

<br>

<br></span>

I'm a bit disturbed by the MIME content type but I can't quite put my finger on what the difficulty is.  One thing that might help is to define a default type in the words of the text that means "no information/context is implied."<br>

<br>

Eg,<br>

<br>

    An empty Content-ID can be used if no MIME content is to<br>

    be delivered, but the colon ':' must always be present.<br>

<br>

It would also be helpful (to me?) to specify some basic MIME types. E.g., things like:<br>

<br>

    The following MIME types are reserved:<br>

       text/plain<br>

       openpgp-v5-key<br>

       SMIME-v1<br>

       openpgp-v4-cleartext-signed<br>

<br>

etc (just making it up as I go).</blockquote><div><br></div><div>Yes, was just being lazy with the content-type declarations :)</div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The fingerprint is base32 encoded and set in chunks of 5 characters for<br>

easier reading/verification. The precision is always a multiple of 25<br>

bits using simple truncation:<br>

<br>

100 bits - MB2GK-6DUF5-YGYYL-JNY5E<br>

<br>

150 bits - MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J<br>

<br>

<br>

The version/algorithm identifier also defines the algorithm used. The<br>

predefined identifiers are 96 for SHA-2-512 and 144 for SHA-3-512. These<br>

make the first character 'M' or 'S', mnemonics for 'Merkle' and 'Spongeworthy'.

</blockquote>
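<div>To illustrate the format quoted above, here is a minimal Python sketch under stated assumptions. The function name <code>udf_style_fingerprint</code> is hypothetical, and the exact digest input (content type, a colon, then the data) is an assumption made for illustration rather than the draft's definition. Only the identifier 96, the base32 encoding, the 25-bit precision steps and the 5-character grouping come from the quoted text.</div>
<pre>
```python
import base64
import hashlib

SHA2_512_ID = 96  # version/algorithm identifier for SHA-2-512, per the quoted text


def udf_style_fingerprint(content_type, data, bits=100):
    """Return a base32 fingerprint truncated to `bits` and grouped in fives.

    Assumption: the digest covers content_type + ":" + data. The draft may
    define the digest input differently.
    """
    # 65 bytes (1 identifier byte + 64 digest bytes) give 104 base32
    # characters, so 500 bits is the largest multiple of 25 available.
    if bits not in range(25, 501, 25):
        raise ValueError("precision must be a multiple of 25 bits, at most 500")
    digest = hashlib.sha512(content_type.encode("utf-8") + b":" + data).digest()
    encoded = base64.b32encode(bytes([SHA2_512_ID]) + digest).decode("ascii")
    chars = encoded[: bits // 5]  # simple truncation: 5 bits per character
    return "-".join(chars[i:i + 5] for i in range(0, len(chars), 5))
```
</pre>
<div>Because the precision is reached by simple truncation, the 100-bit form is always a prefix of the 150-bit form, as in the two examples quoted above. The leading identifier byte 96 puts the value 12 in the first five bits, which base32 renders as 'M'.</div>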

<br>

<br>

<br></span>

Can I suggest that M and S be output as m and s?  In this way we signal to the eye more easily what is going on:<br>

<br>

    mB2GK-6DUF5<br>

<br>

(and a caveat that it is a typographical convention only, it is case-insensitive, and implementations must accept leading upper case).</blockquote><div><br></div><div>Interesting idea. I don't think there is going to be much need for this, however, as I don't expect to add a third value for a decade at least, probably longer. I don't think the typical user needs to know which one is used. The only exception would be trying to use an Sxxxx- fingerprint on a machine that does not implement SHA-3-512, which is pretty common right now as the FIPS isn't published yet.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm unsure about section 4.  What's the point in just talking about it?  E.g., if we want a word list, why not introduce it, just copy the PGP word list into an appendix and provide some text as to how it works.</blockquote><div><br></div><div>PGP Word List is one possible choice, but it was designed almost 20 years ago when memory etc. were very scarce and it is designed to be read out over a VoIP connection. Those constraints limit it to an 8-bit encoding. The 16-bit encoding I have someone working on would require half the number of words to give the same strength.</div><div><br></div><div>At this point I don't want to make a fixed choice. It would have to be internationalizable in any case.</div><div><br></div><div>Another similar issue is curating an image alphabet for the same purpose. Back in 1995 this approach would have had to use a very limited number of images because we didn't have ubiquitous networking and we didn't have the option of looking images up.</div><div><br></div><div>Another very important change is that problems that require a lot of unspecialized effort may now be more practical to solve than problems requiring a small amount of highly specialist knowledge. Finding five people to write (good) crypto code is hard. Finding a thousand people to crowdsource an image library might well be easier.</div><div><br></div><div>If the image library is stored on a bunch of servers round the world and authenticated under a Merkle tree, a connected device can convert a fingerprint to images very quickly.</div></div></div></div>
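<div>As a rough sketch of the Merkle tree idea in the last paragraph: a device that already trusts the tree root can check a single downloaded image against a short proof without fetching the whole library. Everything specific here is an assumption for illustration: the function names <code>merkle_root_and_proof</code> and <code>verify_image</code>, the use of SHA-256, the 0x00/0x01 prefixes separating leaves from inner nodes, and the rule of duplicating the last node on odd-sized levels (a simplification that a real design would need to review).</div>
<pre>
```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs, one per
    tree level, ordered from the leaf level upwards.
    """
    if not leaves or index not in range(len(leaves)):
        raise ValueError("need a non-empty leaf list and a valid index")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) != 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], index % 2 == 1))
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_image(image, proof, root):
    """Recompute the path from one image up to the root and compare."""
    node = _h(b"\x00" + image)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```
</pre>
<div>The proof grows with the logarithm of the library size, so even a library of a million images would need only about twenty sibling hashes per image.</div>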