[Cryptography] canonicalizing unicode strings.

Nico Williams nico at cryptonector.com
Tue Jan 30 23:24:16 EST 2018


On Tue, Jan 30, 2018 at 10:01:48PM -0500, John Levine wrote:
> >There's no point saying that this "makes Unicode a lousy base to use for
> >identifiers and passwords".
> 
> You could have a design that was less enthusiastic about multiple ways
> to represent the exact same character, e.g., one with only precomposed
> versions, or one with no precomposed versions.  But as we agree, we've
> got what we've got.

My sense (I wasn't there and I haven't researched what happened) is that
precomposition exists to make transcoding with ISO-8859 simpler, and to
make rendering easier for Hangul.  It might well be useful for other
scripts too, but that's beyond the limit of my knowledge of Unicode.

Decomposition is clearly the better approach!

But even with only decomposed representations we'd have a normalization
problem for characters that involve more than two codepoints, since
multiple combining marks on one base can appear in different orders.
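
To make that concrete, here is a minimal sketch in Python using only the
standard-library unicodedata module.  The example strings and the helper
name canonically_equal are mine, chosen for illustration; they are not
from the thread.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Return True if a and b are canonically equivalent (same NFD form)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Precomposed vs. decomposed: U+00E9 versus 'e' + U+0301 (combining acute).
precomposed = "\u00e9"
decomposed = "e\u0301"

# Purely decomposed input can still differ: one base letter with two
# combining marks (dot below U+0323, dot above U+0307) in either order.
dot_below_first = "q\u0323\u0307"
dot_above_first = "q\u0307\u0323"

print(precomposed == decomposed)                            # False
print(canonically_equal(precomposed, decomposed))           # True
print(dot_below_first == dot_above_first)                   # False
print(canonically_equal(dot_below_first, dot_above_first))  # True
```

The second pair is the case that survives even in a world with no
precomposed characters: normalization still has to put the marks into a
canonical order before two strings can be compared.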

So I don't even resent normalization.  I've accepted it.  It's just a
part of life.  Ditto homoglyphs.  Processing text is hard; we're not in
ASCII-land anymore.

> >And for passwords homoglyphs are mostly a non-issue.  The primary issue
> >for passwords is the user's ability to enter them correctly on all their
> >devices.  And, of course, normalization is kinda required, since the
> >user generally has no control over pre-composition choices of the input
> >method.
> 
> Right, the models for passwords and identifiers are different.  For an
> identifier, you want everyone who sees it printed on a business card
> or on a bus to be able to enter it, while for passwords it's only one
> person so it's a matter of remembering what you did so you can do it
> again.  There's no reason to think that the script rules for DNS IDNs
> would be appropriate for any other context.*

Bingo.  There are rules for IDNA, and then there are rules that registrars
should apply, and those are not the same thing.  The DNS itself needs to
allow homoglyphs if for no other reason than that servers can't be
expected to do anything about them and neither would clients (though
user agents are another story); but registrars need to look out for
their customers.

> * although I sure have spent a lot of time telling people, no, that's
> not going to be used as a DNS name, you don't want to try to turn it
> into punycode.

I... rather like punycode, oddly.  It would have been good to re-use it
for email mailbox I18N.
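
For anyone who hasn't looked at what punycode actually produces, a small
sketch using Python's built-in "punycode" and "idna" codecs.  The helper
names and the sample label are mine, picked for illustration, and note
that Python's "idna" codec implements the older IDNA 2003 rules.

```python
def to_punycode(label: str) -> str:
    """Encode one Unicode label with the bare punycode algorithm."""
    return label.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Decode a bare punycode string back to Unicode."""
    return encoded.encode("ascii").decode("punycode")


label = "b\u00fccher"  # "bücher"

print(to_punycode(label))               # bcher-kva
print(from_punycode("bcher-kva"))       # bücher

# The IDNA codec applies punycode per label and adds the "xn--" prefix.
print((label + ".example").encode("idna"))  # b'xn--bcher-kva.example'
```

The bare encoding has no "xn--" prefix; that prefix is an IDNA
convention layered on top, which is part of why the encoding itself
could in principle be reused outside the DNS.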

Nico