[Cryptography] canonicalizing unicode strings.

Tue Jan 30 22:01:48 EST 2018

In article <20180131000626.GI10954 at localhost> you write:
>On Tue, Jan 30, 2018 at 12:18:08PM -0500, John Levine wrote:
>> Unicode is a language for typesetting.  Their goal is that any text in
>
>Well, a small portion of typesetting (there's no way to format text, no
>way to control positioning on a page, etc.). ...

I think we agree here.  Human language is complex, and anything that
adequately represents it isn't going to be much less complex than
Unicode.

>There's no point saying that this "makes Unicode a lousy base to use for
>identifiers and passwords".

You could have a design that was less enthusiastic about multiple ways
to represent the exact same character, e.g., one with only precomposed
versions, or one with no precomposed versions.  But as we agree, we've
got what we've got.
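
To make the "multiple ways" point concrete, here is a minimal Python
sketch using the standard unicodedata module.  The variable names are
just for illustration.  It shows a precomposed and a decomposed e-acute
comparing unequal as raw strings, and equal once both are put into the
same normalization form.

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Same character on screen, different code point sequences underneath.
raw_equal = precomposed == decomposed

# After normalizing both to one form (either one), they compare equal.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", decomposed))

print(raw_equal, nfc_equal, nfd_equal)   # False True True
```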

>And for passwords homoglyphs are mostly a non-issue.  The primary issue
>for passwords is the user's ability to enter them correctly on all their
>devices.  And, of course, normalization is kinda required, since the
>user generally has no control over pre-composition choices of the input
>method.

Right, the models for passwords and identifiers are different.  For an
identifier, you want everyone who sees it printed on a business card
or on a bus to be able to enter it, while for passwords it's only one
person so it's a matter of remembering what you did so you can do it
again.  There's no reason to think that the script rules for DNS IDNs
would be appropriate for any other context.*

R's,
John

* although I sure have spent a lot of time telling people, no, that's
not going to be used as a DNS name, you don't want to try to turn it
into punycode.
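
One way to see why, as a rough sketch: Python's built-in "idna" codec
(which implements the older IDNA 2003 rules, so treat it only as an
illustration) lowercases a label before producing the punycode form.
That is reasonable for a DNS name and lossy for anything where case
matters, such as a password.  The example.com-style name here is made up.

```python
# The stdlib "idna" codec applies nameprep (including case mapping)
# and then punycode-encodes each non-ASCII label with an "xn--" prefix.
lower = "bücher.example".encode("idna")
mixed = "Bücher.example".encode("idna")

print(lower)            # b'xn--bcher-kva.example'
print(lower == mixed)   # True: the capital B did not survive
```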