[Cryptography] On those spoofed domain names...

Nico Williams nico at cryptonector.com
Fri Mar 9 19:37:12 EST 2018


On Fri, Mar 09, 2018 at 02:50:07PM -0800, Ray Dillinger wrote:
> We've beaten up on the Unicode committee so often, on this list, for the
> lookalike characters that mislead humans, and the alternate encodings
> that break hashes, and the alternate codepoint sequences for the same
> character that screw with any search for substrings, and .... it just
> goes on and on.
> 
> We don't have to beat up on them again.  We really shouldn't.  And yet,
> I just can't look at krebs' article without comment.

We should stop, yes.

> And this, in my estimation, is a big design failure on the part of the
> Unicode committee.  The perfectly reasonable impulse that drove it was
> an inevitable interpretation of their mission, but "design by
> accumulation" is not design.  It produces piles, not structures.  And
> Unicode is a pile.

That's not what happened.

What happened is that human scripts and human politics are not simple,
and precluding all homoglyphs was (a) never part of the UC's mission and
(b) never politically plausible as part of it.  Yes, CJK unification was
attempted, but only for CJK, and it failed politically.

We were always going to have a confusability problem anyway because of
typos and font confusability issues.  The problem isn't that the UC
didn't prevent confusability (it couldn't have).  It's that the
community didn't recognize the problem and write code and standards for
registries/registrars that would make it easier to cope with the
problem.

There's no need to cry over this.  Instead we need to demand that
registrars prevent registration of domains that are typo-, font-, and/or
homoglyph-confusable.  We also need to write code that does fuzzy
confusable matching.
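A minimal sketch of what such fuzzy confusable matching could look like,
in Python.  The homoglyph table here is a hypothetical, tiny subset for
illustration; a real implementation would use the full confusables data
from Unicode TR#39 (Unicode Security Mechanisms):

```python
import unicodedata

# Hypothetical minimal homoglyph table (a few Cyrillic lookalikes for
# Latin letters).  A real tool would load confusables.txt from UTS #39.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}

def skeleton(label: str) -> str:
    """Reduce a domain label to a comparison skeleton:
    lowercase, compatibility-decompose (NFKD), drop combining
    marks, then fold known homoglyphs onto their Latin targets."""
    out = []
    for ch in unicodedata.normalize("NFKD", label.lower()):
        if unicodedata.combining(ch):
            continue  # drop accents/diacritics
        out.append(CONFUSABLES.get(ch, ch))
    return "".join(out)

def confusable(a: str, b: str) -> bool:
    """True if two distinct labels collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)
```

A registrar could refuse (or flag for review) any new registration whose
skeleton collides with an existing domain's skeleton, e.g.
`confusable("\u0430pple.com", "apple.com")` is True because the first
label starts with a Cyrillic а.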

> Data gathering is the first part of good design, but, impatient for
> results, they made the mistake of doing the data gathering only and
> slapping the term 'standard' on page after page of mappings that should
> have been considered to only be the notes outlining their scope.

This may be true, but it's also almost certainly true that they couldn't
wait to learn twenty years' worth of lessons in order to design a
bug-free Unicode that can encode hundreds of scripts, nor could those
lessons have been learned without actually shipping standards and code
during those decades.  A standard like Unicode cannot be born complete,
cannot ever be complete, and can only grow organically.

The alternative to Unicode is what we had before: a mess of smaller,
legacy character sets that could only encode one or two scripts, codeset
conversions galore, and forever-unhappy users.

> They'll never finish a standard.  They don't even want to anymore.
> They'll still be working on that thing a hundred years from now, and

Correct: because human scripts are *still evolving*!  How could it be
otherwise?

> they're even now promoting a view of characters and language that
> justifies continuing to work on it forever, instead of finishing it,
> using it for the several centuries it'll take until significant reasons
> develop why it's not working, and making a new standard then.
> 
> Adults are wasting their time cataloging new poo emoji that someone
> invents every week and forgets a year later, because "language is
> unbelievably complex...."  And they'll do it forever.
> 
> Unicode now contains characters that no one, ever, will need to write
> except to document Unicode.  And every one of them is a security risk to
> the extent that it can be confused with any of the others.  That's
> stupid design.

Unicode includes characters needed to encode ancient texts.  What's
wrong with that?

Dead languages and scripts are not entirely dead.  The need to be able
to express those in Unicode is real.

The UC's mission is not first and foremost to make the DNS better.
Never was.  Never could have been.

Nico