[Cryptography] Typesetting vs. identifiers, On those spoofed domain names...

Tue Mar 20 12:01:15 EDT 2018

On Mon, Mar 19, 2018 at 04:48:44PM +0000, John Levine wrote:
> In article <20180319155456.GA7255 at localhost> you write:
> >> Also right.  If you bother to learn about normalization and IDNA and
> >> PRECIS and label generation rules, you can come up with usable subsets
> >> of Unicode for identifiers.  It's not perfect as the Krebs article
> >> rediscovered; you can't totally avoid homoglyphs, but that is not new
> >> (MICR0SOFT and paypaI) and there are ways to mitigate the damage [with
> >> scripts and profiles].
> 
> >A list of sets of homoglyphs is feasible, but it will take time to come
> >up with something remotely complete. ...
> 
> I do not think that is true.  Look at the way that characters are
> composed in many Asian languages and you quickly run into an
> exponential explosion of things to compare to see if they look the
> same, and sameness that changes depending on the typefaces you're
> using.

Yes, but.  Either fluent CJK users have less trouble telling CJK glyphs
apart, or they have as much trouble as non-CJK users might expect, but
the rest of us non-CJK users are unaffected by that anyway :)

I suspect that for non-ideographic/hieroglyphic scripts the
confusability problem is closer to that which we find in {Latin, Greek,
Cyrillic}, and that it's actually feasible to identify sets of
confusable glyphs among such scripts.
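
To make "sets of confusable glyphs" concrete, here is a minimal sketch in
Python.  The table is a tiny, hand-picked assumption for illustration only
(it is not a registry's or Unicode's data), and the names
CONFUSABLE_TO_PROTOTYPE, skeleton() and collides() are hypothetical.  The
idea is to map every member of a confusable set to one representative and
compare the results:

```python
import unicodedata

# Hypothetical, hand-picked table: each glyph maps to the Latin letter it
# can be mistaken for.  A real list would be far larger and font-dependent.
CONFUSABLE_TO_PROTOTYPE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03bd": "v",  # GREEK SMALL LETTER NU
    "0": "o",       # DIGIT ZERO, as in MICR0SOFT
}


def canonical(label):
    """Normalize and case-fold, as a case-insensitive registry might."""
    return unicodedata.normalize("NFKC", label).casefold()


def skeleton(label):
    """Replace each confusable glyph with its set's representative."""
    return "".join(CONFUSABLE_TO_PROTOTYPE.get(ch, ch) for ch in canonical(label))


def collides(candidate, existing):
    """True if the labels differ but look alike under the table above."""
    return (canonical(candidate) != canonical(existing)
            and skeleton(candidate) == skeleton(existing))


if __name__ == "__main__":
    spoof = "p\u0430ypal"  # second letter is Cyrillic
    print(collides(spoof, "paypal"))
    print(collides("MICR0SOFT", "microsoft"))
    print(collides("PayPal", "paypal"))
```

A registry could run something like collides() against existing
registrations at registration time; the hard part is the table, not the
code.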

If someone attempts to phish me with CJK confusables, I'll recognize...
that I can't read them and move on.  If someone attempts to phish me
with Cyrillic confusables, I might well be at a font's mercy.

I don't think one needs to look for confusables in {Latin, CJK} --
there won't be any, or at most very few.  But it's much more likely that looking
for {Latin, Greek, Cyrillic} confusables in registrations would help
speakers of European languages.  Forbidding mixing of scripts in one
label is not likely to be feasible (e.g., in South Korea it's becoming
common to add "ing" to Hangul words).
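
A rough sketch of a policy that flags only the risky mixes instead of
forbidding all script mixing.  Python's standard library does not expose
the Unicode Script property, so this approximates a character's script by
the first word of its Unicode name; that shortcut, the RISKY set, and the
function names are all assumptions made for illustration:

```python
import unicodedata

# Scripts whose letters are easily confused with one another (assumed set).
RISKY = {"LATIN", "GREEK", "CYRILLIC"}


def script_of(ch):
    """Approximate the script by the first word of the character's name."""
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else "UNKNOWN"


def scripts_in(label):
    """Set of approximate scripts used by the letters of a label."""
    return {script_of(ch) for ch in label if ch.isalpha()}


def needs_review(label):
    """Flag labels that mix two or more of the mutually confusable scripts."""
    return len(scripts_in(label) & RISKY) > 1


if __name__ == "__main__":
    hangul_ing = "\uce74\ud398ing"  # two Hangul syllables followed by "ing"
    spoof = "p\u0430ypal"           # Latin with one Cyrillic letter
    print(scripts_in(hangul_ing), needs_review(hangul_ing))
    print(scripts_in(spoof), needs_review(spoof))
```

Hangul plus Latin passes, while Latin plus Cyrillic is flagged.  A label
written entirely in one script's look-alikes would not be caught by this
check alone.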

I don't think there is zero value in collating a set of sets of
confusable [non-CJK] glyphs.

There may not be *enough* value in that: there's still typo-squatting
and other such attacks to worry about, so it feels like a losing battle.

But it's not yet clear to _me_ that there is insufficient value in
identifying sets of confusable glyphs.

Perhaps that is clear to _you_, but you may need to make an argument
that doesn't involve CJK to convince me :/

> The point of scripts and language profiles is that people have a
> pretty good intuition of what characters a language uses and how you
> can combine them and still make sense.  [...]

This is also why different registries should be able to have different
policies.

>                                 [...].  Also remember that input
> methods are not standardized, the way you switch scripts in the middle
> of a word really really isn't standardized even where it's possible,
> and a name isn't very useful if you can't figure out how to type it.

Of course, but an attacker won't make you type it.  They'll hope you
click.

Nico
--