[Cryptography] canonicalizing unicode strings.

jamesd at echeque.com
Wed Jan 31 04:04:18 EST 2018


On 31/01/2018 08:06, Nico Williams wrote:
> Algorithms for detection of homoglyph identifiers that match existing
> ones is a more urgent need.
Attempts to restrict people to only using one script in an identifier 
are not going to fly, but if someone uses more than one script, we need 
to check against all potentially conflicting identifiers for homoglyphs.

To efficiently check for homoglyphic identifiers, we have to canonicalize 
all homoglyphs:
1 🠚 l
- 🠚 –
− 🠚 –
0 🠚 O
ο 🠚 o

I suspect there are a zillion of them.


And no official list of homoglyphs, or official software to canonicalize 
them.
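
A minimal Python sketch of the canonicalize-then-look-up idea, assuming a 
hand-built table that covers only the five example pairs above. The names 
CANONICAL, skeleton and find_conflict are made up for illustration; a usable 
table would be far larger.

```python
# Illustrative table only: the five example pairs from this message.
CANONICAL = {
    "1": "l",
    "-": "\u2013",       # hyphen-minus -> en dash
    "\u2212": "\u2013",  # minus sign -> en dash
    "0": "O",
    "\u03bf": "o",       # Greek small omicron -> Latin small o
}


def skeleton(identifier):
    """Replace each character with its canonical look-alike, if it has one."""
    return "".join(CANONICAL.get(ch, ch) for ch in identifier)


def find_conflict(candidate, existing):
    """Return an existing identifier with the same skeleton, or None."""
    # Index existing identifiers by skeleton so each check is a dict lookup
    # rather than a pairwise comparison against every identifier.
    index = {skeleton(name): name for name in existing}
    return index.get(skeleton(candidate))
```

In practice the index would be built once and kept alongside the identifier 
store, so checking a new identifier costs one canonicalization and one lookup.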

Need to write a program that prints them out to an image file, then 
automatically searches the image file for near matches. It will not 
correctly spot everything that would look like a match to a human, but 
will get a lot of them.  A human would have to do final cleanup on the 
output of the software.
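
A rough Python sketch of the near-match search, assuming the rendering step 
has already happened and each character is available as a small equal-size 
bitmap (here a tuple of row strings). The function names and the simple 
count of differing pixels are my own assumptions, not a tested similarity 
measure, and the threshold would need tuning by hand.

```python
from itertools import combinations


def pixel_distance(a, b):
    """Count differing pixels between two equal-size bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def near_matches(bitmaps, max_diff):
    """Return pairs of characters whose bitmaps differ by at most max_diff pixels."""
    pairs = []
    # Compare every pair once; sorted() keeps the output order stable.
    for (ca, ba), (cb, bb) in combinations(sorted(bitmaps.items()), 2):
        if pixel_distance(ba, bb) <= max_diff:
            pairs.append((ca, cb))
    return pairs


# Toy 3x3 bitmaps standing in for rendered glyphs (hypothetical data).
glyphs = {
    "o": ("###", "#.#", "###"),
    "\u03bf": ("###", "#.#", "###"),  # Greek small omicron
    "x": ("#.#", ".#.", "#.#"),
}
candidates = near_matches(glyphs, max_diff=1)
```

The candidate pairs are what a human would then review before anything goes 
into the canonicalization table.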

