[Cryptography] canonicalizing unicode strings.
jamesd at echeque.com
Wed Jan 31 04:04:18 EST 2018
On 31/01/2018 08:06, Nico Williams wrote:
> Algorithms for detection of homoglyph identifiers that match existing
> ones is a more urgent need.
Attempts to restrict people to a single script per identifier are not
going to fly, but when someone uses more than one script, we need to
check that identifier against all potentially conflicting identifiers
for homoglyphs.
To check for homoglyphic identifiers efficiently, we have to map every
homoglyph to one canonical character, for example:
1 🠚 l
- 🠚 –
− 🠚 –
0 🠚 O
ο 🠚 o
I suspect there are a zillion of them.
And no official list of homoglyphs, or official software to canonicalize
them.
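As a rough sketch of the canonicalize-then-compare idea, assuming a
hand-written table that holds only the five example mappings above (I
am reading the "ο" in the last one as Greek omicron), and with
HOMOGLYPHS, canonicalize, build_index and find_conflict being names
made up for illustration:

```python
# Hypothetical, deliberately tiny homoglyph table built only from the
# examples in this message; a usable table would be far larger.
HOMOGLYPHS = {
    "1": "l",            # digit one -> lowercase L
    "-": "\u2013",       # hyphen-minus -> en dash
    "\u2212": "\u2013",  # minus sign -> en dash
    "0": "O",            # digit zero -> capital O
    "\u03bf": "o",       # Greek omicron (assumed) -> Latin o
}


def canonicalize(identifier: str) -> str:
    """Replace every character that has a table entry with its canonical form."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in identifier)


def build_index(existing):
    """Map canonical form -> original identifier, built once up front."""
    return {canonicalize(name): name for name in existing}


def find_conflict(new_identifier: str, index):
    """Return the existing identifier that looks the same, or None."""
    return index.get(canonicalize(new_identifier))


index = build_index(["paypal", "example"])
print(find_conflict("paypa1", index))   # collides with an existing name
print(find_conflict("unique", index))   # no collision
```

Because every identifier is reduced to one canonical string, the check
against all existing identifiers is a single dictionary lookup rather
than a pairwise comparison.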
We need to write a program that prints them out to an image file, then
automatically searches the image file for near matches. That will not
correctly spot everything that would look like a match to a human, but
it will get a lot of them. A human would have to do final cleanup on
the output of the software.
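A minimal sketch of the near-match search, assuming the glyphs have
already been rendered to equal-sized bitmaps. The 5x5 bitmaps here are
hand-drawn stand-ins rather than output from a real font, and GLYPHS,
pixel_distance and near_matches are hypothetical names. Counting
differing pixels is only one possible similarity measure:

```python
from itertools import combinations

# Hypothetical 5x5 bitmaps standing in for rendered glyphs; a real run
# would rasterize each code point from an actual font instead.
GLYPHS = {
    "l": ("..#..", "..#..", "..#..", "..#..", "..#.."),
    "1": (".##..", "..#..", "..#..", "..#..", "..#.."),
    "O": (".###.", "#...#", "#...#", "#...#", ".###."),
    "0": (".###.", "#...#", "#.#.#", "#...#", ".###."),
    "x": ("#...#", ".#.#.", "..#..", ".#.#.", "#...#"),
}


def pixel_distance(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def near_matches(glyphs, max_distance):
    """Return character pairs whose bitmaps differ in at most max_distance pixels."""
    return sorted(
        (x, y)
        for x, y in combinations(sorted(glyphs), 2)
        if pixel_distance(glyphs[x], glyphs[y]) <= max_distance
    )


# Candidate pairs for a human to review, not a final verdict.
print(near_matches(GLYPHS, max_distance=2))
```

The threshold is a judgment call: set it too low and real lookalikes
are missed, too high and the human doing the cleanup gets buried in
false positives.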