[Cryptography] canonicalizing unicode strings.

jamesd at echeque.com
Sun Jan 14 06:19:18 EST 2018


I would like strings that look similar to humans to map to the same 
item. Obviously leading and trailing whitespace needs to go, and 
runs of internal whitespace should map to a single space.
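The whitespace rule alone is easy. Here is a minimal Python sketch, assuming "whitespace" means whatever Python's str.isspace() accepts. That includes U+00A0 (no-break space) and U+3000 (ideographic space), but not zero-width characters such as U+200B. The function name normalize_whitespace is just an illustrative choice.

```python
def normalize_whitespace(s: str) -> str:
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    # str.split() with no argument splits on any run of Unicode whitespace
    # (characters for which str.isspace() is True) and discards empty pieces,
    # so trimming and collapsing happen in a single step.
    return " ".join(s.split())
```

Zero-width and other invisible format characters survive this step and have to be handled separately.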

The hard part, however, is that Unicode has an enormous number of 
near-duplicate symbols.

Is there a list of near-duplicate Unicode symbols somewhere, or existing 
canonicalization code?

