[Cryptography] canonicalizing unicode strings.
Peter Todd
pete at petertodd.org
Mon Jan 15 11:12:08 EST 2018
On Sun, Jan 14, 2018 at 08:46:34PM -0800, Ray Dillinger wrote:
>
> On 01/14/2018 03:19 AM, jamesd at echeque.com wrote:
> > I would like strings that look similar to humans to map to the same
> > item. Obviously leading and trailing whitespace needs to go, and
> > internal whitespace should map to a single space.
> >
> > The hard part, however, is that Unicode has an enormous number of
> > near-duplicate symbols.
> >
> > Is there a list of near-duplicate Unicode symbols somewhere, or existing
> > canonicalization code?
>
> Yes, there is. This file summarizes known Unicode homoglyphs and
> near-homoglyphs.
>
> http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt
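
A minimal Python sketch of how such a list is typically used: trim and collapse whitespace as the original question asks, then compute a comparison "skeleton" following the outline of the UTS #39 skeleton algorithm (NFD, map each character through the confusables data, NFD again). The `CONFUSABLES` dictionary is a hypothetical five-entry stand-in for the full mapping data, and the function names are illustrative only. Recent versions of the real algorithm also drop default-ignorable code points, which this sketch omits.

```python
import unicodedata

# Hypothetical, hand-picked subset of confusable mappings for illustration.
# A real implementation would load the full Unicode confusables data
# instead of hard-coding a few entries.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def collapse_whitespace(s: str) -> str:
    """Drop leading/trailing whitespace and map each internal run to one space."""
    # str.split() with no argument splits on runs of Unicode whitespace.
    return " ".join(s.split())


def skeleton(s: str) -> str:
    """UTS #39-style skeleton: NFD, map each character, then NFD again."""
    s = unicodedata.normalize("NFD", s)
    s = "".join(CONFUSABLES.get(ch, ch) for ch in s)
    return unicodedata.normalize("NFD", s)


def canonical_key(s: str) -> str:
    """Key for comparing strings; two strings 'look alike' if their keys match."""
    return skeleton(collapse_whitespace(s))
```

The skeleton is meant only as a lookup or comparison key (for example, to reject a new name whose key collides with an existing one), not as a form to display to users.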

If possible, I always recommend using a whitelist rather than the blacklist
approach shown above; a blacklist will inevitably fall out of date as new
homoglyphs and near-homoglyphs are added to Unicode.
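
A minimal Python sketch of the whitelist approach, assuming a deliberately small hypothetical allowed set (lowercase ASCII letters, digits, hyphen and space). Input is NFKC-normalized, whitespace-collapsed and case-folded, and anything still outside the set is rejected rather than mapped. The names `ALLOWED` and `canonicalize_whitelist` are illustrative, not from any library.

```python
import unicodedata

# Hypothetical whitelist: lowercase ASCII letters, digits, hyphen and space.
ALLOWED = frozenset("abcdefghijklmnopqrstuvwxyz0123456789- ")


def canonicalize_whitelist(s: str) -> str:
    """Return a canonical form of s, or raise ValueError if any character
    falls outside ALLOWED after normalization."""
    # NFKC folds compatibility forms, e.g. fullwidth letters and ligatures.
    s = unicodedata.normalize("NFKC", s)
    # Trim, collapse whitespace runs to one space, then case-fold.
    s = " ".join(s.split()).casefold()
    bad = sorted({ch for ch in s if ch not in ALLOWED})
    if bad:
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError("disallowed characters: " + codes)
    return s
```

The design choice is that a code point nobody has reviewed, including one added in a future Unicode version, is rejected by default instead of silently passing through. An ASCII-only set is very restrictive; a real deployment would more likely whitelist specific scripts or use a published identifier profile.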
--
https://petertodd.org 'peter'[:-1]@petertodd.org