[Cryptography] canonicalizing unicode strings.

John Levine johnl at iecc.com
Mon Jan 15 01:32:51 EST 2018


In article <b9a92033-1d13-6780-4f4d-472e1e111343 at echeque.com> you write:
>Is there somewhere a list of near duplicate unicode symbols, or existing 
>canonicalization code?

Ooh, you've cracked open a large economy-size can of worms.

Unicode defines four normalization forms (NFC, NFD, NFKC and NFKD), all
of which are broken in some way:

https://www.unicode.org/reports/tr15/
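To make the four forms concrete, here is a small sketch using Python's standard unicodedata module (the helper names all_forms and code_points are made up for illustration). It runs two sample strings through each form. The canonical forms (NFC/NFD) only compose or decompose things like "e" plus a combining accent. The compatibility forms (NFKC/NFKD) additionally fold characters such as the "fi" ligature U+FB01 into plain letters.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def all_forms(s: str) -> dict[str, str]:
    """Return s under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}

def code_points(s: str) -> list[str]:
    """List the code points of s as U+XXXX strings, for easy comparison."""
    return [f"U+{ord(c):04X}" for c in s]

# "e" followed by COMBINING ACUTE ACCENT, and LATIN SMALL LIGATURE FI (U+FB01).
for sample in ("e\u0301", "\ufb01"):
    for form, out in all_forms(sample).items():
        print(code_points(sample), form, code_points(out))
```

The accented "e" comes out as one code point (U+00E9) under NFC and NFKC and as two under NFD and NFKD. The ligature survives NFC and NFD unchanged and becomes "f" "i" only under NFKC and NFKD.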

I'd guess you want to use form KC (NFKC), but without knowing more about
your application, it's just a guess.
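If NFKC does fit, a minimal canonicalization step in Python might look like the sketch below. The function name canonicalize is hypothetical, and whether NFKC is the right choice still depends on the application. The sketch also shows a limit relevant to the "near duplicate" question: NFKC folds fullwidth letters and superscripts, but it does not merge look-alikes from different scripts, such as Cyrillic "а" (U+0430) and Latin "a". That kind of mapping needs a separate confusables table, such as the one in Unicode TR39.

```python
import unicodedata

def canonicalize(s: str) -> str:
    # NFKC: compatibility decomposition followed by canonical composition.
    return unicodedata.normalize("NFKC", s)

# Fullwidth letters and superscript digits fold to their plain forms.
print(canonicalize("\uff21\uff22\uff23"))  # fullwidth "ＡＢＣ" -> ABC
print(canonicalize("x\u00b2"))             # "x²" -> x2
# Cyrillic small a (U+0430) looks like Latin "a", but NFKC keeps it distinct.
print(canonicalize("p\u0430ypal") == "paypal")  # -> False
```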

R's,
John

