[Cryptography] canonicalizing unicode strings.
Howard Chu
hyc at symas.com
Mon Jan 15 00:04:20 EST 2018
jamesd at echeque.com wrote:
> I would like strings that look similar to humans to map to the same item.
> Obviously trailing and leading whitespace needs to go, and runs of whitespace
> should map to a single space.
>
> The hard part, however, is that Unicode has an enormous number of near
> duplicate symbols.
>
> Is there somewhere a list of near duplicate unicode symbols, or existing
> canonicalization code?
Have you already read UAX #15, Unicode Normalization Forms
(https://www.unicode.org/reports/tr15/tr15-45.html)?
Our normalization code (OpenLDAP's liblunicode) is in
http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=tree;f=libraries/liblunicode;h=4896a6dc9ee5d3e78c15ed6c2e2ed2f21be70247;hb=HEAD
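
As a rough illustration of what TR15 normalization plus whitespace folding
gives you, here is a minimal Python sketch. It is not the liblunicode code:
the function name canonicalize is made up for this example, and it uses only
the standard unicodedata module. NFKC folds compatibility variants such as
ligatures, fullwidth letters, and no-break spaces, but it does not merge
cross-script lookalikes (Cyrillic "а" vs. Latin "a"), so it covers only part
of the "near duplicate" problem.

```python
import unicodedata


def canonicalize(s: str) -> str:
    """Hypothetical helper: NFKC-normalize, then fold whitespace."""
    # NFKC applies compatibility decomposition followed by canonical
    # composition, so "ﬁ" becomes "fi" and "e" + U+0301 becomes "é".
    s = unicodedata.normalize("NFKC", s)
    # str.split() with no argument splits on runs of Unicode whitespace
    # and discards leading and trailing whitespace.
    return " ".join(s.split())


if __name__ == "__main__":
    samples = ["  \ufb01le\u00a0 name\t", "e\u0301", "\uff21\uff22\uff23"]
    for sample in samples:
        print(repr(sample), "->", repr(canonicalize(sample)))
```

Whether NFKC is the right form depends on the application; NFC is more
conservative and keeps compatibility characters distinct.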
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/