[Cryptography] canonicalizing unicode strings.
John Levine
johnl at iecc.com
Tue Jan 16 21:00:35 EST 2018
In article <db71e4ae-c4cd-ec2e-8ed3-538b4d97ad90 at echeque.com> you write:
>On 16/01/2018 13:30, John Levine wrote:
>> I think the normal approach is to accept strings only in a single
>> script. Mixed scripts are generally malicious in any sort of
>> identifier context.
>
>What, however, is a script?
It's a set of characters used to write one or more languages. Unicode
defines more than 100 of them. Familiar examples are Latin, Greek,
Cyrillic, and Han (which covers both simplified and traditional
Chinese characters). More details here:
https://en.wikipedia.org/wiki/Script_(Unicode)
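To make the single-script rule concrete, here is a rough sketch in
Python. Note the assumptions: the standard library does not expose the
real Unicode Script property, so this guesses a script from the first
word of each character's name (LATIN, CYRILLIC, GREEK, CJK, ...). That
is only a heuristic, and the names rough_script, scripts_in, and
is_single_script are made up for illustration. A real implementation
would use the Scripts.txt data from the Unicode Character Database.

```python
import unicodedata


def rough_script(ch):
    """Guess a script label from the first word of the character's name.

    Heuristic only: this is NOT the Unicode Script property.
    Non-letters (digits, punctuation, combining marks) return None
    and are treated as script-neutral.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def scripts_in(s):
    """Return the set of guessed script labels used by letters in s."""
    return {sc for sc in map(rough_script, s) if sc is not None}


def is_single_script(s):
    """True if all letters in s appear to come from at most one script."""
    return len(scripts_in(s)) <= 1


if __name__ == "__main__":
    # Second string uses U+0430 CYRILLIC SMALL LETTER A in place of Latin "a".
    for candidate in ("paypal", "p\u0430ypal"):
        print(ascii(candidate), is_single_script(candidate))
```

The second test string looks identical to the first in most fonts, but
it mixes Latin and Cyrillic letters and so would be rejected. Some
languages legitimately mix scripts (Japanese uses Han, Hiragana, and
Katakana together), so a production check would need an allow-list of
such combinations rather than a strict one-script rule.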
I think the meta-lesson here is that if you want to do something with
Unicode, you really have to spend a few minutes learning how it works
first.
R's,
John