[Cryptography] canonicalizing unicode strings.

John Levine johnl at iecc.com
Tue Jan 16 21:00:35 EST 2018


In article <db71e4ae-c4cd-ec2e-8ed3-538b4d97ad90 at echeque.com> you write:
>On 16/01/2018 13:30, John Levine wrote:
>> I think the normal approach is to accept strings only in a single
>> script.  Mixed scripts are generally malicious in any sort of
>> identifier context.
>
>What, however, is a script?

It's a set of characters used to write one or more languages.  Unicode
defines more than 100 of them.  Familiar examples are Latin, Greek,
Cyrillic, and Han (one script covering both simplified and traditional
Chinese characters).  More details here:

https://en.wikipedia.org/wiki/Script_(Unicode)
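Here is a minimal sketch of the single-script test mentioned above, written in Java because the standard library exposes the Unicode Script property through `Character.UnicodeScript` (Java 7 and later). The class and method names (`SingleScriptCheck`, `scriptsOf`, `isSingleScript`) are hypothetical. The sketch ignores the Common and Inherited pseudo-scripts, which cover digits, punctuation, and combining marks. It also does not consult Script_Extensions, so it is cruder than the restriction levels in Unicode TS #39. For example, it would reject ordinary Japanese text that mixes Han with Hiragana.

```java
import java.util.EnumSet;
import java.util.Set;

public class SingleScriptCheck {
    // Collects the Unicode scripts used by s, skipping COMMON (digits,
    // punctuation, etc.) and INHERITED (combining marks), which are
    // shared across scripts.
    static Set<Character.UnicodeScript> scriptsOf(String s) {
        Set<Character.UnicodeScript> scripts =
                EnumSet.noneOf(Character.UnicodeScript.class);
        s.codePoints().forEach(cp -> {
            Character.UnicodeScript sc = Character.UnicodeScript.of(cp);
            if (sc != Character.UnicodeScript.COMMON
                    && sc != Character.UnicodeScript.INHERITED) {
                scripts.add(sc);
            }
        });
        return scripts;
    }

    // Accepts a string only if all its script-specific characters
    // come from at most one script.
    static boolean isSingleScript(String s) {
        return scriptsOf(s).size() <= 1;
    }

    public static void main(String[] args) {
        System.out.println(isSingleScript("paypal"));      // true: all Latin
        System.out.println(isSingleScript("p\u0430ypal")); // false: U+0430 is CYRILLIC SMALL LETTER A
    }
}
```

The second string in `main` looks identical to the first on most displays. This lookalike trick is why mixed-script identifiers are treated as suspect.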

I think the meta-lesson here is that if you want to do something with
Unicode, you really have to spend a few minutes learning how it works
first.

R's,
John

