[Cryptography] canonicalizing unicode strings.

Mon Jan 29 23:27:39 EST 2018

On 17/01/2018 10:00, John Levine wrote:
> In article <db71e4ae-c4cd-ec2e-8ed3-538b4d97ad90 at echeque.com> you write:
>> On 16/01/2018 13:30, John Levine wrote:
>>> I think the normal approach is to accept strings only in a single
>>> script.  Mixed scripts are generally malicious in any sort of
>>> identifier context.
>>
>> What, however, is a script?
> 
> It's a set of characters used to write one or more languages.  Unicode
> defines about 100 of them.  Familiar examples are Latin, Greek,
> Cyrillic, and Han (simplified.)  More details here:
> 
> https://en.wikipedia.org/wiki/Script_(Unicode)

Glancing at those details, I am pretty sure there are always legitimate
reasons to mix Latin-script characters with those of any other script,
or Imperial Aramaic with Aramaic, and so on.

The attribution of characters to a particular script is acknowledged to 
be substantially artificial, capricious, uncertain, and arbitrary.
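
To make the "single script only" rule concrete, here is a rough sketch
in Python of what such a check might look like. Python's standard
unicodedata module does not expose the Unicode Script property, so this
sketch assumes, purely for illustration, that the first word of a
character's Unicode name ("LATIN", "CYRILLIC", "GREEK", ...) is a usable
stand-in for its script. That assumption is crude and is not how a real
implementation would assign scripts. The helper names rough_script,
scripts_used and is_single_script are hypothetical.

```python
import unicodedata


def rough_script(ch):
    """Guess a script label from the first word of the character's name.

    ASSUMPTION: this is a heuristic stand-in, not the Unicode Script
    property, which the standard library does not provide.
    """
    if not ch.isalpha():
        # Digits, punctuation, spaces and combining marks are treated
        # as script-neutral in this sketch.
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def scripts_used(s):
    """Return the set of rough script labels seen among the letters of s."""
    return {label for label in map(rough_script, s) if label is not None}


def is_single_script(s):
    """True if the letters of s carry at most one rough script label."""
    return len(scripts_used(s)) <= 1
```

With this sketch, "paypal" passes, while "p\u0430ypal" (with a Cyrillic
letter in second position) is flagged as mixing LATIN and CYRILLIC. It
also rejects something like a Latin brand name followed by a Cyrillic
word, which is the kind of legitimate mixing described above, so a
strict single-script rule will produce false positives.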