[Cryptography] canonicalizing unicode strings.
jamesd at echeque.com
Mon Jan 29 23:27:39 EST 2018
On 17/01/2018 10:00, John Levine wrote:
> In article <db71e4ae-c4cd-ec2e-8ed3-538b4d97ad90 at echeque.com> you write:
>> On 16/01/2018 13:30, John Levine wrote:
>>> I think the normal approach is to accept strings only in a single
>>> script. Mixed scripts are generally malicious in any sort of
>>> identifier context.
>>
>> What, however, is a script?
>
> It's a set of characters used to write one or more languages. Unicode
> defines about 100 of them. Familiar examples are Latin, Greek,
> Cyrillic, and Han (simplified.) More details here:
>
> https://en.wikipedia.org/wiki/Script_(Unicode)

Glancing at those details, I am pretty sure that there will always be
legitimate reasons to mix Latin script characters with any other script,
or Imperial Aramaic with Aramaic, and so on.

The attribution of characters to a particular script is acknowledged to
be substantially artificial, capricious, uncertain, and arbitrary.
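
To make the single-script rule concrete, here is a rough sketch in Python
of what such a check might look like. It is only an approximation under
stated assumptions: Python's standard library does not expose the Unicode
Script property, so the sketch guesses a label from the first word of each
character's Unicode name. The function names (rough_script, scripts_used,
is_single_script) are made up for illustration.

```python
import unicodedata


def rough_script(ch):
    """Guess a script label from the first word of the character's name.

    Heuristic only: this is NOT the Unicode Script property, which the
    standard library does not provide.
    """
    if not ch.isalpha():
        # Digits, punctuation, spaces and marks are treated as neutral.
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def scripts_used(s):
    """Return the set of guessed script labels for the letters in s."""
    return {label for label in map(rough_script, s) if label is not None}


def is_single_script(s):
    """True if every letter in s maps to the same guessed label."""
    return len(scripts_used(s)) <= 1


print(is_single_script("paypal"))       # all Latin letters
print(is_single_script("p\u0430ypal"))  # second letter is Cyrillic U+0430
print(scripts_used("p\u0430ypal"))
```

A strict check like this rejects the look-alike string, but it would equally
reject any legitimate identifier that mixes Latin letters with another
script. Relaxing it to always allow Latin would let the look-alike string
through again, which is the difficulty raised above.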