[Cryptography] A question on Shannon's entropy

mok-kong shen mok-kong.shen at t-online.de
Thu Mar 3 03:02:06 EST 2016


On 03.03.2016 at 04:37, Thierry Moreau wrote:
> On 02/03/16 11:31 PM, mok-kong shen wrote:
>>
>> Given two alphabetical sequences A and B of the same length, if A
>> has higher entropy than B, then (if I don't err) this is commonly
>> interpreted in crypto to imply that A is more random than B. Now
>> suppose A has entropy u. Sort A to obtain C, so that all a's come
>> before all b's, etc. Since the frequencies of the characters are
>> not affected by the sorting, C also has entropy u according to
>> Shannon's formula. However, C is evidently less random than A. How
>> could one resolve this problem?
>
> That is a wrong application of Shannon's lessons.
>
> An entropy measure is tied to a data source that can emit codes
> without any size limit on the output.
>
> In your question, A and B are not data sources. If they were to be
> equated to data sources, the respective codes would have to be
> defined, e.g. as "alphabetic sequences of length-of-A" for A; sorting
> a sample from A then creates a very different data source.
>
> In other words, the Shannon entropy of a single sequence makes no
> sense. You must make assumptions about the data source behind the
> sequence and analyze the data source itself with Shannon's lessons.

But in order to measure the randomness of a source, one has to
take samples, and in practice these samples are necessarily strings
of finite size. Now suppose the samples from one source look quite
random, while the samples from the other source happen to be
(approximately) the sorted sequences of the first. In that case I
surmise that there is still a problem.
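
For concreteness, here is a minimal Python sketch (the helper
plugin_entropy and the toy strings below are hypothetical, chosen
only for illustration). With single-character frequencies, Shannon's
formula assigns the same value to a string and to its sorted
permutation, which is exactly the puzzle above; with pair
frequencies, i.e. a crude higher-order model of the source, the
sorted string scores visibly lower:

import math
from collections import Counter

def plugin_entropy(s, k=1):
    # Plug-in estimate of Shannon entropy in bits per k-gram:
    # H = -sum p * log2(p), where p is the observed frequency of
    # each length-k block occurring in s.
    blocks = [s[i:i + k] for i in range(len(s) - k + 1)]
    n = len(blocks)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(blocks).values())

a = "ababbcacbaabccba"   # looks fairly random
c = "".join(sorted(a))   # "aaaaaabbbbbbcccc": evidently not random

print(plugin_entropy(a), plugin_entropy(c))        # equal: sorting keeps
                                                   # unigram frequencies
print(plugin_entropy(a, 2), plugin_entropy(c, 2))  # differ: bigram statistics
                                                   # expose the ordering

One way to read this: the plain single-character formula models the
source as memoryless, and under that model A and C really are
indistinguishable; as soon as the model captures any of the source's
structure (here, mere pairs of characters), the ordering matters.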

M. K. Shen


