Entropy of other languages

Sandy Harris sandyinchina at gmail.com
Wed Feb 7 08:42:49 EST 2007

Allen <netsecurity at sound-by-design.com> wrote:

> An idle question. English has a relatively low entropy as a
> language. Don't recall the exact figure, but if you look at words
> that start with "q" it is very low indeed.
> What about other languages? Does anyone know the relative entropy
> of other alphabetic languages? What about the entropy of
> ideographic languages? Pictographic? Hieroglyphic?

The most general answer is in a very old paper of Mandelbrot's.
Sorry, I don't recall the exact reference or have it to hand.

He starts from information theory and an assumption that
there must be some constant upper bound on the receiver's
per-symbol processing time. From those two premises alone,
he proves that the optimal frequency distribution of
symbols is always some member of a parameterized family
of curves.

Pick the right parameters and Mandelbrot's equation
simplifies to Zipf's Law, the well-known rule about
word, letter or sound frequencies in linguistics.
I'm not sure whether you can also get Pareto's Law, which
covers income & wealth distributions in economics.
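
To make the link back to entropy concrete, here is a rough
sketch in Python. It assumes the form usually cited as the
Zipf-Mandelbrot law, where the frequency of the symbol at
rank r is proportional to 1 / (r + b)^a; setting a = 1 and
b = 0 gives plain Zipf. That form is not quoted from the
paper mentioned above, and the names zipf_mandelbrot,
entropy_bits, a and b, the alphabet size of 26 and the
value b = 2.7 are all illustrative assumptions, not fitted
to any real language.

```python
import math


def zipf_mandelbrot(n, a=1.0, b=0.0):
    """Probabilities for ranks 1..n, with p(r) proportional to 1 / (r + b)**a.

    Assumed form of the Zipf-Mandelbrot law; a=1, b=0 is plain Zipf,
    and a=0 is the uniform distribution.
    """
    weights = [1.0 / (r + b) ** a for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits per symbol, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    n = 26  # hypothetical alphabet size, chosen only for illustration
    cases = {
        "uniform (a=0)": zipf_mandelbrot(n, a=0.0),
        "Zipf (a=1, b=0)": zipf_mandelbrot(n, a=1.0, b=0.0),
        "Zipf-Mandelbrot (a=1, b=2.7)": zipf_mandelbrot(n, a=1.0, b=2.7),
    }
    for name, probs in cases.items():
        print(f"{name}: {entropy_bits(probs):.3f} bits/symbol")
```

The uniform case is the ceiling, log2(n) bits per symbol;
any skew in the rank-frequency curve brings the per-symbol
entropy below that. Note this is only first-order entropy:
it ignores dependence between neighbouring symbols, such
as "u" nearly always following "q".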

Sandy Harris
Quanzhou, Fujian, China

