Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

Ed Gerck edgerck at nma.com
Fri Mar 24 13:11:34 EST 2006

Someone mentioned Physics in this discussion, and that
motivated me to point out something that has been forgotten
by Shannon, Kolmogorov, Chaitin and in this thread.

Even though Shannon's data entropy formula looks like an
absolute measure (no reference appears in it), the often
confusing fact is that it does depend on a reference. The
reference is the probability model that you assume to fit
the data ensemble. The same data ensemble can be fitted by
infinitely many different probability models, each one giving
you a valid but different entropy value. For example, if a
source sends the number "1" 1,000 times in a row, what would
be the source's entropy?
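
A minimal sketch of this question in Python. The three candidate
models and every name in the code are hypothetical, chosen only for
illustration; each model gives nonzero probability to the symbol "1",
so each is compatible with the observed run, yet each yields a
different entropy value.

```python
import math

def shannon_entropy_bits(probabilities):
    """Shannon entropy H = sum(p * log2(1/p)), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Hypothetical candidate models (assumptions, not from the original post).
# Each one assigns nonzero probability to "1", so none is ruled out by
# observing "1" 1,000 times in a row.
candidate_models = {
    "constant source": [1.0],
    "fair binary source": [0.5, 0.5],
    "uniform decimal digit source": [0.1] * 10,
}

message_length = 1000
entropy_by_model = {
    name: shannon_entropy_bits(probs)
    for name, probs in candidate_models.items()
}

for name, bits_per_symbol in entropy_by_model.items():
    total_bits = bits_per_symbol * message_length
    print(f"{name}: {bits_per_symbol:.3f} bits/symbol, "
          f"{total_bits:.0f} bits for the whole message")
```

The data never change between the three lines of output; only the
assumed model does.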

Aram's assertion that the "sequence of bytes from 1-256" has
maximum entropy would be right if that sequence came as one of
the possible outcomes of a neutron counter with a 256-byte
register. Any assertion that some data has entropy X can be
countered by finding a different probability model that also
fits the data, even one that gives a higher entropy (!). In
short, a data entropy value involves an arbitrary constant.
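
The byte-sequence example can be sketched as follows. This is only an
illustration under stated assumptions: the sequence is taken to be the
byte values 0..255 in order, and the two models (symbol frequencies
versus a "next byte = previous byte + constant" predictor) and all
function names are hypothetical.

```python
import math
from collections import Counter

def empirical_entropy_bits(symbols):
    """Plug-in entropy estimate from symbol frequencies, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def successor_model_entropy_bits(symbols, modulus=256):
    """Entropy of the step (next - previous) mod modulus.

    This stands in for a model that predicts each byte from the one before it.
    """
    deltas = [(b - a) % modulus for a, b in zip(symbols, symbols[1:])]
    return empirical_entropy_bits(deltas)

# Assumption: "sequence of bytes from 1-256" read as byte values 0..255.
counter_bytes = list(range(256))

frequency_model_bits = empirical_entropy_bits(counter_bytes)
successor_model_bits = successor_model_entropy_bits(counter_bytes)

print("frequency model:", frequency_model_bits, "bits per byte")
print("successor model:", successor_model_bits, "bits per byte")
```

Under the frequency model every byte value appears exactly once, so
the estimate is the maximum of 8 bits per byte; under the successor
model every step is the same, so the estimate is zero.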

The situation, which seems confusing, improves when we realize
that only differences in data entropy can actually be measured,
since the arbitrary constant can then be canceled -- if we are
careful.

In practice, because data security studies usually (and often
wrongly!) suppose a closed system, only different states of a
single system are ever considered, almost automatically. Under
such circumstances, the probability model is well-defined and
the arbitrary constant *always* cancels. However, data systems
are not really closed, and probability models are not always
ergodic or even accurate. Therefore, due care must be exercised
when using data entropy.
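
One concrete way to see a reference-dependent constant cancel, offered
as a sketch and not necessarily the case the post has in mind: quantize
two uniform sources with the same bin width. The bin width plays the
role of the reference here; the interval widths, bin widths, and
function name are all assumptions made for the illustration.

```python
import math

def uniform_entropy_bits(interval_width, bin_width):
    """Entropy of a uniform source on an interval, quantized into equal bins."""
    n_bins = round(interval_width / bin_width)
    return math.log2(n_bins)

results = []
for bin_width in (0.5, 0.25, 0.125):
    # Two states described with the SAME reference (bin width).
    h_narrow = uniform_entropy_bits(1.0, bin_width)
    h_wide = uniform_entropy_bits(4.0, bin_width)
    results.append((bin_width, h_narrow, h_wide, h_wide - h_narrow))

for bin_width, h_narrow, h_wide, difference in results:
    print(f"bin width {bin_width}: H_narrow={h_narrow}, "
          f"H_wide={h_wide}, difference={difference}")
```

Each individual entropy value shifts whenever the bin width changes,
but the difference between the two states stays the same, provided
both are computed against the same reference.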

I don't want to go into too much detail here (the results
will be available elsewhere), but it is useful to take a brief
look into Physics.

In Physics, specifically in Thermodynamics, entropy is a
potential [1]. As is usual for a potential, only *differences*
in entropy between different states can be measured. Since the
entropy is a potential, it is associated with a *state*, not
with a process. That is, it is possible to determine the entropy
difference regardless of the actual process which the system
may have performed, and regardless of whether the process was
reversible or not.
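
The state-function property can be checked numerically in a small
sketch. Everything specific here is an assumption chosen for
illustration: one mole of a monatomic ideal gas, the end states
(300 K, 1 volume unit) and (600 K, 2 volume units), and the two
reversible paths between them. The code integrates
dS = n*Cv*dT/T + n*R*dV/V along each path and compares the results
with the closed-form value.

```python
import math

R = 8.314462618          # molar gas constant, J/(mol K)
N_MOL = 1.0              # assumed amount of gas
CV = 1.5 * R             # assumed monatomic ideal gas
T1, V1 = 300.0, 1.0      # assumed initial state
T2, V2 = 600.0, 2.0      # assumed final state

def entropy_change_along_path(path, steps=20_000):
    """Integrate dS = n*Cv*dT/T + n*R*dV/V along path(s), s in [0, 1]."""
    total = 0.0
    for i in range(steps):
        t0, v0 = path(i / steps)
        t1, v1 = path((i + 1) / steps)
        t_mid, v_mid = 0.5 * (t0 + t1), 0.5 * (v0 + v1)
        total += N_MOL * CV * (t1 - t0) / t_mid + N_MOL * R * (v1 - v0) / v_mid
    return total

def straight_path(s):
    """T and V change together along a straight line in the (T, V) plane."""
    return T1 + s * (T2 - T1), V1 + s * (V2 - V1)

def two_leg_path(s):
    """Isothermal expansion at T1 first, then heating at constant volume V2."""
    if s <= 0.5:
        return T1, V1 + 2 * s * (V2 - V1)
    return T1 + (2 * s - 1) * (T2 - T1), V2

ds_straight = entropy_change_along_path(straight_path)
ds_two_leg = entropy_change_along_path(two_leg_path)
ds_closed_form = N_MOL * CV * math.log(T2 / T1) + N_MOL * R * math.log(V2 / V1)

print(ds_straight, ds_two_leg, ds_closed_form)
```

Both paths give the same entropy difference to within the numerical
error of the integration, because the result depends only on the two
end states.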

These are quite general properties. What I'm suggesting is
that the idea that entropy depends on a reference also applies
to data entropy, not just the entropy of a fluid, and that it
resolves the apparent contradictions (often somewhat acid) found
in data entropy discussions. It also explains why data entropy
seems confusing and contradictory to use. Data entropy may
actually be a much more powerful tool for data security than
its current use suggests.

Ed Gerck

[1] For example, J. Kestin, A Course in Thermodynamics, Blaisdell,
