# Private Key Generation from Passwords/phrases

John Denker jsd at av8n.com
Thu Jan 18 13:03:38 EST 2007

```On 01/17/2007 06:07 PM, Allen wrote:

> The whole issue of entropy is a bit vague for me - I don't normally work
> at that end of things - so could you point to a good tutorial on the
> subject, or barring having a reference handy, could you give an overview?

Entropy is defined in terms of probability.

The /definition/ of entropy is

sum_i  P_i log(1/P_i)             [1]

there the sum runs over all symbols (i) in the probability
distribution, i.e. over all symbols in the ensemble.

Equation [1] is the gold standard.  It is always correct.  Any
other expression for entropy is:
a) equivalent to [1]
b) a corollary, valid under some less-than-general conditions, or
c) wrong.

We have not yet specified the _base_ of the log in equation [1].
If the log is base 2, the entropy is measured in bits.
If the log is base 3, the entropy is measured in nats.
For more on this, including the connection to macroscopic
thermodynamics, see
http://www.av8n.com/physics/thermo-laws.htm#sec-s-units

One toss of a fair coin is 1 bit of entropy.
Two tosses of a fair coin is 2 bits of entropy.

Entropy can be measured in bits, but not everything measured
in bits is entropy (just as milk can be measured in gallons,
but also sewage can be measured in gallons).  In particular,
a 32 bit word may hold less than 32 bits of entropy, but it
cannot hold more.

A related notion is the /surprise value/ of a given symbol,
namely
\$_i = log(1/P_i)

We immediately see that the entropy is the appropriately
weighted average of the surprise value.  It must be emphasize
that entropy is a property of the whole ensemble (in contrast
to the surprise value which is a property of an individual
symbol).  There is no such thing as the "average entropy";

There are two branches of cryptography.
-- One branch depends on entropy for security.
-- The other branch depends on computational complexity for
security.

The term /random/ is used in multiple inconsistent ways;  if
you ask 5 experts you can get 10 different definitions.  As
for me, I say something is random if it is
_random enough for the application_.  I am *not* correspondingly
tolerant about the term entropy.  Entropy is entropy.  There is
only one definition of entropy, used uniformly across a wide
range of disciplines including
1. cryptography and cryptanalysis (secret codes)
2. communications (error-correcting codes, as part of electronic
engineering)
3. computer science, including data-compression codes, machine
learning, speech recognition, etc.
4. librarianship
5. the physics of computation
6. the design of refrigerators, heat pumps, and engines (including
piston, turbine, and rocket engines)
7. nuclear engineering (reactors and weapons)
8. fluid dynamics
9. astrophysics and cosmology
10. chemistry and chemical engineering

If you're not defining it in terms of equation [1] or something
provably equivalent, don't call it entropy.

In particular, 200 bits of pseudorandomness is not the same as 200
bits of entropy.  If you mean entropy, say entropy;  if you mean
randomness, say randomness.  Sometimes pseudorandomness based on
computational complexity is suitable for the application, and
sometimes it doesn't make sense to use anything less than genuine
entropy.  For more on how to generate usable entropy, see
http://www.av8n.com/turbid/paper/turbid.htm

For more about the definition of entropy, including the connection
to thermodynamics, see
http://www.av8n.com/physics/thermo-laws.htm#sec-entropy

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com

```