[Cryptography] randomness +- entropy
John Denker
jsd at av8n.com
Tue Nov 5 11:51:50 EST 2013
On 11/05/2013 05:18 AM, Jerry Leichter wrote:
>> Entropy means something very special.
> Well, yes, and in its technical definitions -
Are we not having a technical discussion? If not, then what?
> either in
> thermodynamics where it arose or information theory where it was
> imported because of a similarity in formalisms
The physics entropy and the information-theory entropy are the
same thing. This is not a mere "similarity in formalisms".
For example, it is a one-line calculation to find the entropy
of the nuclear spins in a sample of copper, starting from
statistical principles, namely
S = R ln 4
and you can also measure S using a physical thermometer.
Mirabile dictu, you get the same answer either way.
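The one-line calculation can be checked numerically. The sketch below (my illustration, not part of the original post) just evaluates S = R ln 4 for the 2I + 1 = 4 equally likely states of a spin-3/2 copper nucleus:

```python
import math

# Molar gas constant, J/(mol K); CODATA value.
R = 8.314462618

# Copper nuclei (63Cu and 65Cu alike) have spin I = 3/2, so at high
# temperature each nucleus has 2I + 1 = 4 equally likely spin states.
states = 4

# Statistical entropy per mole of nuclei: S = R ln 4
S = R * math.log(states)
print(f"S = R ln {states} = {S:.3f} J/(mol K)")
```

About 11.5 J/(mol K) per mole of nuclei, which is what a careful calorimetric measurement recovers.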
> - it plays virtually
> no role in cryptographic discussion.
Nonsense!
> In cryptography, especially
> when discussing random number generators, the word has a loosey-goosey
> meaning
Never mind the word; ideas are what's important. Terminology is
important only insofar as it helps formulate and communicate the
ideas.
The /idea/ of entropy has tremendous significance. If we didn't
call it "entropy" we would need to invent another name for it. So
please let's save everybody a lot of trouble and call it "entropy".
> that's somehow tied to lack of predictability - but in an
> unspecified way.
Yes, "tied to" ... but that's not the same as "same as". Simple
suggestion:
-- If you mean "unpredictability", say "unpredictability".
-- If you mean "entropy", say "entropy".
We have perfectly good words for each of these things.
> When people make it a regular habit to "guess" the
> entropy of various processes and then go on to build systems based on
> guesses, you know the word has lost any formal meaning it ever had.
Speak for yourself, Kemosabe. As for me, I know of a systematic
method for estimating the entropy, based on a series of intelligent
guesses. The method was described by some guy named Shannon, several
decades ago. It is consistent with everything else we know about
entropy.
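One crude version of such an estimate (my sketch, not necessarily the method Denker has in mind) is the plug-in estimate: tally the observed symbol frequencies and compute H = -sum p log2 p over them:

```python
import math
from collections import Counter

def entropy_estimate(samples):
    """Plug-in Shannon entropy estimate, in bits per symbol."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A balanced coin sample comes out at 1 bit/symbol; a biased one, lower.
print(entropy_estimate("HTHTHTHT"))   # 1.0
print(entropy_estimate("HHHHHHHT"))   # noticeably less than 1
```

Real entropy estimation for a physical source takes far more care than this (correlations, small-sample bias), but the principle is Shannon's.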
================
Successful cryptography often depends on depth of understanding and
attention to detail. The loosey-goosey approach is very likely to
get you into trouble.
The same idea applies to various other fields of endeavor
http://www.check-six.com/images/Jenny38057-poster.jpg
There *is* a difference between a TRNG and a cryptographically-strong
PRNG.
A) In /some/ situations, the difference doesn't matter very much.
B) In other situations, it matters a great deal.
For starters,
A) It is possible to imagine a TRNG independent of any PRNG.
B) It is not possible to imagine a PRNG completely independent of
any TRNG, because the PRNG needs a seed, and the seed has to
come from somewhere.
Here's another contrast:
A) It may be that under /normal/ operating conditions, the /user/
does not care about TRNG versus PRNG.
B) Anybody who is /designing/ any such thing needs to understand
the distinction. In particular, the /recovery from compromise/
is wildly different in the two cases.
>> TRNG entropy density = 1 - epsilon
>> PRNG entropy density = epsilon
> I don't know what this means.
Here's a highly-condensed tutorial:
Suppose we have N-bit codewords, and various statistical distributions
over codewords.
Entropy is a property of the ensemble (not of any particular codeword).
1) If the ensemble consists of all 2^N possible codewords, evenly
distributed, the distribution has N bits of entropy.
2) Now consider a different distribution. If half the codewords are
missing, and the rest are evenly distributed, the distribution has
N-1 bits of entropy.
2a) In the sub-case where you know exactly which codewords are missing,
this distribution is noticeably less random, i.e. more predictable,
than distribution (1).
2b) In the sub-case where it is computationally infeasible to figure
out which codewords are missing, this distribution may be -- for a
wide range of practical purposes -- just as unpredictable as
distribution (1). However, it still has less entropy.
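Points (1) and (2) can be checked directly at small scale. The sketch below (illustrative; N = 8 is my arbitrary choice) computes the entropy of both ensembles:

```python
import math

def ensemble_entropy(probs):
    """Shannon entropy, in bits, of a distribution over codewords."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 8  # illustrative codeword length

# Case 1: all 2^N codewords equally likely -> N bits of entropy.
full = [1 / 2**N] * 2**N

# Case 2: half the codewords missing, the rest equally likely -> N-1 bits.
half = [1 / 2**(N - 1)] * 2**(N - 1)

print(ensemble_entropy(full))   # 8.0
print(ensemble_entropy(half))   # 7.0
```

Note that the entropy is a property of the whole distribution; the calculation never looks at any particular codeword.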
3) For a typical real-world PRNG, we are not talking about one bit of
entropy going missing. Almost all of the bits have gone missing!
If we seed the PRNG with 100 bits and then use it to generate a
billion bits, then at most 2^100 of the 2^1000000000 possible codes
can ever occur: all but 1 in 2^999999900 of the codes are missing.
That's a lot of missing codes. Well over 99.99% of the codes are
missing.
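The same point can be made at toy scale (tiny numbers of my choosing, and a deliberately weak LCG standing in for any PRNG): an 8-bit seed can yield at most 2^8 distinct outputs, so nearly all 16-bit codewords are missing:

```python
def toy_prng(seed, nbits=16):
    """Tiny LCG expanded to nbits of output; purely illustrative."""
    out = 0
    state = seed
    for _ in range(nbits):
        state = (state * 37 + 11) % 256   # 8-bit internal state
        out = (out << 1) | (state & 1)    # emit low bit of the state
    return out

# Enumerate every possible seed: at most 2^8 = 256 distinct codewords
# can ever appear, out of 2^16 = 65536 possible 16-bit codes.
outputs = {toy_prng(seed) for seed in range(256)}
print(len(outputs))                       # at most 256
print((2**16 - len(outputs)) / 2**16)     # well over 99% missing
```

The seed size, not the output length, bounds the entropy of the ensemble; stretching the output only hides the missing codes behind computation.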
With a TRNG, the situation is reversed. Ideally there would be no
missing codes whatsoever. However, for reasons of computational
efficiency, a practical TRNG will typically allow a few -- a very
few -- codes to be missing or under-represented in the ensemble.
The contrast is extreme:
A) The cryptographic strength required to hide the missing codes
when almost all codes are missing, versus
B) The cryptographic strength required to hide the under-represented
codes, when there are very few of them.
And on top of that, there is the issue of recovery from compromise.
/dev/urandom is different from /dev/random ... for a reason
There is an important idea here. If you are not going to call this
idea "entropy", you need to come up with a different name for it.
Bear in mind that practically everybody in the world reserves the
word "entropy" to denote the genuine physical / statistical entropy.
Also note that we have perfectly good words like "randomness" and
"unpredictability" to cover the other cases.