Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

John Kelsey kelsey.j at ix.netcom.com
Sat Mar 25 12:57:33 EST 2006


>From: John Denker <jsd at av8n.com>
>Sent: Mar 24, 2006 11:57 AM
>To: Erik Zenner <ez at cryptico.com>, cryptography at metzdowd.com
>Subject: Re: Entropy Definition (was Re: passphrases with more than 160 bits of entropy)

>Erik Zenner wrote:
>
>>>0 occurs with probability 1/2
>>>each other number from 1 to 2^{160}+1 happens with 
>>>probability 2^{-161}.
>
>>  Is anyone aware of whether (and where) this was 
>> discussed in the literature, or what other approaches are taken?
>
>This particular problem is contrived or at least exaggerated.  

The example is a contrived, pathological case.  And there are a bunch
of easy solutions once you know this is the distribution.  The point
is to demonstrate that Shannon entropy gives you the wrong answer when
you try to use it here.

Now, you might ask if this is a problem in a real-world setting.  So
let's imagine a very simple distribution--sequences of flips of a
biased coin.  There are nice ways to remove the bias, but let's
imagine we're not doing that--instead, we're going to take a sequence
of coin flips, turn it into a 0/1 sequence, and then hash it down to
get a key.  

Suppose the coin is biased so that heads comes up 0.9 of the time, and
that we generate 16-bit sequences at a time.  The Shannon entropy of a
16-bit sequence is about 7.5, but the most likely symbol (all heads)
comes up about 18% of the time.  So if we tried to generate a 7-bit
"key", we'd lose on the attacker's first guess 18% of the time. 

So, let's go further with this.  We want to generate a DES key, with
56 bits of entropy.  A 64-bit sequence produced with this biased coin
has Shannon entropy of about 60 bits, but an attacker has about a
1/1000 chance of guessing the DES key we derive from it on his first
try, which is unacceptable for just about any crypto application.
(The min-entropy is about ten bits.)

So yes, I used a contrived example to demonstrate the problem, but no,
the problem isn't just an ivory-tower concern.

The intuition here is that Shannon entropy is concerned with
communications channels, where we assume we have to send every
symbol.  So when you have lots of low-probability symbols, and one of
those low-probability symbols is chosen 999 times out of a 1000, the
amount of bandwidth you need to transmit those symbols easily becomes
dominated by them.  Most of the time, you're sending one of a large
set of symbols, and they all have to be distinguished from one
another.  

The situation for the attacker is a lot more like having a channel
where it's acceptable to only send a small fraction of those symbols,
and just drop the rest.  That is, it's okay for the attacker's model
to just ignore the huge set of low-probability symbols that occur
999/1000 of the time, and instead just transmit the highest
probability symbol 1/1000 of the time.  Instead of transmitting it,
he's just guessing it.  When he gets it right, he learns your DES
key.  

...
>This version serves to illustrate, in an exaggerated way, the necessity
>of not assuming that the raw data words are IID (independent and identically
>distributed).

Yes!  The "independent" part is much harder to deal with than the
per-symbol distribution, in many real-world applications.  The worst
of these are operating system events (used in Windows and various
Unixes to get random bits).  Peter Gutmann did some really nice work
on this on a bunch of operating systems in a Usenix paper, and he
updated that and has a version of it on his website.  If you're doing
OS-event sampling to generate random bits, you really ought to look at
his stuff.  (But if you can get some hardware support, either directly
or via sampling the microphone like the Turbid design does, you're
probably on much firmer ground.)  

--John Kelsey, NIST


---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com



More information about the cryptography mailing list