Private Key Generation from Passwords/phrases

Bill Stewart bill.stewart at pobox.com
Fri Jan 19 03:11:40 EST 2007


At 01:55 PM 1/18/2007, John Denker wrote:
>We would be better off maintaining just the one technical
>definition of entropy, namely S = sum_i P_i log(1/P_i).
>If you want to talk about something else, call it something
>else ... or at least make it clear that you are using the
>term in a nontechnical or metaphorical sense.
>...
>If you want to talk about work factor, call it work factor.
>If you want to talk about entropy, call it entropy.

One of the roots of the problem is that for many applications,
i is a well-defined event and P(i) is a fixed value (for each i),
but for many other applications, i might not be a well-defined
event, and/or P(i) is really a conditional probability,
P(i|other-stuff-you-know), and it's hard to tell whether that's
usefully different from the unconditional P(i).
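
To make the difference concrete, here's a quick Python sketch of
the textbook formula; the distributions are made-up numbers for
illustration, not measurements of anything:

    import math

    def entropy(probs):
        # Shannon entropy S = sum_i P_i * log2(1/P_i), in bits.
        return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

    # Unconditional distribution over four events: looks great.
    print(entropy([0.25, 0.25, 0.25, 0.25]))      # 2.0 bits

    # The same events conditioned on something the attacker happens
    # to know: far from uniform, so far less entropy.
    print(entropy([0.85, 0.05, 0.05, 0.05]))      # ~0.85 bits

The number you compute depends entirely on which distribution you
plug in, which is exactly the problem.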

One case where this comes up is key generation with entropy pools -
you take a bunch of hopefully-kinda-independent,
hopefully-identically-distributed variables generated by processes
that are complicated enough to look random, make some estimates of
their probability distributions and hence of their entropy, and add
the estimates together, after prewhitening the samples to make any
correlation harder to exploit.  This gives you an estimated N bits
of entropy, and you take M of them to use as a key, again possibly
running the output through hash functions to make an attacker's
calculations more difficult.
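
In code, the whole scheme is something like the toy sketch below;
the SHA-256 whitening and the per-sample entropy credit are my own
placeholder choices, and that credit is exactly the estimate whose
accuracy everything else depends on:

    import hashlib

    class EntropyPool:
        def __init__(self):
            self.state = b""
            self.estimated_bits = 0.0     # the "N" in the discussion

        def add_sample(self, sample: bytes, est_bits: float):
            # Prewhiten by folding each sample into a hash state,
            # making correlations between samples harder to exploit.
            self.state = hashlib.sha256(self.state + sample).digest()
            self.estimated_bits += est_bits

        def extract_key(self, m_bits: int) -> bytes:
            # Take M bits for a key, again through a hash so the key
            # doesn't directly expose the pool contents.
            key = hashlib.sha256(b"key-extract" + self.state).digest()
            return key[: (m_bits + 7) // 8]

    pool = EntropyPool()
    pool.add_sample(b"mouse/interrupt timing jitter", est_bits=3.5)
    key = pool.extract_key(128)           # M = 128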

So how many bits of entropy are left in the pool?
You can be conservative and call it N-M, on the assumption that
somebody clever enough to obtain the M key bits could use them to
validate guesses about the rest of the pool with only N-M calls to
an oracle.  Or you can be wildly optimistic and call it N, on the
assumption that the conditional probabilities of the remaining pool
are unchanged even given the M bits, because there's no way to use
them to validate anything.  If you've done a good enough job of
designing your pool management functions, the reality is probably
closer to the wildly optimistic case, unless somebody finds an
efficient way to invert your hash functions, at which point the
conditional probabilities really are different if you know the M
key bits than if you don't.
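
The two accounting policies are trivial to state in code; this is
just the bookkeeping choice, not a security argument (the old Linux
/dev/random took roughly the conservative line):

    def remaining_entropy(n_bits: float, m_bits: int,
                          conservative: bool = True) -> float:
        # Conservative: assume the M extracted bits let an attacker
        # validate guesses about the rest, so debit them (floor at 0).
        # Optimistic: assume extraction reveals nothing about the pool.
        return max(n_bits - m_bits, 0.0) if conservative else n_bits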

Another entropy example was the Venona decryptions -
people banging "randomly" on typewriters didn't actually produce
independent or identically distributed letters, so the conditional
probabilities didn't match the assumed ones and the entropy
estimates were wrong; and human-language plaintext being what it
is, they really needed the full one bit of key entropy per bit of
plaintext.  Such a pad is much tougher to exploit than a one-time
pad used more than once, but it gives the cryptanalyst a start.

But hey, nobody's going to guess that your password
is your dog's name spelled backwards, or think of using
a list of a few million obvious passwords as input to
the key-generation function.
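
Anyone who *has* thought of it can do something like the sketch
below; the wordlist file, the salt, and PBKDF2-HMAC-SHA256 as the
victim's key-generation function are all assumptions for
illustration:

    import hashlib

    def dictionary_attack(wordlist_path: str, salt: bytes,
                          known_key: bytes):
        # Run a few million "obvious" passwords (one per line)
        # through the same key-generation function the victim used,
        # comparing against a key we can somehow verify.
        with open(wordlist_path) as f:
            for line in f:
                guess = line.strip().encode()
                key = hashlib.pbkdf2_hmac("sha256", guess, salt,
                                          100_000,
                                          dklen=len(known_key))
                if key == known_key:
                    return guess
        return None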

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at metzdowd.com


