In all the talk of super computers there is not...

Thu Sep 6 13:43:29 EDT 2007

On Thu, 6 Sep 2007 09:28:40 -0400 (EDT)
"Leichter, Jerry" <leichter_jerrold at emc.com> wrote:

> | Hi Martin,
> | 
> | I did forget to say that it would be salted so that throws it off by
> | 2^12
> | 
> | A couple of questions. How did you come up with the ~2.5 bits per
> | word? Would a longer word have more bits?
> 
> He misapplied an incorrect estimate!  :-) The usual estimate - going
> back to Shannon's original papers on information theory, actually - is
> that natural English text has about 2.5 (I think it's usually given as
> 2.4) bits of entropy per *character*.  There are several problems
> here:

It's less than that.  See, for example, the bottom of the first page of
http://www.cs.brown.edu/courses/cs195-5/extras/shannon-1951.pdf :

	From this analysis it appears that, in ordinary literary
	English, the long range statistical effects (up to 100 letters)
	reduce the entropy to something of the order of one bit per
	letter, with a corresponding redundancy of roughly 75%. The
	redundancy may be still higher when structure extending over
	paragraphs, chapters, etc. is included.

> 
> 	- The major one is that the estimate should be for
> 		*characters*, not *words*.  So the number of bits of
> 		entropy in a 55-character phrase is about 137 (132, if
> 		you use 2.4 bits/character), not 30.
> 
> 	- The minor one is that the English entropy estimate looks
> 		just at letters and spaces, not punctuation and
> 		capitalization.  So it's probably low anyway.  However,
> 		this is a much smaller effect.
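
To put the competing figures side by side, here is a rough sketch of the
arithmetic for the 55-character phrase, using the 2.5 and 2.4
bits/character figures from the quoted text and the "order of one bit per
letter" figure from the Shannon passage above.  The function name and the
labels are purely illustrative, and treating entropy as a simple product
of length and a per-character rate is itself an approximation.

```python
def phrase_entropy_bits(num_chars, bits_per_char):
    """Estimate phrase entropy as length times a per-character rate."""
    return num_chars * bits_per_char


# Per-character rates mentioned in this thread (labels are illustrative).
RATES = {
    "quoted estimate (2.5 bits/char)": 2.5,
    "quoted estimate (2.4 bits/char)": 2.4,
    "Shannon 1951, long range (~1 bit/char)": 1.0,
}

PHRASE_LENGTH = 55  # characters, as in the quoted example

for label, rate in RATES.items():
    bits = phrase_entropy_bits(PHRASE_LENGTH, rate)
    print(f"{label}: about {bits:.1f} bits")
```

With the one-bit figure the same phrase comes out near 55 bits rather
than 137, which is still well above the 30 bits originally claimed.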

The interesting question is whether or not one can effectively
enumerate candidate phrases for a guessing program.  For that problem,
punctuation and capitalization are important.
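
As a toy illustration of why that matters to a guesser, the sketch below
expands one base phrase into every combination of first-letter
capitalization and a trailing punctuation mark.  The variant rules, the
function name, and the sample words are all hypothetical; a real guessing
program would use far richer rules.

```python
import itertools
import math


def variants(words, endings=("", ".", "!", "?")):
    """Yield each capitalization/ending combination of a word sequence."""
    # One True/False choice per word: capitalize its first letter or not.
    for caps in itertools.product((False, True), repeat=len(words)):
        styled = [w.capitalize() if c else w.lower()
                  for w, c in zip(words, caps)]
        for end in endings:
            yield " ".join(styled) + end


words = ["in", "all", "the", "talk"]  # sample base phrase
candidates = list(variants(words))

# Each word doubles the count and the endings multiply it again,
# so the guesser's extra work is log2 of the number of variants.
extra_bits = math.log2(len(candidates))
print(f"{len(candidates)} variants, about {extra_bits:.1f} extra bits")
```

Even these two simple rules multiply the candidate list by 2 per word
times the number of endings, so the enumeration grows quickly with phrase
length.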

		--Steve Bellovin, http://www.cs.columbia.edu/~smb

---------------------------------------------------------------------
The Cryptography Mailing List