[Cryptography] paragraph with expected frequencies

Ben Laurie ben at links.org
Sat Dec 23 17:05:16 EST 2017


A sample from this thread will do - a snippet from the beginning yields:

 858
 506 e
 442 t
 362 a
 350 o
 312 n
 307 i
 285 r
 241 s
 215 h
 175 l
 174
 159 c
 144 _
 134 y
 131 p
 130 d
 122 u
 112 g
 105 m
  96 w
  94 f
  85 .
  73 b
  72 -
  36 2
  35 :
  34 ,
  33 k
  33 0
  30 q
  29 >
  29 /
  26 v
  23 T
  22 x
  21 I
  20 z
  18 1
  16 D
  15 @
  14 E
  13 '
  13 "
  11 F
  11 A
  11 7
  10 W
  10 =
  10 <
   9 j
   9 O
   8 R
   7 S
   6 M
   6 C
   6 )
   5 K
   5 H
   5 (
   4 U
   4 P
   4 4
   3 L
   3 ?
   3 5
   3 3
   2 G
   2 B
   2 6
   1 ]
   1 [
   1 Y
   1 V
   1 J
   1 ;
   1 9
   1 %
   1 !
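
For anyone who wants to produce a table like the one above, here is a minimal Python sketch. How the counts above were actually generated is not stated, so the function names (char_frequencies, format_table) and the tie-breaking rule are assumptions. The two unlabelled rows are presumably whitespace characters, which this sketch counts as well.

```python
from collections import Counter


def char_frequencies(text):
    """Return (character, count) pairs, most frequent first.

    Ties are broken by character order; that rule is an assumption,
    chosen only to make the output deterministic.
    """
    counts = Counter(text)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def format_table(freqs):
    """Render the pairs as right-aligned 'count character' rows."""
    return "\n".join(f"{n:4d} {ch}" for ch, n in freqs)


if __name__ == "__main__":
    sample = "A sample from this thread will do"
    print(format_table(char_frequencies(sample)))
```

Case is kept distinct here, as in the table above, where "T" and "t" appear on separate rows.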


On 23 December 2017 at 01:26, Tom Mitchell <mitch at niftyegg.com> wrote:

> On Wed, Dec 20, 2017 at 1:02 AM, Robin Wood <robin at digi.ninja> wrote:
>
>> Hi
>> Something a little less technical than a normal question...
>>
>> I'm working on a bit of crypto with my young daughter and we are about to
>> look at frequency analysis. Are there any short UK English paragraphs where
>> the frequency of letters is about what you would expect based on frequency
>> charts? i.e. E then T, A and O.
>>
>> Bonus if the digraphs are also roughly in order.
>>
>> I want to count the letters by hand so don't want anything too long and
>> it has to be PG content.
>>
>
> If you believe Wikipedia, this is harder to do than it sounds.
>
> I would go to Project Gutenberg and grab a pile of age appropriate books,
> poems and stories.
> Pull them into a page sampler with an automated counter and test.
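
A rough sketch of such a sampler, in Python. It splits a text into paragraphs and keeps the short ones whose most common letters come out in the order asked for in the original question (E, then T, A and O). The names (letter_ranking, matching_paragraphs), the blank-line paragraph split, and the 600-letter cap are all assumptions for illustration, not anything specified in the thread.

```python
from collections import Counter

# Order named in the original question: "E then T, A and O".
EXPECTED_TOP = "etao"


def letter_ranking(text):
    """Return the letters a-z present in text, most frequent first."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    # Ties are broken alphabetically (an arbitrary, assumed rule).
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))


def matching_paragraphs(text, expected=EXPECTED_TOP, max_letters=600):
    """Return paragraphs short enough to count by hand whose ranking
    starts with the expected letters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    hits = []
    for p in paragraphs:
        n_letters = sum(1 for ch in p.lower() if "a" <= ch <= "z")
        if n_letters <= max_letters and letter_ranking(p).startswith(expected):
            hits.append(p)
    return hits
```

Feeding it the plain-text version of a Project Gutenberg book would give a shortlist to read through by eye for suitability.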
>
> This has promise... https://www.gutenberg.org/ebooks/20532
> as does.. https://www.gutenberg.org/files/40063/40063-h/40063-h.htm
>
> An assertion that Morse code was organized to shorten transmissions
> is worthy of a test.
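
One quick way to try that test, sketched in Python: compare the average number of dots and dashes per letter when weighted by the lowercase letter counts from the table earlier in this message against the plain average over the 26 codes. The function names are assumptions, and counting symbols rather than transmission time (a dash lasts longer than a dot) is a deliberate simplification.

```python
# International Morse code for the 26 letters.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".",
    "f": "..-.", "g": "--.", "h": "....", "i": "..", "j": ".---",
    "k": "-.-", "l": ".-..", "m": "--", "n": "-.", "o": "---",
    "p": ".--.", "q": "--.-", "r": ".-.", "s": "...", "t": "-",
    "u": "..-", "v": "...-", "w": ".--", "x": "-..-", "y": "-.--",
    "z": "--..",
}

# Lowercase letter counts copied from the table earlier in this message.
COUNTS = {
    "e": 506, "t": 442, "a": 362, "o": 350, "n": 312, "i": 307,
    "r": 285, "s": 241, "h": 215, "l": 175, "c": 159, "y": 134,
    "p": 131, "d": 130, "u": 122, "g": 112, "m": 105, "w": 96,
    "f": 94, "b": 73, "k": 33, "q": 30, "v": 26, "x": 22,
    "z": 20, "j": 9,
}


def mean_symbols_weighted(counts, code=MORSE):
    """Average dots-plus-dashes per letter, weighted by observed counts."""
    total = sum(counts.values())
    return sum(n * len(code[ch]) for ch, n in counts.items()) / total


def mean_symbols_unweighted(code=MORSE):
    """Average dots-plus-dashes per letter if all letters were equally likely."""
    return sum(len(symbols) for symbols in code.values()) / len(code)


if __name__ == "__main__":
    print("weighted:  ", round(mean_symbols_weighted(COUNTS), 3))
    print("unweighted:", round(mean_symbols_unweighted(), 3))
```

If the weighted figure comes out lower than the unweighted one, the shorter codes are landing on the commoner letters, at least for this small sample.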
>
> https://en.wikipedia.org/wiki/Letter_frequency
> "Letter frequencies, like word frequencies
> <https://en.wikipedia.org/wiki/Word_frequencies>, tend to vary, both by
> writer and by subject. One cannot write an essay about x-rays without using
> frequent Xs, and the essay will have an idiosyncratic letter frequency if
> the essay is about the frequent use of x-rays to treat zebras in Qatar.
> Different authors have habits which can be reflected in their use of
> letters. Hemingway <https://en.wikipedia.org/wiki/Ernest_Hemingway>'s
> writing style, for example, is visibly different from Faulkner
> <https://en.wikipedia.org/wiki/William_Faulkner>'s. Letter, bigram
> <https://en.wikipedia.org/wiki/Bigram>, trigram
> <https://en.wikipedia.org/wiki/Trigram>, word frequencies, word length,
> and sentence length can be calculated for specific authors, and used to
> prove or disprove authorship of texts, even for authors whose styles are
> not so divergent.
>
> Accurate average letter frequencies can only be gleaned by analyzing a
> large amount of representative text. With the availability of modern
> computing and collections of large text corpora
> <https://en.wikipedia.org/wiki/Corpus_linguistics>, such calculations are
> easily made. Examples can be drawn from a variety of sources (press
> reporting, religious texts, scientific texts and general fiction) and there
> are differences especially for general fiction with the position of 'h' and
> 'i', with H becoming more common."
>
> Long poems like Longfellow's Evangeline might be sampled to see if a five
> or ten line sample from ten places in the poem matched.
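
A minimal Python sketch of that sampling idea: take evenly spaced windows of lines from a poem and check whether each window's most common letters agree with those of the whole text. The names (top_letters, sample_windows, compare_samples), the even spacing, and the choice to compare only the top four letters are assumptions for illustration.

```python
from collections import Counter


def top_letters(text, n=4):
    """Return the n most frequent letters a-z in text, most frequent first."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "".join(ranked[:n])


def sample_windows(lines, places=10, window=5):
    """Return `places` evenly spaced windows of `window` lines each."""
    if len(lines) <= window or places < 2:
        return [lines[:window]]
    last_start = len(lines) - window
    starts = [round(i * last_start / (places - 1)) for i in range(places)]
    return [lines[s:s + window] for s in starts]


def compare_samples(text, places=10, window=5, n=4):
    """Return the overall top letters and, per sample, its top letters
    plus whether they match the overall ranking."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    overall = top_letters("\n".join(lines), n)
    results = []
    for chunk in sample_windows(lines, places, window):
        sample = top_letters("\n".join(chunk), n)
        results.append((sample, sample == overall))
    return overall, results
```

Calling compare_samples on the text of the poem with window=5 and again with window=10 would show how often a hand-countable sample agrees with the whole.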
>
> --
>   T o m    M i t c h e l l
>
> _______________________________________________
> The cryptography mailing list
> cryptography at metzdowd.com
> http://www.metzdowd.com/mailman/listinfo/cryptography
>