[Cryptography] paragraph with expected frequencies
Ben Laurie
ben at links.org
Sat Dec 23 17:05:16 EST 2017
A sample from this thread will do; a snippet from the beginning yields:
858
506 e
442 t
362 a
350 o
312 n
307 i
285 r
241 s
215 h
175 l
174
159 c
144 _
134 y
131 p
130 d
122 u
112 g
105 m
96 w
94 f
85 .
73 b
72 -
36 2
35 :
34 ,
33 k
33 0
30 q
29 >
29 /
26 v
23 T
22 x
21 I
20 z
18 1
16 D
15 @
14 E
13 '
13 "
11 F
11 A
11 7
10 W
10 =
10 <
9 j
9 O
8 R
7 S
6 M
6 C
6 )
5 K
5 H
5 (
4 U
4 P
4 4
3 L
3 ?
3 5
3 3
2 G
2 B
2 6
1 ]
1 [
1 Y
1 V
1 J
1 ;
1 9
1 %
1 !
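
A tally in the same "count character" layout can be produced with a few lines of Python. This is only a sketch and not necessarily how the counts above were generated; the function name char_counts and the sample string are illustrative assumptions.

```python
from collections import Counter


def char_counts(text):
    """Return (character, count) pairs, most frequent first."""
    # Counter.most_common() sorts by descending count;
    # characters with equal counts keep their first-seen order.
    return Counter(text).most_common()


if __name__ == "__main__":
    sample = "the cat sat on the mat"  # placeholder text, not from the thread
    for ch, n in char_counts(sample):
        # Whitespace characters print as themselves, so their rows look blank.
        print(f"{n:4d} {ch}")
```

Every character is counted as-is, including spaces, newlines, digits and punctuation, and upper and lower case are kept separate.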
On 23 December 2017 at 01:26, Tom Mitchell <mitch at niftyegg.com> wrote:
> On Wed, Dec 20, 2017 at 1:02 AM, Robin Wood <robin at digi.ninja> wrote:
>
>> Hi
>> Something a little less technical than a normal question...
>>
>> I'm working on a bit of crypto with my young daughter and we are about to
>> look at frequency analysis. Are there any short UK English paragraphs where
>> the frequency of letters is about what you would expect based on frequency
>> charts? i.e. E then T, A and O.
>>
>> Bonus if the digraphs are also roughly in order.
>>
>> I want to count the letters by hand so don't want anything too long and
>> it has to be PG content.
>>
>
> If you believe WP this is harder to do than it sounds.
>
> I would go to Project Gutenberg and grab a pile of age-appropriate books,
> poems and stories.
> Pull them into a page sampler with an automated counter and test.
>
> This has promise... https://www.gutenberg.org/ebooks/20532
> as does.. https://www.gutenberg.org/files/40063/40063-h/40063-h.htm
>
> An assertion that Morse code was organized to shorten transmissions
> is worthy of a test.
>
> https://en.wikipedia.org/wiki/Letter_frequency
> "Letter frequencies, like word frequencies
> <https://en.wikipedia.org/wiki/Word_frequencies>, tend to vary, both by
> writer and by subject. One cannot write an essay about x-rays without using
> frequent Xs, and the essay will have an idiosyncratic letter frequency if
> the essay is about the frequent use of x-rays to treat zebras in Qatar.
> Different authors have habits which can be reflected in their use of
> letters. Hemingway <https://en.wikipedia.org/wiki/Ernest_Hemingway>'s
> writing style, for example, is visibly different from Faulkner
> <https://en.wikipedia.org/wiki/William_Faulkner>'s. Letter, bigram
> <https://en.wikipedia.org/wiki/Bigram>, trigram
> <https://en.wikipedia.org/wiki/Trigram>, word frequencies, word length,
> and sentence length can be calculated for specific authors, and used to
> prove or disprove authorship of texts, even for authors whose styles are
> not so divergent.
>
> Accurate average letter frequencies can only be gleaned by analyzing a
> large amount of representative text. With the availability of modern
> computing and collections of large text corpora
> <https://en.wikipedia.org/wiki/Corpus_linguistics>, such calculations are
> easily made. Examples can be drawn from a variety of sources (press
> reporting, religious texts, scientific texts and general fiction) and there
> are differences especially for general fiction with the position of 'h' and
> 'i', with H becoming more common."
>
> Long poems like Longfellow's Evangeline might be sampled to see if a
> five or ten line sample from ten places in the poem matched.
>
> --
> T o m M i t c h e l l
>
> _______________________________________________
> The cryptography mailing list
> cryptography at metzdowd.com
> http://www.metzdowd.com/mailman/listinfo/cryptography
>
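
The sampling idea quoted above (a few lines taken from several places in a long text, each checked against the expected order) could be sketched as follows. The helper names letter_ranking, windows and matches_expected are made up for illustration, and the default "etao" prefix is simply the "E then T, A and O" order from the original question.

```python
from collections import Counter


def letter_ranking(text):
    """Letters a-z in descending order of frequency, ignoring case and non-letters."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    return "".join(ch for ch, _ in counts.most_common())


def windows(lines, size, places):
    """Yield `places` roughly evenly spaced chunks of `size` consecutive lines."""
    last_start = max(0, len(lines) - size)
    step = max(1, last_start // max(1, places - 1))
    for i in range(places):
        start = min(i * step, last_start)
        yield "\n".join(lines[start:start + size])


def matches_expected(text, expected="etao"):
    """True if the most frequent letters of `text` begin with `expected`."""
    return letter_ranking(text).startswith(expected)
```

With the lines of a poem loaded into a list, `[matches_expected(w) for w in windows(lines, 10, 10)]` shows which of ten ten-line samples put E, T, A and O first. Short samples are noisy, so an exact match on even four letters may well be uncommon.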