[Cryptography] Case-Sensitive letter frequencies

mok-kong shen mok-kong.shen at t-online.de
Fri Apr 1 11:14:25 EDT 2016


M. N. Jones and D. J. K. Mewhort have published a table of the counts
and the ranking of the letters, separately for upper-case and
lower-case, from a certain corpus. See
https://www.researchgate.net/publication/8090755

Their counts are listed below in descending order. The third value in
each entry is the letter's relative frequency, computed over all 52
symbols (upper-case and lower-case together).

['e', 7741842, 0.1187] ['t', 5507692, 0.0845] ['a', 5263779, 0.0807]
['o', 4729266, 0.0725] ['n', 4535545, 0.0696] ['i', 4527332, 0.0694]
['s', 4186210, 0.0642] ['r', 4137949, 0.0635] ['h', 2955858, 0.0453]
['l', 2553152, 0.0392] ['d', 2369820, 0.0363] ['c', 1960412, 0.0301]
['u', 1613323, 0.0247] ['m', 1467376, 0.0225] ['f', 1296925, 0.0199]
['p', 1255579, 0.0193] ['g', 1206747, 0.0185] ['y', 1062040, 0.0163]
['w', 1015656, 0.0156] ['b', 866156, 0.0133]  ['v', 653370, 0.0100]
['k', 460788, 0.0071]  ['T', 325462, 0.0050]  ['S', 304971, 0.0047]
['A', 280937, 0.0043]  ['M', 259474, 0.0040]  ['C', 229363, 0.0035]
['I', 223312, 0.0034]  ['N', 205409, 0.0032]  ['B', 169474, 0.0026]
['R', 146448, 0.0022]  ['P', 144239, 0.0022]  ['E', 138443, 0.0021]
['D', 129632, 0.0020]  ['H', 123632, 0.0019]  ['x', 123577, 0.0019]
['W', 107195, 0.0016]  ['L', 106984, 0.0016]  ['O', 105700, 0.0016]
['F', 100751, 0.0015]  ['Y', 94297, 0.0014]   ['G', 93212, 0.0014]
['J', 78706, 0.0012]   ['z', 66423, 0.0010]   ['j', 65856, 0.0010]
['U', 57488, 0.0009]   ['q', 54221, 0.0008]   ['K', 46580, 0.0007]
['V', 31053, 0.0005]   ['Q', 11659, 0.0002]   ['X', 7578, 0.0001]
['Z', 5610, 0.0001]
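
For anyone who wants to check the third column, here is a minimal Python
sketch that recomputes it from the counts listed above. The names COUNTS,
relative_frequencies, FREQS, RANKING and UPPER_SHARE are my own choices
for illustration and do not come from the paper. The sketch assumes that
each frequency is simply the letter's count divided by the total over all
52 symbols.

```python
# Counts transcribed from the list above (letter -> occurrences).
COUNTS = {
    'e': 7741842, 't': 5507692, 'a': 5263779, 'o': 4729266,
    'n': 4535545, 'i': 4527332, 's': 4186210, 'r': 4137949,
    'h': 2955858, 'l': 2553152, 'd': 2369820, 'c': 1960412,
    'u': 1613323, 'm': 1467376, 'f': 1296925, 'p': 1255579,
    'g': 1206747, 'y': 1062040, 'w': 1015656, 'b': 866156,
    'v': 653370, 'k': 460788, 'T': 325462, 'S': 304971,
    'A': 280937, 'M': 259474, 'C': 229363, 'I': 223312,
    'N': 205409, 'B': 169474, 'R': 146448, 'P': 144239,
    'E': 138443, 'D': 129632, 'H': 123632, 'x': 123577,
    'W': 107195, 'L': 106984, 'O': 105700, 'F': 100751,
    'Y': 94297, 'G': 93212, 'J': 78706, 'z': 66423,
    'j': 65856, 'U': 57488, 'q': 54221, 'K': 46580,
    'V': 31053, 'Q': 11659, 'X': 7578, 'Z': 5610,
}


def relative_frequencies(counts):
    """Return letter -> share of the total count over all symbols."""
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


FREQS = relative_frequencies(COUNTS)

# Letters ordered by descending count, as in the list above.
RANKING = sorted(COUNTS, key=COUNTS.get, reverse=True)

# Combined share of all upper-case letters.
UPPER_SHARE = sum(f for letter, f in FREQS.items() if letter.isupper())

if __name__ == "__main__":
    for letter in RANKING:
        print(f"{letter!r} {COUNTS[letter]:>8} {FREQS[letter]:.4f}")
    print(f"upper-case share: {UPPER_SHARE:.4f}")
```

Running the script prints the entries in the same order as the list, with
the frequencies rounded to four decimal places, followed by the combined
share of the upper-case letters.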

M. K. Shen

