[Cryptography] Mapping numbers to permutations
Viktor Dukhovni
cryptography at dukhovni.org
Thu Dec 4 02:07:48 EST 2014
On Thu, Dec 04, 2014 at 07:28:25AM +0100, Andreas Briese wrote:
> Don't get it.
> If the output of the generator is random then its binary notation is random too.
> Why don't you use this directly, if you need a binary stream?
A uniformly random stream of integers selected from [0, n-1] does
not directly yield an unbiased stream of bits.
Sampling the first $k$ elements of the stream results in any of
$n^k$ possible states with uniform probability. Suppose such a
sample is used to generate the first $m$ bits of a binary output
stream. This is accomplished via some function
f: n^k -> 2^m
The only way that $f$ can result in a uniform distribution on $2^m$
is if $2^m$ divides $n^k$. So in particular with $n = 5$ the
claimed uniformity is provably impossible. What is possible is to
discard some subset of $n^k$ so that $2^m$ divides the number of
allowed states. Whenever a forbidden state is reached, you throw
away your $k$ inputs and try another $k$.
This is reasonably efficient when $n^k$ is just larger than a
power of $2$. For $n = 5$, and given the continued fraction
for $log_2(5) = [2; 3, 9, 2, 2, 4, 6, 2, 1, 1, 3, 1, 18, ...]$
we get the rational lower bound approximations:
2/1
65/28
339/146
...
So for example 2^65 is just less than 5^28:
$ echo "10k 2 65 ^ 5 28 ^ / p" | dc
.9903520314
Therefore, after generating 28 base 5 digits, the resulting number
fits into 66 bits, but 99% of the time only 65 bits are required
and the top bit is zero. If the top bit is non-zero discard the
state, and generate another 28 base 5 digits. Otherwise, output
the bottom 65 bits. This is uniformly random, and reasonably
efficient (discards are rare). If we take the next approximation:
$ echo "10k 2 339 ^ 5 146 ^ / p" | dc
.9989595361
The discard rate drops to ~0.1%, but the output arrives in clumps
of 339 bits at a time (of course this bit stream can be buffered).
The simplest thing of course is to use [0, 3] and discard all the
4's base 5. The discard rate is 20%, and a run of "4" will
occasionally stall the generator for a few cycles until the bad
luck (Chinese numerology anyone) runs out.
--
Viktor.
More information about the cryptography
mailing list