[Cryptography] Mapping numbers to permutations

Viktor Dukhovni cryptography at dukhovni.org
Thu Dec 4 02:07:48 EST 2014

On Thu, Dec 04, 2014 at 07:28:25AM +0100, Andreas Briese wrote:

> Don't get it.  
> If the output of the generator is random then its binary notation is random too. 
> Why don't you use this directly, if you need a binary stream?

A uniformly random stream of integers selected from [0, n-1] does
not directly yield an unbiased stream of bits.

Sampling the first $k$ elements of the stream results in any of
$n^k$ possible states with uniform probability.  Suppose such a
sample is used to generate the first $m$ bits of a binary output
stream.  This is accomplished via some function

    f: n^k -> 2^m

The only way that $f$ can result in a uniform distribution on $2^m$
is if $2^m$ divides $n^k$.  So in particular with $n = 5$ the
claimed uniformity is provably impossible.  What is possible is to
discard some subset of $n^k$ so that $2^m$ divides the number of
allowed states.  Whenever a forbidden state is reached, you throw
away your $k$ inputs and try another $k$.

This is reasonably efficient when $n^k$ is just larger than a
power of $2$.  For $n = 5$, and given the continued fraction
for $log_2(5) = [2; 3, 9, 2, 2, 4, 6, 2, 1, 1, 3, 1, 18, ...]$

we get the rational lower bound approximations:


So for example 2^65 is just less than 5^28:

    $ echo "10k 2 65 ^ 5 28 ^ / p" | dc

Therefore, after generating 28 base 5 digits, the resulting number
fits into 66 bits, but 99% of the time only 65 bits are required
and the top bit is zero.  If the top bit is non-zero discard the
state, and generate another 28 base 5 digits.  Otherwise, output
the bottom 65 bits.  This is uniformly random, and reasonably
efficient (discards are rare).  If we take the next approximation:

    $ echo "10k 2 339 ^ 5 146 ^ / p" | dc

The discard rate drops to ~0.1%, but the output arrives in clumps
of 339 bits at a time (of course this bit stream can be buffered).

The simplest thing of course is to use [0, 3] and discard all the
4's base 5.  The discard rate is 20%, and a run of "4" will
occasionally stall the generator for a few cycles until the bad
luck (Chinese numerology anyone) runs out.


More information about the cryptography mailing list