# [Cryptography] converting one base to another

John Denker jsd at av8n.com
Thu Dec 4 15:56:29 EST 2014

```-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/03/2014 01:23 PM, Ray Dillinger wrote:
> Most conversion schemes for base-n pseudoentropy to binary
> pseudoentropy have one of two problems: either there can be multiple
> input streams that produce identical output, or there is bias in the
> output stream because some inputs count for more or less
> pseudoentropy than they're worth.

True.

> You fill a pseudoentropy register by repeatedly
> multiplying by 5 and adding the generator output.
>
> Then you can take binary output by taking the low bit of the
> pseudoentropy register and shifting right to divide by 2 (taking base-2
> digits from the integer).
>
> Whenever the pseudoentropy register gets to be small enough to hold
> another base-5 digit, you multiply it by 5 and add the output of the
> generator (essentially you'd be 'replacing' the base-5 digit you most
> recently took, had you been taking digits in base-5 rather than base-2).

That's not a correct solution.  As stated, it's both biased
and many-to-one.  The bug is where it says "...small enough..."

Suppose the base-5 input has a string of N zeros in a row.
When you go to put these into the register, if the value
happens to be zero initially, it will stay zero for a long
time, and the output won't tell you how big N was.
Here are one way of fixing it:

Rather than thinking in terms of integers, think in terms of
a fixed-point fractions with the radix point in front,
spanning the half-open interval [0, 1).
-- The input  is a base-5 fraction of this kind.
-- The output is a base-2 fraction of this kind.

Now consider the fraction 1/2.
On input, 1/2 is a repeating numeral in base 5, namely 0.222222*.
On output, 1/2 is obviously 0.100000*.

Any input that starts with the digits 0 or 1 is less than 1/2,
so we know the first bit of the output is 0.  Any input that
starts with 3 or 4 is greater than 1/2, so we know the first
bit of the output is 1.  The nasty case consists of an input
that starts with N 2s in a row.  We have to sit on the fence
until we see the 1+Nth input.  When we hop off the fence,
we get to put out a very long string of output bits, so no
information was wasted.  However, there was some latency.

The latency problem recurs.  There are infinitely many
numbers that have a non-repeating base-2 representation
but a repeating base-5 representation.

The chance of a long latency is exponentially small, but
nonzero, and intuition tells me it cannot be completely
avoided if the bases are incommensurable, not without
making compromises on the bias and/or the bijectivity.

===============

Note that a similar but slightly harder problem arises in
connection with the construction of a HRNG.  Input entropy
arrives in base who-knows-what.  We want the output to be
unbiased.  We want the mapping to be very nearly bijective,
because that is the only way to do the job without wasting
very much entropy.

Turbid uses a hash for this.  The result is very very nearly
bijective (although the inverse mapping is computationally
intractable and completely uninteresting).   The result is
very nearly unbiased, in the sense that if you compared the
output stream to a truly unbiased stream, it would take you
longer than the age of the universe to tell which is which.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIVAwUBVIDKffO9SFghczXtAQIIyQ/+JXfPTRT+zO3Nef03THQothyRIP3M+Ga5
LmwOhr9vzs6P6kyQkwEntiq7b1eF7vj/6YRhzUqy4Bafnc+V653Q9IwL+0SKXirz
ZnaSlwse9vNzfsab+BMxo0noQeSsIMStTIgX2WzrJnzAoqN7n3bJ0hb5hw5LUE5a
uPxEtUtB2z/4M3hbTkITHrMUQnPDja8GonmPkzDZ5k+YYhm0qlI3THz6ReLgAgtn
7HUhDX67ZAhonvkxdIptwulzKvza0s8vVNmmNqXpMNyElQDcEXyvdXyj9OUt5R0c
op8OiY04eVAudUol85rHzzJjKDfX3KlJ29hL9c3uNWztfjOzOj3kP2z7nYpORYUb