[Cryptography] converting one base to another

Thu Dec 4 15:56:29 EST 2014

True.

> You fill a pseudoentropy register by repeatedly
> multiplying by 5 and adding the generator output.
> 
> Then you can take binary output by taking the low bit of the
> pseudoentropy register and shifting right to divide by 2 (taking base-2
> digits from the integer).
> 
> Whenever the pseudoentropy register gets to be small enough to hold
> another base-5 digit, you multiply it by 5 and add the output of the
> generator (essentially you'd be 'replacing' the base-5 digit you most
> recently took, had you been taking digits in base-5 rather than base-2).

That's not a correct solution.  As stated, it's both biased
and many-to-one.  The bug is where it says "...small enough..."

Suppose the base-5 input has a string of N zeros in a row.
When you go to put these into the register, if the value 
happens to be zero initially, it will stay zero for a long 
time, and the output won't tell you how big N was.
Here are one way of fixing it:

Rather than thinking in terms of integers, think in terms of
a fixed-point fractions with the radix point in front, 
spanning the half-open interval [0, 1).
  -- The input  is a base-5 fraction of this kind.
  -- The output is a base-2 fraction of this kind.

Now consider the fraction 1/2.  
  On input, 1/2 is a repeating numeral in base 5, namely 0.222222*. 
  On output, 1/2 is obviously 0.100000*.

Any input that starts with the digits 0 or 1 is less than 1/2, 
so we know the first bit of the output is 0.  Any input that
starts with 3 or 4 is greater than 1/2, so we know the first
bit of the output is 1.  The nasty case consists of an input
that starts with N 2s in a row.  We have to sit on the fence
until we see the 1+Nth input.  When we hop off the fence,
we get to put out a very long string of output bits, so no
information was wasted.  However, there was some latency.

The latency problem recurs.  There are infinitely many
numbers that have a non-repeating base-2 representation
but a repeating base-5 representation.

The chance of a long latency is exponentially small, but
nonzero, and intuition tells me it cannot be completely
avoided if the bases are incommensurable, not without 
making compromises on the bias and/or the bijectivity.

===============

Note that a similar but slightly harder problem arises in 
connection with the construction of a HRNG.  Input entropy 
arrives in base who-knows-what.  We want the output to be 
unbiased.  We want the mapping to be very nearly bijective, 
because that is the only way to do the job without wasting 
very much entropy.

Turbid uses a hash for this.  The result is very very nearly
bijective (although the inverse mapping is computationally
intractable and completely uninteresting).   The result is 
very nearly unbiased, in the sense that if you compared the 
output stream to a truly unbiased stream, it would take you 
longer than the age of the universe to tell which is which.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIVAwUBVIDKffO9SFghczXtAQIIyQ/+JXfPTRT+zO3Nef03THQothyRIP3M+Ga5
LmwOhr9vzs6P6kyQkwEntiq7b1eF7vj/6YRhzUqy4Bafnc+V653Q9IwL+0SKXirz
n9Ap+Oadf22d4NgjEAcvHiaT+09X+l3y4cJu84carXGfxu7N0NoQolqHjA053N/R
ZnaSlwse9vNzfsab+BMxo0noQeSsIMStTIgX2WzrJnzAoqN7n3bJ0hb5hw5LUE5a
uPxEtUtB2z/4M3hbTkITHrMUQnPDja8GonmPkzDZ5k+YYhm0qlI3THz6ReLgAgtn
7HUhDX67ZAhonvkxdIptwulzKvza0s8vVNmmNqXpMNyElQDcEXyvdXyj9OUt5R0c
op8OiY04eVAudUol85rHzzJjKDfX3KlJ29hL9c3uNWztfjOzOj3kP2z7nYpORYUb
edQJDFUfetIQPZSMlQPp9yhlFctd309JXh85/Ad00vXfiGQdUrPsU+DxRN6WvTnO
4kzKO9Ol4QhpwYACxiyvyf7DML9olZg/t6P5wKnMQONpjJZ7CnvzXWhJQzTvxhRa
y92zUX4fnmeKz1aKwfH+5SPmFyIQZBgHHW2uwuTIfs2fj5KYEEHocXPqvwimVTlN
dIPfr961SgX69OVa0nhVKdCIAUcoXamEk6gjkiItSBOigpMwRP4UO/l0qsc89bHV
mf1Xo8FrKyw=
=TJth
-----END PGP SIGNATURE-----