[Cryptography] converting one base to another

John Denker jsd at av8n.com
Thu Dec 4 15:56:29 EST 2014

Hash: SHA1

On 12/03/2014 01:23 PM, Ray Dillinger wrote:
> Most conversion schemes for base-n pseudoentropy to binary
> pseudoentropy have one of two problems: either there can be multiple
> input streams that produce identical output, or there is bias in the
> output stream because some inputs count for more or less
> pseudoentropy than they're worth.


> You fill a pseudoentropy register by repeatedly
> multiplying by 5 and adding the generator output.
> Then you can take binary output by taking the low bit of the
> pseudoentropy register and shifting right to divide by 2 (taking base-2
> digits from the integer).
> Whenever the pseudoentropy register gets to be small enough to hold
> another base-5 digit, you multiply it by 5 and add the output of the
> generator (essentially you'd be 'replacing' the base-5 digit you most
> recently took, had you been taking digits in base-5 rather than base-2).

That's not a correct solution.  As stated, it's both biased
and many-to-one.  The bug is where it says "...small enough..."

Suppose the base-5 input has a string of N zeros in a row.
When you go to put these into the register, if the value 
happens to be zero initially, it will stay zero for a long 
time, and the output won't tell you how big N was.
Here are one way of fixing it:

Rather than thinking in terms of integers, think in terms of
a fixed-point fractions with the radix point in front, 
spanning the half-open interval [0, 1).
  -- The input  is a base-5 fraction of this kind.
  -- The output is a base-2 fraction of this kind.

Now consider the fraction 1/2.  
  On input, 1/2 is a repeating numeral in base 5, namely 0.222222*. 
  On output, 1/2 is obviously 0.100000*.

Any input that starts with the digits 0 or 1 is less than 1/2, 
so we know the first bit of the output is 0.  Any input that
starts with 3 or 4 is greater than 1/2, so we know the first
bit of the output is 1.  The nasty case consists of an input
that starts with N 2s in a row.  We have to sit on the fence
until we see the 1+Nth input.  When we hop off the fence,
we get to put out a very long string of output bits, so no
information was wasted.  However, there was some latency.

The latency problem recurs.  There are infinitely many
numbers that have a non-repeating base-2 representation
but a repeating base-5 representation.

The chance of a long latency is exponentially small, but
nonzero, and intuition tells me it cannot be completely
avoided if the bases are incommensurable, not without 
making compromises on the bias and/or the bijectivity.


Note that a similar but slightly harder problem arises in 
connection with the construction of a HRNG.  Input entropy 
arrives in base who-knows-what.  We want the output to be 
unbiased.  We want the mapping to be very nearly bijective, 
because that is the only way to do the job without wasting 
very much entropy.

Turbid uses a hash for this.  The result is very very nearly
bijective (although the inverse mapping is computationally
intractable and completely uninteresting).   The result is 
very nearly unbiased, in the sense that if you compared the 
output stream to a truly unbiased stream, it would take you 
longer than the age of the universe to tell which is which.
Version: GnuPG v1


More information about the cryptography mailing list