# compressing randomly-generated numbers

Alexander Klimov alserkli at inbox.ru
Tue Aug 29 06:17:30 EDT 2006

```On Mon, 28 Aug 2006, Travis H. wrote:
> On 8/23/06, Alexander Klimov <alserkli at inbox.ru> wrote:
> > A random bit stream should have two properties: no bias and no
> > dependency between bits. If one has biased but independent bits he
> > can use the von Neumann algorithm to remove the bias, but if there is
> > a dependency no algorithm will be able to `strip' it.
>
> Interesting claim;

Well, it not really a claim since there was no definition, here it is:
A ``dependency stripping'' algorithm is a deterministic algorithm that
gets a stream of unbiased (but not necessary independent bits) and
produces a stream of several independent bits.

> what if I XOR it with a truly random stream,

It is no a deterministic algorithm.

> or a computationally-pseudorandom stream,

This requires a random key.

Recall that the whole point was to extract random bits -- if we
already have useful random bits we can simply output them and discard
the input stream.

> or if I filter out every other bit?

Consider ``x a x b x c x ...'', sampling every other bit throws away
most of entropy.

In general, there is no ``dependency stripping'' algorithm because an
input x,x,x,..., where all x are the same (either all 0 or all 1), is
possible. There is simply no enough entropy in the input to extract at
least two bits.

Interestingly, an additional requirement that the input stream has at
least two bits of entropy does not change the situation: Consider the
first two bits of the output. There are only four possibilities (00,
01, 10, and 11), so if the input is long enough then it is possible to
find four different inputs that give the same bits, e.g.,

A(0001) = 00...
A(0010) = 00...
A(0100) = 00...
A(1000) = 00...

An input that contains 0001, 0010, 0100, and 1000 with the same
probability has two bits of entropy, but it is always transformed by
the algorithm into 00... and thus the first two bits are not
independent. (Strictly speaking, if we recall a requirement that the
input bits are unbiased we should also include into the set of inputs
1110, 1101, ..., but this does not change the result.)

> Seems like these would, if not eliminate the dependency, would
> weaken its effect. The last is equivalent to sampling at a slower
> rate.

See above about ``sampling at a slower rate.''

--
Regards,