[Cryptography] An interesting little pseudorandom number generator

Sat Jul 31 20:49:25 EDT 2021

On 7/30/21 11:37 AM, Jacob Christian Munch-Andersen wrote:
> It is an intriguing design, but it produces an even output 257 out of
> 512 times.

Took me a while to figure out why it was doing that.  It wasn't the
"obvious" reason I thought of first.  And it took another while to
figure out what to do about it.  Thanks for pointing it out.

> The variable value lookup looks kind of side-channely, I'm not sure if
> it is exploitable, but the general consensus in modern cryptography is
> that we avoid such things completely, because it is devilishly hard to
> assert that there is no way to exploit that. The variable lookup is on
> the other hand the only thing that makes this not trivially linearly
> solvable, but I'm not sure if that is enough.

I'm aware that state leakage sunk arc4.  And this is using part of its
state directly as one of the addends to the output.  I don't think
there's any way to use a single addend without access to either the sum,
or the other addend, to do anything evil.  I'm not sure what
construction would entirely avoid it, as the output of any PRNG contains
information about the state of the PRNG.

Considering it carefully.... The observer is looking at the sum of two
values from the generator state.
The first is novel each time, but the second is (pseudo)randomly
selected from the state.  If the second happens to be the value that was
novel on any recent previous time, it shouldn't be possible to tell
because the currently novel value masks it.  So there shouldn't be any
cases where output N matches output N+C for small C - it's not possible
for the same two values to be picked twice. 

It is possible for the novel value to be picked as the second term
making the output double a single value from the state.  But there is
nothing special about a value that happens to be even.

State compromise extensions are limited because Fibonacci generators
don't run backward.  When you are looking at an output you can't tell
what two values added together to make that value, and one of them
disappeared when the generator cycled.  Because one of the addends that
formed the Novel value disappears when the generator cycles, you have no
way to reconstruct the previous state given the current state.

Given the current state, you can tell what the most recent outputs of
the generators have been; The first three, plus the recent output of
PRNG, will allow you to identify which elements have been added together
to make the output.  But this state compromise extension is limited to
16 outputs; beyond that, information disappears from the state (unless
you can undo addition given only the sum) faster than outputs make it
available.

So, I'm inclined to say 'state compromise extension' is likely to be
very short.  I'm aware that I'm not the smartest monkey on the planet,
and I could be wrong.  But I'm inclined to say it's not likely.

Output prediction is also something I don't think is likely; A classic
LFG of course has its formula where the output is absolutely correlated
with two specific previous outputs.  But while the output of each
generator is subject to that three-point correlation, the PRNG output,
by itself, doesn't give clues about what the outputs of the individual
generators are.

> Using only addition has two major issues, you stick everything in the
> same linear space, and the higher order bits never get to influence
> the lower order bits.

Hmmm.  I like the longer period of the additive generators.  OTOH, the
difference is a factor of 2^word length.  If the longest generator
retains that factor it doesn't matter if the others do as well.  So if I
leave the longest one additive, making the others multiplicative should
have no effect on the period. 

> While I'm no fan of creeping it to the absolute minimum, 2 kB of state
> for an rng might be a bit excessive.

If I recall correctly, 4kB is one cache line on modern hardware; if it
fits in cache at all, then making it smaller can't make it faster.  That
said, it could go down to 1kB with int32s, or even to 1/2kB with int16s.

> You seem to have defined the size of integers to depend on the
> platform, you can make multiple different versions optimized for
> different platforms, but you should generally let the user choose
> which one to use so that they can get the same result across different
> platforms.

Yes. That is absolutely true, and needs corrected.  A PRNG has to give
repeatable sequences in order to be useful.  When we don't want
repeatable sequences we want to be using urand() or getrandom().

Thanks for looking at it!

                Bear

Updated code:

uint64_t A[256];
uint32_t N;

uint64_t pseudorandom(){
    N--;
    A[N%256] = A[(N+5)%256] * A[(N+17)%256];           // <- shorter period
    A[(N+17)%256] = A[(N+41)%256] * A[(N+72)%256];     // <- generators
are now
    A[(N+72)%256] = A[(N+120)%256] * A[(N+161)%256];   // <- multiplicative
    A[(N+161)%256] = A[(N+163)%256] + A[(N+254)%256];  // <- corrected
formula
    A[(N+255)%256] = ~A[N%256]; // <- fixes autocorrelation of first
generator.
    return(A[N%256]+A[(A[(N+17)%256]+A[(N+72)%256])%256]);
}

void init(char *initstr){
    int count; int idx; int len = strlen(initstr); int shift;
    for (count = 0; count < len || count < 256; count++)
        A[(count + initstr[count % len])%256] += count +
(initstr[count%len] << (1 + count % 16));
    // guarantee invariant: at least one odd number in each generator state.
    N=0; A[5]|=1; A[50]|=1; A[150]|=1; A[250]|=1;
    for (count = 0; count < 1024; count++){
        idx = pseudorandom()%256;
        A[idx] += pseudorandom()<<16;
    }
}

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.metzdowd.com/pipermail/cryptography/attachments/20210801/08910f21/attachment.htm>