[Cryptography] new PRNG family

Tue Dec 9 15:59:36 EST 2014

Update on the breeze PRNG

tldr;
I made up my mind and decided to test long series of output of the smallest breeze variant (breeze128) using bloom filters for sequential repeatitions. 
Result: No output doublets.

I would be happy to get some feedback, if such proves can help to increase the acceptance of this approach to PRNG.

Background:

A logistic map in chaotic state is by definition not predictable - again by definition entropy is a result of the chaotic state itself. Nonetheless a degeneration of this entropy due to rounding to float double might happen, but - reasonably from definition - ANY possible output 0..1 in double float representation should happen. This is a fundamental difference to linear feedback shift registers and to linear congruential generators that preserve entropy from seed. In chaotic state the entropy derives from the chaotic state itself - this can be reproduced (deterministic) but not predicted.

Anyway, if the breeze generator (consisting of some logistic maps, interchange of log map state, variation of output generation by bitShift state) is degenerating, it should be possible to find a repetition of sequential output. If this happens, breeze would be unsuitable for cryptographic purpose because an attacker might predict future output (inner state) from the stream.

Experiment:

I made up my mind and decided to test long series of output of the smallest breeze variant (breeze128) for duplicates. 

breeze128 has 6 logistic maps that interchange their outcome at each calculation round. Initial seed-/keyspace of breeze128 is 128 bit (two uint64; each bit-splitted into three float double by calculating the inverse 1/uint64 from 21bit / 22bit / 21bit), output is 128 byte.

If a degeneration of the PRNG caused by information loss through rounding happens, period length might shorten and on long run a duplication of output sequence should be detectable.

Therefore i used a 2**36 bit Bloomfilter with seven indicator locations (hash indices) for each 128 Byte output (the Bloomfilter is 8.2GB RAM). I produced 10**11 (100 GB) output from breeze128 with random int64 seed and checked the resulting 781.250.000 outputs of length 128 Byte for duplicates. I repeated this 20 times (total: 2 TB)

Calculating the probability of bloomfilter false positive returns for the last GB (> 773.437.500  128Byte words):

E > (1 - (1- 1/2**36)**(7*781.250.000) )**7 
E ~ 1.53e-8

Results: 

run 1
at GB output    | No of positive (bloomfilter HAS(word)==true) in Bloomfilter
79              | 1
86              | 1
88              | 1
90              | 1
98              | 1

run 2 
71              | 1
88              | 1

run 3
78              | 1
91              | 1
98              | 1

run 4
89              | 1
96              | 1

run 5
94              | 1
98              | 2

run 6
84              | 1
91              | 1
93              | 1
94              | 1
96              | 1

run 7
58              | 1
94              | 1
96              | 1
100             | 1

run 8
59              | 1
73              | 1
94              | 1
99              | 1
100             | 2

run 9 
88              | 1

run 10 
96              | 1

run 11
-               | -

run 12
96              | 1
87              | 1
99              | 1
100             | 1

run 12
96              | 1

run 13
89              | 1
92              | 1

run 14
80              | 1
99              | 1

run 15
57              | 1
89              | 1
92              | 1

run 16
-               | -

run 17
95              | 1

run 18
-               | -

run 19
94              | 1
98              | 1

run 20 
-               | -

'Within run' no word was reported two times to be in the bloomfilter. 'Between runs' the words detected were different (no doublettes between runs).

Interpretation: 

There is no sequenze repetition (selective unique values were detected to be in the bloomfilter). 

Nonetheless occurrence is slightly higher than expected from the false positive ratio. False positive errors might be caused by:
1) bloom filters 'false positive rate = 1/1.53e-8'  might be underestimated 
    1.1. because the bloomfilter (as usual) is for the sake of speed populated by ONE hashfunction with addition instead of seven different hash functions, which is the basic assumption of the calculation
    1.2. sipHash is explicitly not collision safe
2) random itself (doublette probability in 128byte: 1/2**(1024))

--------------------

Second experiment

I did the same experiment on 256 Bytes (2 consecutive 128 Byte words) with the same bloom filter and again 20 seeds and 100 GB of output (total 2 TB). This makes half number of words (390.625.000) and half probability of false positives. On the other hand any occurring repeated sequence of state caused by degeneration of the PRNG, that occur within the range of 100 GB, should have shown up.  

The result is ZERO detection of doublettes.

Andreas