[Cryptography] How do I tell if a byte sequence repeats?

Sampo Syreeni decoy at iki.fi
Tue Aug 26 20:22:11 EDT 2025


On 2025-08-25, Pierre Abbat wrote:

> The state is the last several ciphertext bytes, together with the 
> accumulator, which, if the plaintext is all zeros except for one byte 
> at the beginning, is that byte, a constant.

So remember those when you sample. All of them. That is the state of 
your algorithm, and what needs to be compared in order to apply any 
cycle finding algorithm which is dependentent on equality comparisons.

But you know, you will do something with those preceding bytes of 
ciphertext, in order to arrive at the outcome that is the next byte 
output. Typically in CRNG's that will lead to some update of a finite 
state machine which is *not* the same as the whole prefix thus far 
produced by your generator. You can *always* include the preceding 
prefix in your state and compare over it, but here, you typically don't 
have to. You only need enough information to determine what you're 
outputting, over a step of your overall algorithm. All of the preceding 
state, the prefix, is usually propagated via an intermediate, 
well-organized representation of your algorithm's state, which you can 
compare over.

Not that you'd have to optimize like this. If your underlying algorithm 
already looks at its past in an unbounded fashion, it will probably be a 
bad one. What's called for in crypto work is low, linear cost in bytes 
consumed and produced, using bounded space in the process. That implies 
you can do state recognition the way I said. If you can't, your crypto 
primitive is broken to begin with, and needs to be changed.

> I keep the last 8 bytes in a UInt64 as the state, and use a Fibonacci 
> version of Brent's method.

You can use the last eight bytes as a sieve, a proxy, for the state, and 
you can use whatever exponential search strategy you want, binomial, 
Fibonacci, whatever. But the thing you need to remember is that whatever 
your algorithm put out, is not the same as its state. Whatever it put 
out, does not determine what it will put out in the future. At the same 
time, if you gather up all of the stuff your algorithm depends on to do 
its next step, that will always determine its future in full. That is 
its "state". The sum total of its internal variables, on which it takes 
decisions. Compare on *that*.

> With a four-byte key, the periods are usually the same and close to 
> 65536 in order of magnitude, which is what I'd expect if the state 
> transition is a random function, not a permutation: 26413, 45336, 
> 24703, 70546.

What I'm talking about is once removed from the sequence level of yours. 
If you think about different sequences generated in a probabilistic 
fashion, yes, you can have these things where different "key lengths" 
lead to different cycle lenghts. But in the deterministic, algorithmic 
framework I'm coming from, you either have a cycle, or you don't have 
one, there can only be one, and detecting one is generally equivalent to 
the halting problem, so any attempt at it will not be, cannot be, a 
recursive decision problem. It will have to be somewhere in the 
recursively enumerable class, so not guaranteed to terminate.

Getting arbitrarily strong stochastic bounds, especially given such a 
strong stochastic model as we have with cryptosystems and their 
associated random number generators, then random oracle model, and 
whatnot, is very different. But then I don't think you're really talking 
about this, since you only asked about your system falling into a cycle. 
That is not a statistical thing, but a deterministic, algorithmic one. 
It can be best quantified and analysed at the level of algorithmic 
state and its equivalence, instead of looking at the many distributions 
in input and output sequences your algorithms might produce.

> One run, though, resulted in one of the Daphnes falling into a cycle 
> of length 24905 and the other in a cycle of length 45718, which are 
> relatively prime. I plotted the heat map, and that's the one that 
> looks like tartan.

If we're going this way, divulge the code. All of it, including your 
test framework. It's very difficult to follow what you're going at, 
otherwise.
-- 
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-40-3648785, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


More information about the cryptography mailing list