[Cryptography] Calculating Algebraic Complexity or Algebraic Equation of the S-box

Sun May 31 15:06:06 EDT 2015

On 05/30/2015 09:52 AM, Manish Malik wrote:
> Hi ,
> I am an undergraduate student pursuing Computer engineering and I am doing
> some research in S-box. While evaluating the properties of an AES or some
> general S-box(nxn), Algebraic Complexity is considered to be one of the
> major factor for a good S-box design. But I am facing difficulty in
> calculating it. I have spent a quite a lot time in searching on
> understanding how can I evaluate it but I don't found any useful algorithm
> or codes for evaluating the complexity of an S-box. I have even asked this
> specific question with some great details at
> http://crypto.stackexchange.com/questions/22944/evaluating-algebraic-complexity-of-a-s-box
> but haven't received any solution yet .
> 
> So, if anyone can help me with this then please reply.

It is kind of arcane.  I'd be interested in any commentary
more knowledgeable than mine on this as well.  This list
does not talk as much as I'd like about actual cipher
design, since the consensus is that the vast majority,
even of professionals, should not attempt it.  So what I
know about cipher design I've learned mainly by studying
(and trying to break) existing ciphers.

Read this:
https://www.schneier.com/crypto-gram/archive/1998/1015.html#cipherdesign

to get a good idea why cipher design is generally considered
to be something that most people should not attempt.

So the vast majority of us don't do it, and it doesn't
get discussed much. What I'm about to say is derived from
general knowledge and obsessive examination of existing
ciphers, not from any very deep theoretical knowledge of
their design methodology.  If anyone knows more, *please*
chime in.

That said, my impression, from studying existing ciphers,
is that good S-Boxes implement permutations which have
maximal period; ie, that 4-bit S-boxes from good ciphers
always seem to create a cycle 16 states long if iterated
on their own output, and that each bit in such an iteration
has a *different* 16-state cycle which is not an offset or
inversion of any other bit.  And that in order to describe
the operation that an S-box does, you have to use operations
from (at least) two different "group" relationships to
describe the relationship between the set of inputs to
the set of outputs. For example, combining addition-with-
carry, bitmasking via XOR, and bit shifting.

Avoiding the accidental creation of S-boxes whose behavior
can be explained simply in operations of a single group
relationship is a crucial feature of all good ciphers.
Especially when people are free to define new group
relationships.  Occasionally a new group relationship
defined by someone will be amenable to a new kind of
analysis.

And this property remains true regardless of any S-box
modification due to the key used. This is why the influence
of the key is generally limited to bit-masking.  Any round
or S-box construction in which a possible keyschedule
breaks that relationship "doesn't count" in the number
of cipher rounds required by the properties I'll describe
now.

S-boxes are never iterated on their own output.  The P-boxes
partition the output of all S-boxes into subsets (ideally
one bit subsets) which are fed to S-boxes (ideally to all
S-boxes) in the next round of the cipher.

Ciphers vary as to exactly what the P-boxes do with the
S-box output; from examination of good ciphers, there seems
to be a pattern that wherever S-boxes are repeated from one
round to the next, the bit assignment usually (but not always)
changes: for example, bit zero output of an S-box is rarely
directed to bit zero of the input of an identical S-box in
the next round.

All good ciphers of the Feistel construction have enough
rounds that what I call the "cascade depth" (that is,
sufficient rounds to assure that every bit of output
depends on every bit of input *and* every bit of key) is
repeated at least four times.  IE, if it takes four
rounds to achieve complete cascade, the cipher will have
at least 16 rounds.  This much I do understand the reasoning
for.  It has been proven that achieving cascade depth at
least twice suffices to transform *any* input into *any*
output.  And achieving *that* level of transformation twice
(meaning cascade depth is achieved a total of four times)
is required for a defense against meet-in-the-middle attacks.

Non-Feistel constructions (ie, constructions with reversible
S-boxes) can at least in theory have only sufficient rounds
to achieve complete cascade twice.  But I've never seen it
done, because reversible S-boxes always form a group
relationship.  Security in non-Feistel ciphers depends on
the P-boxes to ensure that the group relationship formed by
composition is defined only for inputs and outputs the size
of the cipher block *plus* the key.  And also for the
reasons I'll explain below.

Unless after the minimum number of rounds theoretically
required every bit of output depends *EQUALLY* on every
bit of input, the cipher must have more rounds.

Likewise unless after that number of rounds every bit of
output depends *EQUALLY* on every bit of the key, the cipher
must have more rounds.

Now here's the really arcane part, which I know is a
requirement, and observe in good ciphers, but have no idea
how to do:  in a Feistel cipher, the number of rounds must
be selected such that the influence of any particular
*algebraic relationship* between bits of input on any
particular *algebraic relationship* between bits of output
differs by less than one part in 2^n, where n is the block
size of the cipher.

Likewise in a Feistel cipher the number of rounds must be
selected so that the influence of any particular algebraic
relationship between bits of key on any particular algebraic
relationship between output bits is less than one part in
2^m where m is the key length of the cipher.

And in a non-Feistel cipher the number of rounds must be
selected so that the influence of any particular algebraic
relationship between any bits of key AND input on any
algebraic relationship between bits of output, is less
than one part in 2^q where q = length of key PLUS length
of block.

What that means is that under various groups for which
particular analysis methods are known the relationship
between input, key, and output is unpatterned.

But those last two properties are where even pros sometimes
fail.  It isn't really possible to check every possible
algebraic relationship over a large block and key size;
as far as I can tell it's as bad as trying to check every
key by brute-force. Cipher "breaks" are accomplished by
finding algebraic relationships on which these properties
fail.  Sometimes this can be done by studying the
construction of the cipher. I can sometimes do this to
ciphers designed by amateurs, but to date I've never
cracked a cipher designed by a pro.

Ensuring that finding such algebraic relationships of
particular kinds can't be done by particular methods of
analysis is a matter of advanced group theory: The best
ciphers achieve defense against all known particular
methods of analysis against all considered particular
types of algebraic relationships, at more or less the
same round.

But people are always coming up with algebraic relationships
of types or on groups that the cipher designer didn't
consider, and occasionally developing new methods of analysis.
Whether a cipher stands against these is mostly a question
of whether the designer got lucky or whether the cipher
designer included enough rounds "redundant or unnecessary"
by known relationships and attacks that the new thing even
though more effective on the cipher than any of the old things
is not more effective than brute force.

But these "redundant or unnecessary" rounds included for
insurance against new attacks, are balanced against a
requirement that ciphers must be fast to execute, and how
many "extra" rounds are needed is always a judgement call
by the cipher designer.  Only with a mathematical proof
that any analysis could be used to do [Hard Problem] in
[faster than Hard Problem is known to take] could a
cipher designer include *NO* extra rounds. For example,
solving RSA is at least as hard as factoring, because any
solution to RSA, no matter how achieved, could be used
to factor the modulus immediately.

Hope that helps:  And also hope anybody who knows more,
or can correct any possible misinformation I've spouted,
will chime in.  Anybody?

				Bear

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://www.metzdowd.com/pipermail/cryptography/attachments/20150531/b4db548c/attachment.sig>