Rijndael in Assembler for x86?

Mikael Johansson mikael.johansson at wineasy.se
Mon Sep 17 17:43:55 EDT 2001


Peter Trei wrote:
> > From:
> > iang at abraham.cs.berkeley.edu[SMTP:iang at abraham.cs.berkeley.edu]
> >
> > In article <87d74urezs.fsf at snark.piermont.com>,
> > Perry E. Metzger <perry at piermont.com> wrote:
> > >
> > >Helger Lipmaa <helger at tcs.hut.fi> writes:
> >
> > >> Why just not to use a C code?
> > >
> > >Because it is typically slower by many times than hand tuned assembler.

> I'll chime in with Perry here - The newer processors are insanely complex
> beasties, with multiple execution units allowing some internal parallelism,
> subject to register contention and under very complex rules. Anyone who
> thinks they can do better optimizing within a small window is naive, or
> much, much better than the average run of programmer.

Though, when not targeting desktop-size processors, or when targeting less
standard processors, there is still quite a lot to be won by hand-optimizing...

I've spent three periods at Ericsson Mobile Communications, working part-time
alongside my studies, optimizing RSA under various circumstances.
The first time I went in, I worked on their implementation of WTLS --
specifically the handshake part of the protocol -- and got a code snippet
with the kernel add-and-multiply loop that executed in 25 minutes (sic!) for
a standard handshake situation. Hand-optimizing the C code yielded --
together with a minor algorithm change -- an execution time of less than 1.5
seconds.

This is an extreme example, but still...
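
For readers who haven't met the term: the "kernel add-and-multiply loop" in
multi-precision arithmetic is usually something along the lines below. This is
only a minimal sketch, not the code discussed above. The name mul_add_limbs,
the 32-bit limb size and the least-significant-first limb order are all
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: acc[0..n-1] += a[0..n-1] * b.
 * Limbs are 32 bits wide, least significant limb first (assumed layout).
 * Returns the carry out of the top limb. */
uint32_t mul_add_limbs(uint32_t *acc, const uint32_t *a, size_t n, uint32_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 32x32-bit product plus two 32-bit terms always fits in 64 bits. */
        uint64_t t = (uint64_t)a[i] * b + acc[i] + carry;
        acc[i] = (uint32_t)t;   /* low half stays in this limb */
        carry = t >> 32;        /* high half moves to the next limb */
    }
    return (uint32_t)carry;
}
```

A loop like this runs once per limb pair in every modular multiplication, so on
a small processor without a fast wide multiply its cost tends to dominate the
whole RSA operation.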

> Back when I was doing proof-of-principle for the DES crack, I spent a *lot*
> of time optimizing DES code for the Pentium. While handoptimizing for
> that processor more than doubled the speed, the really big gains all
> came from a higher level understanding of the problem; in particular my
> insight on speeding up key schedule generation about 80x, and the
> perversion of the Pentium II MMX registers to run 'bitslice' (no, I didn't
> do that) algorithms, testing 64 keys in parallel.
>
> The optimizing compilers have generally exceeded human ability in
> low-level optimizing - not that that won't stop me from trying, now and
> then.
>
> BTW, the code used for the DES crackers bears about as much
> resemblance to regular DES code as a top-fuel dragster does to
> a Toyota Corolla - it's tweaked to a fare-thee-well for one function,
> and totally useless for all others.
>
> Peter Trei

// Mikael Johansson




---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com



