Rijndael in Assembler for x86?

Trei, Peter ptrei at rsasecurity.com
Mon Sep 17 11:15:53 EDT 2001


> From:
> iang at abraham.cs.berkeley.edu[SMTP:iang at abraham.cs.berkeley.edu]
> 
> In article <87d74urezs.fsf at snark.piermont.com>,
> Perry E. Metzger <perry at piermont.com> wrote:
> >
> >Helger Lipmaa <helger at tcs.hut.fi> writes:
> 
> >> Why just not to use a C code?
> >
> >Because it is typically slower by many times than hand tuned assembler.
> 
> Are you sure?  For general code, that certainly hasn't been true in a
> long time; optimizing compilers nowadays can often do *better* then
> hand-coded assembler.  However, for encryption code in particular,
> I can imagine the C primitives (which usually lack rotate, etc.
> instructions) may be suboptimal.
> 
> That being said, back when I wrote the 40-bit RC5 breaker for the RSA
> challenge, I thought the same thing.  I figured I would first write a C
> version, and then tune the resulting assembler.  When I looked at what
> gcc had output, it had already done all the tricks I had in mind.
>    - Ian
> 
> [Moderator's note: The best DES implementations for i386s in assembler
> are several times faster than the best in C. I'm not sure about AES
> but I'd prefer to try and see. Perhaps it's a feature of DES's odd bit
> manipulation patterns, perhaps not. I have yet to see GCC produce code
> for almost anything that was just as fast as hand tuned assembler,
> though. --Perry]
> 
I'll chime in with Perry here - The newer processors are insanely complex
beasties, with multiple execution units allowing some internal parallelism,
subject to register contention and under very complex rules. Anyone who
thinks they can do better optimizing within a small window is naive, or
much, much better than the average run of programmer.

Back when I was doing proof-of-principle for the DES crack, I spent a *lot*
of time optimizing DES code for the Pentium. While handoptimizing for
that processor more than doubled the speed, the really big gains all 
came from a higher level understanding of the problem; in particular my 
insight on speeding up key schedule generation about 80x, and the 
perversion of the Pentium II MMX registers to run 'bitslice' (no, I didn't
do
that) algorithms, testing 64 keys in parallel. 

The optimizing compilers have generally exceeded human ability in 
low-level optimizing - not that that won't stop me from trying, now and 
then.

BTW, the code used for the DES crackers bears about as much
resemblence to regular DES code as a top-fuel dragster does to
a Toyota Corolla - its tweaked to a fare-thee-well for one function,
and totally useless for all others. 

Peter Trei







---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo at wasabisystems.com




More information about the cryptography mailing list