[Cryptography] sha1sum speed

Henry Baker hbaker1 at pipeline.com
Sun May 1 20:33:02 EDT 2016


At 01:01 PM 5/1/2016, Bill Cox wrote:
>On Sun, May 1, 2016 at 8:00 AM, Henry Baker <hbaker1 at pipeline.com> wrote:
>>sha1sum took 24 seconds.
>>sha3sum (default algorithm) took 54 seconds.
>>sha256sum took 54 seconds.
>>b2sum-i686-linux took 35.7 seconds.
>>b2sum-amd64-linux took 27.3 seconds.
>
>This shows a major problem we face in Linux distros: we like everyone to run the same binary, so everyone is forced to use the oldest supported CPU instruction set.
>
>The program sha1sum is from the coreutils package, which AFAICT contains zero vector-optimization of any kind. Here's what I get with my version of b2sum, compiled with AVX2 support, vs sha1sum shipping with Ubuntu.  randfile is a 300MiB file, already cached:
>
>$ time sha1sum randfile
>30b42c8894b108d65db90090c98c0a9c8cd63cb9 randfile
>
>real    0m0.845s
>user    0m0.784s
>sys     0m0.056s
>
>$ time b2sum randfile
>e2cb7410dcbe11930909f144da7c2121f22100d7825614d640fa63e14a2da01265da779030a250e718ed30250221157992567d7cee4c4b4a28f77bcbbe4df514 randfile
>
>real    0m0.432s
>user    0m0.396s
>sys     0m0.036s
>
>BLAKE2 is almost twice as fast, and the parallel version is faster still (for large inputs, not for inputs under 1 KiB):
>
>$ time b2sum -a blake2bp randfile
>4d33a9488a3a197a7179350b7c000296c231129679bc11ab024b11fda1f583cb957980e4c8e8cd6fb751ad3406842e54e7246675118d857342dbc8a60e4a84f2 randfile
>
>real    0m0.324s
>user    0m0.996s
>sys     0m0.068s
>
>Not only that, but Samuel Neves (who wrote the optimized BLAKE2 code) has an optimized version of BLAKE2bp using more of the available parallelism per core to get around 1 byte/cycle throughput.

Thanks, Bill.  I compiled b2sum on my (old) Ubuntu system and am now getting 15 secs (compare with the numbers above), which is almost 2X faster than the 27.3 seconds that I got with the single-threaded x86-64 code.

Note that '-a blake2bp' uses *all 3 cores.*
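
For anyone who wants a rough cross-check of the relative speeds without building anything, here is a minimal sketch in Python. It is not the method used for the timings above: it hashes an in-memory random buffer with the standard hashlib module, whose implementations differ from the coreutils and b2sum binaries, so the absolute numbers will not match. The function names, the 16 MiB buffer and the 1 MiB chunk size are my own assumptions, and hashlib has no blake2bp, so only single-threaded BLAKE2b is covered.

```python
import hashlib
import os
import time


def hash_buffer(name, data, chunk_size=1 << 20):
    """Hash `data` with the hashlib algorithm `name`.

    Returns (hexdigest, elapsed_seconds). The data is fed in
    chunks of `chunk_size` bytes, the way a file would be read.
    """
    h = hashlib.new(name)
    view = memoryview(data)  # slice without copying
    start = time.perf_counter()
    for offset in range(0, len(view), chunk_size):
        h.update(view[offset:offset + chunk_size])
    digest = h.hexdigest()
    elapsed = time.perf_counter() - start
    return digest, elapsed


def throughput_mib_s(nbytes, seconds):
    """Convert a byte count and a wall-clock time to MiB per second."""
    return nbytes / (1 << 20) / seconds


if __name__ == "__main__":
    # Assumed size: 16 MiB of random bytes, kept small so the run is quick.
    data = os.urandom(16 << 20)
    for name in ("sha1", "sha256", "sha3_256", "blake2b"):
        _, secs = hash_buffer(name, data)
        rate = throughput_mib_s(len(data), secs)
        print(f"{name:10s} {secs:7.3f} s  {rate:8.1f} MiB/s")
```

The same throughput_mib_s helper can be applied to the figures quoted above, e.g. 300 MiB divided by each "real" time, to put the three runs on a common MiB/s scale.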


