[Cryptography] I posted a memory-hard key stretching algorithm on github

Fri Jan 3 15:40:31 EST 2014

On 30 Dec 2013 15:05 Bill Cox wrote:

> It's at:
> 
> https://github.com/waywardgeek/keystretch
> 
> If this algorithm isn't too lame, I'll enter it in the password hashing
> competition in January.  There isn't much time for feedback or code
> development, so if you're interested in these algorithms, please let me
> know your thoughts on this one.  Essentially, I've upped the pre-hashing of
> the password to 4096 SHA-256 rounds, and replaced the memory hashing
> function of scrypt, Salsa20/8, with a simple hack that seems to run 8X
> faster while being unpredictable enough.
> 
> The only other entry I've read about so far is based on Blake2, which is a
> nice improvement over Salsa20, I think, but like scrypt, it spends most of
> it's time hashing rather than filling the memory bandwidth.  I'm not sure a
> cryptographically strong hash is called for, so I'm suggesting using a
> simpler hash that seems to work well enough.  Any thoughts welcome.

Sounds promising. A couple of suggestions:

1. Consider replacing your SHA-256 rounds with the SIMD-hash from the SHA-3 competition. SIMD made it to round 2, but was eliminated from the finals because it was the most expensive to implement in hardware. Small hardware area was a positive criteria in the SHA-3 competition, but it should be considered a negative for KDFs. Why not benefit from all the evaluation work done for SHA-3 candidates? See, e.g. "Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs" Ekawat Homsirikamol, Marcin Rogawski, Kris Gaj. http://eprint.iacr.org/2010/445.pdf 

‎In particular Table 4.15 which compares the 14 Round Two candidates throughput-to-area ratio with SHA-512  shows SIMD being the lowest of the 14 hashes, which is the best outcome when we are designing a KBF.  The gain in software vs hardware difficulty is around a factor of 11 (3.5 bits), not insignificant for a simple software change.  I do not know how the 64-bit SIMD calculations play out in GPGPUs, but there should be some gain there as well. The ECHO hash was a close second. SIMD-256 is a little faster than SHA-256 in software, so your 4096 round count could b e increased if desired (perhaps the count should be a parameter). I also suggest going to a 512 bit hash since it requires more area. Why not? Finally, you might start with one round of SHA-2 or SHA-3, just to be able to say that any pre-image attack is blocked by a generally accepted standard hash.  

(I'd been thinking of offering PBKDF2-SIMD-512 in the KDF competition.)

2. I also thought of an intermediate approach to the question of scrypt revealing password information via memory side channels. It’s not an obvious big win, but it might be useful in some situations: 

Instead of using P, the password, in step 1 of scrypt, use:

   Q = Hash (P, Salt) modulo m

(If m=2^k, this just means taking the k low order bits of the hash.)

The parameter m limits how much information about P is recovered if some of the scrypt memory is compromised. If m is low, and Hash is fast, an attacker can devote one processor to each possible value of Q, having each processor calculate Q for every trial password and only proceeding if the Q is its assigned value. Each processor will only have to fill memory once. If Hash is resource consuming, e.g. PBKDF2-4096 as you propose, then the processors will have to expend m-times the work of the defender to screen trial passwords. The attacker can get around this by screening trial passwords centrally and only passing the ones with a given Q to the processor handling that Q, but this entails a lot more interprocessor communication. As m gets larger, each processor will have to handle more than one Q, though they can do so one at a time. If m is big enough, the attacker is forced to do a lot of memory exercise. So there is a tradeoff here between password information inserted in step 1 and resulting difficulty for the brute-force attacker. 

3. I think it is helpful to distinguish two use cases for KDFs: protecting login credentials and protecting cryptographic keys. In the case of a cryptographic key, the attacker who can glean KDF memory information most likely has easier access to some ciphertext generated with the key. Examples include whole disk encryption of a seized laptop or attacking WPA2 wireless encryption. The ciphertext itself is usually sufficient information to mount an off-line attack. So the gain from getting hold of the KDF memory, particularly after the algorithm is complete or half complete, is low.  On the other hand, login credentials can be further protected by limiting the number or rate of failed attempts. So any leaked information that allows an off-line attack is a big improvement for an attacker. For login credentials, it may be worth throttling the amount of information an attacker get from a memory compromise as described above, even if it means less KDF gain. Of course with common weak passwords, even 20 bits of leakage can be bad, but I see no reason to put the full entropy of a strong  login password into step 1. For cryptographic keys, any reduction in KDF gain may not be not worth it.

4. The SIMD-hash needs SIMD CPU instructions for full throughput, but these are available on modern x86 and ARM architectures. In the case of protecting cryptographic keys, there is less need for interoperability. If I am protecting my laptop's hard drive, I may as well use all the silicon in my CPU and not worry if the same algorithm will work efficiently on another architecture. We needn't design KDFs to the lowest common denominator.

Arnold Reinhold

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.metzdowd.com/pipermail/cryptography/attachments/20140103/42dadc49/attachment.html>