[Cryptography] Are Momentum and Cuckoo Cycle PoW algorithms broken?

Wed Jul 8 14:33:18 EDT 2015

On Wed, Jul 8, 2015 at 10:11 AM, John Tromp <john.tromp at gmail.com> wrote:

> > Is it known whether Momentum and/or Cuckoo are memory-hard?  I don't
> think
> > either are.
>
> As author of Cuckoo Cycle, let me address your concerns.
>
> > Both claim to be "memory-hard".
>
> The Cuckoo Cycle project page at https://github.com/tromp/cuckoo in fact
> lists a
>
> "Linear Time-Memory Trade-Off Bounty
>
> $500 for an open source implementation that uses at most N/k bits
> while running up to 15 k times slower, for any k>=2.
>

This bounty is for a memory-time trade-off.  I have not analyzed Cuckoo
Cycle for TMTO resistance, and only lightly analyzed it for memory*time
memory-hardness.  I've focused so far mainly on Momentum's memory-hardness.

Both of these bounties require N ranging over {2^28,2^30,2^32} and
> #threads ranging over {1,2,4,8}, and further assume a high-end Intel
> Core i7 or Xeon and recent gcc compiler with regular flags as in my
> Makefile."
>
> The whitepaper at
> https://github.com/tromp/cuckoo/blob/master/cuckoo.pdf?raw=true
> devotes a Section to Time-Memory Trade-Offs (TMTOs) that further
> details the memory hardness claims.
>
> Which you seem to conveniently ignore in your own notion of
> memory-hardness :-(
>

Memory hardness is different than TMTO-resistance.  I use the definitions
used by the Catena authors:

"Percival introduced the notion of sequential memory-hardness (SMH), which
is satisfied by his introduced password scrambler scrypt. Bases on this
notion, an algorithm is sequential memory-hard, if an adversary has no
computational advantage in using multiple CPUs, i.e., using b cores
requires b times the effort used for one core."

Your bounty requires that I use less total memory, which will make it hard
to find cycles fast.  A proper bounty for sequential-memory-hardness would
allow me to use N memory, and any number of CPUs to achieve a target
speedup, preferably an unbounded speed-up as the number of CPUs grows.
This is achieved against Momentum with a parallel sorting machine.

> Instead, what you're talking about is covered in the later Section
> "Parallelization", which show the gains achievable by using multiple
> threads. The algorithm you outline is also detailed in the earlier Section
> "Cuckoo Cyle basic algorithm" and is in fact what motivates the its name.
>
> It's almost as if you've gone out of your way to avoid reading the
> whitepaper:-)

I read it, but I am focusing primarily on Momentum first, which is far
simpler to analyze.  I see you have a mistake in your analysis of
Momentum's memory-hardness:

"Since the extreme case of L = 2 is so special, there is likely to be a
greater variety of algorithms that are more efficient than for the general
case. While we haven’t found (and don’t know of) a improved main algorithm,
we did find an improved BFS(L/2) TMTO algorithm (implemented in
momentomatum.cpp) that cuts the memory usage in half, resulting in a
slowdown of only 1.75—a lack of memory-hardness."

There is a trivial TMTO against Scrypt where for a large memory reduction
factor N, the runtime is increased by only slightly more than N/4.
Attackers often make use of this.  However, since the memory*time cost is
only reduced by at most a constant factor, Scrypt is still considered
memory hard.  I am confident this is not the case for Momentum, but not
because of your TMTO attack.

I also doubt Cuckoo Cycle is memory hard, though this needs further
analysis.

Can we focus on Momentum first, and see if there is something special about
Cuckoo Cycle that avoids Momentum's flaws?

> > Cuckoo Cycle also claims to be memory-hard, but I don't buy it.  They're
> > looking for cycles of length L by building a forest of trees, and
> detecting
> > when loops form, and measuring the length.  If we do a parallel sort as
> > above, we can store entries in a Cuckoo hash table
> > <https://en.wikipedia.org/wiki/Cuckoo_hashing> rapidly in parallel.
>
> You don't need parallel sort for that; the parallel version of the
> basic algorithm already does it. But note that:
>
> 1) it uses 32 times more memory than the Cuckoo Cycle reference algorithm
> 2) the parallel memory accesses are going to form a latency bottleneck when
>     the number of threads runs into the hundreds.
>
> regards,
> -John
>

My parallel sort algorithm uses the same amount of memory, and I already
took into account the latency bottleneck when the number of parallel
processing nodes runs into the thousands.

Cuckoo Cycle is a very cool agorithm, as is Momentum, but let's focus on
security against parallel attacks using custom ASICs, since this is what
we're trying to prevent as a crypto-coin PoW.

A major flaw I have not yet found any good solution for is the slowness of
cryptographic hashing algorithms.  I checked the runtime of SHA256 using
Intel's latest hardware instructions, and they just aren't fast enough
(multiple cycles per byte rather than the other way around).  I think an
ASIC attacker will gain a substantial advantage with far faster,
lower-power implementations.  I have not seen a PoW system with fast
verification yet that avoids this problem.  Until then, for ASIC
resistance, I think it is safer to stick with algorithms like Scrypt, or
even better, Yescrypt, and possibly Lyra2 or Argon2d.

Bill
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.metzdowd.com/pipermail/cryptography/attachments/20150708/ea493452/attachment.html>