[Cryptography] Are Momentum and Cuckoo Cycle PoW algorithms broken?

Bill Cox waywardgeek at gmail.com
Sat Jul 11 13:45:46 EDT 2015


On Fri, Jul 10, 2015 at 6:02 AM, Ryan Carboni <ryacko at gmail.com> wrote:

>   First, the SHA-512 hash fails to take
>> advantage of the new SHA-256 acceleration instructions.
>>
>>
> Are you going to be mining using Xeon processors?
>

Possibly.  However, I found a way to generate vastly more hash data, to
fill up the memory bandwidth of the CPU:

What I do now is generate a chain of cryptographic hashes (Blake2s), but
fill only 1 in 32 356-bit memory locations.  For the rest, I run a
non-cryptographic hash function to fill the spaces in between, seeded by
the prior Blake2s hash.  The filling function can be anything, it doesn't
even need to mix well.  A good choice would be the PWX-transform from the
Yescrypt PHC entry, with 2 rounds, as it's very fast and also provides
decent GPU defense, similar to Bcrypt.

With this scheme, I can easily fill the memory bandwidth while generating
hashes.  My own system at home will require 2 threads to fill the 25 GiB/s
pipe, and there is efficiency loss.  Anything over 20 GiB/s total would be
very good.  However, it is not clear that I will succeed at the collision
detection portion at this speed, though I am already doing it much faster
than the Momentum BitShare miner code that I downloaded.

Verification of hash collisions wont take much longer, but it will take
more memory.  Since only 1 in 32 hashes are cryptographically derived
directly from the block digest, you have to verify the entire path to the
hash from the cryptographically strong hash that was bound to the digest.
This means computing 1KiB of hash data for each of the two hashes
involved.  However, this step is faster than generating a single SHA512, so
it's not really a problem.

The reason filling bandwidth is important is that data transfer speed seems
to be the only real limiting factor an ASIC will face in computing Momentum
hashes.  It is not technically memory-hard because I can generate all the
hashes in parallel, and find collisions with a partial-sorting parallel
router.  True memory-hard algorithms all generate values sequentially in a
chain defeating parallel generation of the data.  However, the routing
required to find Momentum collisions seems to be expensive, and this is
what will keep an ASIC miner from completely pulling away in efficiency vs
CPUs.

Any Momentum implementation that uses a small fraction of the CPU memory
bandwidth will find an ASIC implementation gets that additional multiplier
pretty much for free.  ASICs also sometimes achieve memory bandwidths that
are very high, some even over 300GiB/s, more than 10X the bandwidth of my
CPU.  However, these are expensive power-hungry beasts which are very
difficult and expensive to design.  For comparison, a Sony Playstation,
IIRC, has 120GiB/s, with 8 GDDR 5 banks.  A system like that wont be cheap
or low-power.

So, in summary: It is critical to hash Momentum data fast!

Bill
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.metzdowd.com/pipermail/cryptography/attachments/20150711/0f83508b/attachment.html>


More information about the cryptography mailing list