# [Cryptography] Is this parallel DPL solver for elliptic curves known?

Jonathan Katz jkatz at cs.umd.edu
Wed Jul 8 12:53:22 EDT 2015

```On Tue, Jul 7, 2015 at 9:03 PM, Bill Cox <waywardgeek at gmail.com> wrote:
> Sorry if this is just a matter of my relative ignorance of crypto...
>
> The algorithms I've found with my poor google-fu solve DLP over elliptic
> curves in time much worse than sqrt(q)/p, for a group of order q, running on
> p processes.  The obvious parallel algorithm is to cut the search space into
> p linear sections, and have p processors run the algorithm on it's
> sub-interval.  The computations and memory for each process is then
> sqrt(q/p), for a total of sqrt(p)*sqrt(q), far worse than the serial case.
>
> If instead, we could do this:
>
> 1) first, compute sqrt(q) baby step values in parallel, with total memory
> sqrt(q) and time sqrt(q)/p.  Store these to cheap disk storage
> 2) next, assign each node a range of the result space, so the first node is
> assigned hash values from 1 to q/p, etc
> 3) In a linear pass over the disk data, sort it into p lists according to
> it's assigned node
> 4) In p steps, using a barrel-shifter shaped router, route hashes to their
> destination nodes.  First, transmit the node0 data, then node1, etc.  This
> takes time proportional to q/p, and a barrel-shifter shaped router of depth
> log(p) and width p.  It is efficiently implemented using FPGAs, such as
> those from Achronix.
> 5) Now repeat steps 1-4 using the giant steps, rather than baby steps.  At
> the end, each node must search for identical hashes in the data it has
> locally.
> 6) Sort the data using a fast radix sort to find the discrete log
>
> Latency of the routing network is low enough to be ignored.  The bandwidth
> of the hard disks involved is the limiting factor.  The algorithm can be
> sped up so that instead of writing to local disk and partially sorting the
> data, we simply write to local RAM and transmit buffers at nearly disk-write
> speed to their destination, by having the barrel-shifter router cycle
> rapidly.  So, the first 2 passes have runtime O(sqrt(q)/p) each.  Radix sort
> of the resulting data is fast, but there will still be several passes over
> the disk.  This is the dominant step of the algorithm.  For example, if
> sorting 16-byte values using radix 256, it will take 16 passes.
>
> If I did the math correctly, the overall runtime is a small factor,
> O((sqrt(b/r) + 2)*sqrt(q)/p), where b is the bits in the hash, and r is the
> radix of the sorting step.  The speedup and total memory reduction is a
> factor of O(p^(3/2)/(sqrt(b/r) + 2)).
>
> This can be used for DLP, meet-in-the-middle parallel attacks, and other
> applications.
>
> Thanks,
> Bill

Parallel algorithms for the discrete logarithm problem have been
looked at before. A good starting point is the paper "Parallel
collision search with cryptanalytic applications" available here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.4820

> _______________________________________________
> The cryptography mailing list
> cryptography at metzdowd.com
> http://www.metzdowd.com/mailman/listinfo/cryptography
```