Implementation strategies

This page has suggestions for how to implement the best Groth16 SNARK prover (described here) to take home up to $75,000 in prizes.

Splitting computation between the CPU and GPU

The Groth16 prover consists of 4 G_1 multiexponentiations, 1 G_2 multiexponentiation, and 7 FFTs, as described here.

One of the G_1 multiexponentiations cannot be computed until all of the FFTs are complete. The other 3 G_1 multiexponentiations and the G_2 multiexponentiation, however, have no dependencies on each other or on any other part of the computation.

So, some of them can be computed on the CPU while others are simultaneously computed on the GPU. For example, you could perform the FFTs first on the CPU while simultaneously performing 2 of the G_1 multi-exponentiations on the GPU. After those complete, you could then compute the remaining 2 G_1 multi-exponentiations on the CPU and the G_2 multi-exponentiation on the GPU.
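As a sketch of what that overlap might look like (the routines below are hypothetical placeholders, not functions from libsnark or any real library), the host can enqueue GPU work on a stream while running CPU work on a separate thread:

```cuda
#include <thread>
#include <cuda_runtime.h>

// Hypothetical placeholders -- a real prover would do the actual work here.
void cpu_fft_all() { /* ... the 7 FFTs ... */ }
void cpu_g1_multiexp() { /* ... one G1 multi-exponentiation ... */ }
void gpu_g1_multiexp_async(cudaStream_t s) { /* ... enqueue G1 multiexp kernels on s ... */ }
void gpu_g2_multiexp_async(cudaStream_t s) { /* ... enqueue G2 multiexp kernels on s ... */ }

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Phase 1: 2 independent G1 multiexps on the GPU, the FFTs on the CPU.
  gpu_g1_multiexp_async(stream);
  gpu_g1_multiexp_async(stream);
  std::thread ffts(cpu_fft_all);
  ffts.join();                   // FFTs done: the dependent G1 multiexp is unblocked
  cudaStreamSynchronize(stream); // GPU multiexps done

  // Phase 2: the remaining 2 G1 multiexps on the CPU, the G2 multiexp on the GPU.
  gpu_g2_multiexp_async(stream);
  cpu_g1_multiexp();
  cpu_g1_multiexp();
  cudaStreamSynchronize(stream);

  cudaStreamDestroy(stream);
  return 0;
}
```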

Parallelism

Both the FFTs and the multiexponentiations are massively parallelizable. The multiexponentiation in particular is an instance of a reduction: combining the elements of an array using a binary operation.
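Concretely, writing the group operation additively, a multi-exponentiation over points P_i and scalars x_i computes

```latex
\sum_{i=1}^{n} [x_i] P_i \;=\; [x_1]P_1 + [x_2]P_2 + \cdots + [x_n]P_n
```

which is a reduction of the terms [x_i]P_i under point addition. Since point addition is associative, the terms can be combined in any order, which is what makes the work easy to spread across many cores.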

Parallelism on the CPU

libsnark's "sub-libraries" libff and libfqfft implement parallelized multiexponentiation (code here) and FFT (code here), respectively.

Parallelism on the GPU

Check out this CUDA code, which implements a parallel reduction to sum an array of 32-bit ints.
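For reference, a minimal version of that classic pattern looks like the following: each block tree-reduces its chunk in shared memory and writes one partial sum, and the partial sums are then reduced by a second launch (or on the host). For a multi-exponentiation you would replace int and + with curve points and point addition.

```cuda
#include <cuda_runtime.h>

// One block sums blockDim.x elements by tree reduction in shared memory,
// then writes its partial sum to out[blockIdx.x]. Launch with dynamic
// shared memory of blockDim.x * sizeof(int); assumes blockDim.x is a
// power of two. Reduce the partial sums with a second launch or on the host.
__global__ void reduce_sum(const int *in, int *out, unsigned n) {
  extern __shared__ int sdata[];
  unsigned tid = threadIdx.x;
  unsigned i = blockIdx.x * blockDim.x + tid;
  sdata[tid] = (i < n) ? in[i] : 0;  // 0 is the identity for +
  __syncthreads();

  // Halve the number of active threads at each step.
  for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) sdata[tid] += sdata[tid + s];
    __syncthreads();
  }
  if (tid == 0) out[blockIdx.x] = sdata[0];
}
```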

Field arithmetic

There is an excellent CUDA implementation of modular multiplication using Montgomery representation here. Using that library to implement field-extension multiplication and curve addition, and then building a parallel reduction for curve addition, is likely an excellent path to a winning implementation of multi-exponentiation.
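To illustrate the field-extension step, here is a sketch of quadratic-extension multiplication built from three base-field multiplications via Karatsuba. The names are made up, and the base field is a toy word-sized prime so the snippet is self-contained; a real prover would use multi-limb Montgomery routines from a library like the one linked above.

```cuda
#include <cstdint>

// Toy base field: p = 2^31 - 1, a Mersenne prime, so everything fits in
// 64-bit intermediates. The real MNT base fields are ~768 bits wide.
typedef uint32_t fq;
constexpr uint32_t P = 2147483647u;

__host__ __device__ fq fq_add(fq a, fq b) { return (fq)(((uint64_t)a + b) % P); }
__host__ __device__ fq fq_sub(fq a, fq b) { return (fq)(((uint64_t)a + P - b) % P); }
__host__ __device__ fq fq_mul(fq a, fq b) { return (fq)(((uint64_t)a * b) % P); }

// F_{p^2} = F_p[u] / (u^2 - beta). Since p = 3 (mod 4), beta = -1 is a
// quadratic non-residue, so multiplying by beta is just negation here.
struct fq2 { fq c0, c1; };  // represents c0 + c1 * u
__host__ __device__ fq fq_mul_by_nonresidue(fq a) { return fq_sub(0, a); }

// Karatsuba: 3 base-field multiplications instead of the schoolbook 4.
__host__ __device__ fq2 fq2_mul(fq2 a, fq2 b) {
  fq v0 = fq_mul(a.c0, b.c0);
  fq v1 = fq_mul(a.c1, b.c1);
  fq2 r;
  r.c0 = fq_add(v0, fq_mul_by_nonresidue(v1));           // a0*b0 + beta*a1*b1
  fq t = fq_mul(fq_add(a.c0, a.c1), fq_add(b.c0, b.c1));
  r.c1 = fq_sub(fq_sub(t, v0), v1);                      // a0*b1 + a1*b0
  return r;
}
```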

Optimizing curve arithmetic

Representation of points

There are many ways of representing curve points which yield efficiency improvements. Probably the best is Jacobian coordinates, which allow doubling a point with 4 field multiplications and adding two points with 12 field multiplications.
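For reference, a Jacobian point (X, Y, Z) with Z nonzero corresponds to the affine point

```latex
(X, Y, Z) \;\longmapsto\; \left(\frac{X}{Z^2},\; \frac{Y}{Z^3}\right)
```

so additions and doublings can be performed without any field inversions; a single inversion at the end converts the result back to affine coordinates.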

If some of the points are statically known, as in the case of an exponentiation, they can be represented in affine coordinates and one can take advantage of "mixed addition". Mixed addition allows you to add a point in affine coordinates to a point in Jacobian coordinates, obtaining a point in Jacobian coordinates, at a cost of 8 multiplications.

There are likely many other optimizations.

Exponentiation algorithms

There are many techniques for speeding up exponentiation and multi-exponentiation, such as windowed methods, signed-digit (NAF) representations, and Pippenger's bucket algorithm for multi-exponentiation.
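As a baseline, the binary ("double-and-add", or "square-and-multiply") method computes a single exponentiation in one pass over the scalar's bits; the techniques above all improve on it. Here is a self-contained sketch over a toy word-sized modulus; the curve version replaces the modular squaring and multiplication with point doubling and addition.

```cuda
#include <cstdint>

// Binary exponentiation: one squaring per scalar bit, plus one
// multiplication per set bit. Operands stay below 2^32, so all
// intermediate products fit in 64 bits.
__host__ __device__ uint32_t pow_mod(uint32_t base, uint64_t exp, uint32_t m) {
  uint64_t acc = 1 % m;  // the group identity
  uint64_t b = base % m;
  while (exp != 0) {
    if (exp & 1) acc = acc * b % m;  // "add" step (multiply)
    b = b * b % m;                   // "double" step (square)
    exp >>= 1;
  }
  return (uint32_t)acc;
}
```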