In these tutorials, we will guide you through implementing elliptic curves, the basic building blocks for a SNARK prover on GPU.
By the end you’ll be well positioned to start swapping components of the reference CPU prover for GPU components and make your first speedups.
This tutorial breaks this down into individual steps, guiding you from some GPU starter code to an on-GPU elliptic curve implementation. So, even if you haven’t encountered elliptic curves before it will be straightforward to get started with a solution.
Our starter code uses CUDA, and you can use as a base for the first stage of the tutorial. If you know CUDA or another GPU programming language well you should feel free to start on your own too.
We intend for each of these stages to take at most a few hours, given some background in GPU computing. Feel free to contact us (brad@o1labs.org) or jump on our discord channel if you have any questions!
By the end of the tutorial stages, you’ll understand:
For each tutorial, there is a reference CPU solution your GPU program will be checked against.
For each, there will be some inputs, an operation you need to perform on them, and some outputs.
Your program can interpret inputs and format outputs in either standard numeric form, or it can interpret inputs / outputs in “montgomery” form. However, your program must perform the operation using montgomery arithmetic, as it is far more efficient than standard arithmetic. (the starter code already uses montgomery arithmetic, checkout the Wikipedia article here to learn more about it).
Your submissions will be run and evaluated as follows.
./build.sh
. This step should produce a binary ./main
./main compute inputs outputs
The tutorial starter code implements Field Arithmetic, interpreting inputs and outputs in standard numeric form and performing operations using montgomery arithmetic on the GPU.
Check out this file in the reference code for the exact I/O format and to see where the operations are being performed.
Field arithmetic takes non-negative numbers, and adds the concept of a "modulus." When you do an operation in field arithmetic, you mod the result by the modulus to get your result.
For example, if the modulus is "17," then:
10 + 10 = 20 mod 17 = 3
7*4 = 28 mod 17 = 11
84 = 84 mod 17 = 16
Division also takes on a slightly different meaning. An element times its inverse is still 1 (½ x 2 = 1). However, the inverse is itself an element on the field instead of a fraction. For example, for 2, 2^(-1) x 2 = 1, but 2^(-1) = 9. You can verify that 9 x 2 = 18 = 18 mod 17 = 1. Every element besides zero in the field has another element as its inverse. (for 17, [1,1], [2,9], [3,6], etc)
For our problem, our modulus is very large (753 bits!). The starter code represents uses bignums to accommodate this.
We also require doing arithmetic in montgomery form. The starter code already does this, but if you’re curious to learn more checkout the Wikipedia article here.
In this tutorial you will implement "Quadratic Extension" Arithmetic.
Check out this file in the reference code for the exact I/O format and to see where the operations are being performed.
Instead of multiplying field elements, we’ll now be multiplying elements in a “quadratic extension field”. This is similar to complex numbers, but for fields. Instead of having just a single field as above, we’ll now have 2 fields.
To multiply two elements in a “quadratic extension field”, you do the following arithmetic, where “fq_mul” is field multiplication, “fq_add” is field addition (see starter code), and “fq” is the field representation of a numeral. To learn more, checkout in-depth documentation here.
To multiply two elements in a "quadratic extension field", you do the following arithmetic, where fq_mul
is field multiplication, fq_add
is field addition (see starter code), and fq
is the field representation of a numeral. To learn more, checkout in depth documentation here.
var fq2_mul = (a, b) => {
var a0_b0 = fq_mul(a.a0, b.a0);
var a1_b1 = fq_mul(a.a1, b.a1);
var a1_b0 = fq_mul(a.a1, b.a0);
var a0_b1 = fq_mul(a.a0, b.a1);
return {
a0: fq_add(a0_b0, fq_mul(a1_b1, alpha)),
a1: fq_add(a1_b0, a0_b1)
};
};
var alpha = fq(13);
var fq2_add = (a, b) => {
return {
a: fq_add(a.a0, b.a0),
b: fq_add(a.a0, b.a0)
};
};
In this tutorial you will implement “Cubic Extension” Arithmetic. After this, you’ll be ready to implement operations on elliptic curves!
Checkout this file in the reference code for the exact I/O format and to see where the operations are being performed.
Similar to previous problem, except now instead of 2 fields, there are 3 fields! In depth documentation here, see pseudo code as follows:
var alpha = fq(11);
var fq3_add = (a, b) => {
return {
a0: fq_add(a.a0, b.a0),
a1: fq_add(a.a1, b.a1),
a2: fq_add(a.a2, b.a2)
};
};
var fq3_mul = (a, b) => {
var a0_b0 = fq_mul(a.a0, b.a0);
var a0_b1 = fq_mul(a.a0, b.a1);
var a0_b2 = fq_mul(a.a0, b.a2);
var a1_b0 = fq_mul(a.a1, b.a0);
var a1_b1 = fq_mul(a.a1, b.a1);
var a1_b2 = fq_mul(a.a1, b.a2);
var a2_b0 = fq_mul(a.a2, b.a0);
var a2_b1 = fq_mul(a.a2, b.a1);
var a2_b2 = fq_mul(a.a2, b.a2);
return {
a0: fq_add(a0_b0, fq_mul(alpha, fq_add(a1_b2, a2_b1))),
a1: fq_add(a0_b1, fq_add(a1_b0, fq_mul(alpha, a2_b2))),
a2: fq_add(a0_b2, fq_add(a1_b1, a2_b0))
};
};
We’re ready to implement elliptic curves! After this, you’ll be ready to apply these primitives to a faster snark prover.
Check out this file in the reference code for the exact I/O format and to see where the operations are being performed.
A single snark proving / verifying system is a pair of elliptic curves. We’ll call these (G1, G2). Because we are optimizing the snark prover for 2 snark proving / verifying systems (MNT4 and MNT6), we have 4 total curves:
We’ll use the following shorthand for the elements from the previous tutorial stages:
Fq
Fq2
Fq3
Each curve is specified by a pair of two of these elements, specifically:
Operations are defined on elements of the same type, so if you’re adding two curves from MNT4 G1, you’ll use field operations. If you’re adding elements from MNT6 G2, you’ll use cubic extension operations.
Say our two curves we’re adding are p, and q. Where p.x and p.y refer to the first and second elements of the curve.
In the usual case, you do:
var curve_add = (p, q) => {
var s = (p.y - q.y) / (p.x - q.x);
var x = s*s - p.x - q.x;
return {
x: x,
y: s*(p.x - x) - p.y
};
};
However, if p == q then you do a different algorithm called “doubling”.
var curve_double = (p, q) => {
var s = (p.x*p.x+p.x*p.x+p.x*p.x + coeff_a) / (p.y + p.y);
var x = s*s - p.x - q.x;
return {
x: x,
y: s*(p.x - x) - p.y
};
};
Curves have some additional parameters unique to each curve. coeff_a
is one of these parameters. See here for where coeff_a
is defined for MNT4_G1 (which happens to be just "2" for this curve). See sibling files in the reference repo for the coeff_a
of the other 3 curves.
See an optimized add function here for addition in MNT4 G1.
Read more about elliptic curve operations on Wikipedia here.
After you’ve finished the tutorial, you’ll have an on-GPU implementation of curve operations and you’ll be ready to start swapping components of the reference CPU prover for GPU components.
The first thing to do is to swap out is the prover’s multi-exponentiations (see prover reference here).
The multi-exponentiations can be seen as a giant map-reduce, as explained here. Move the “map” off CPU to take advantage of the parallelism offered by a GPU!
After that, some more hints on what to try:
For more information on the prover architecture, we suggest looking at the following