Coda Home Prover Prover tutorials Problem 01: Field Arithmetic

Field arithmetic

Quick details

  • Problem: Use a GPU to multiply together arrays of elements of a prime-order field.
  • Prize:
    • Note: The prizes for this tutorial challenge have ended, but you can use the CUDA reference implementation here to easily get started on the next tutorial challenge.

Problem specification

Input

  • n : uint64
  • x0 : Array(𝔽MNT4753.q, n)

    The elements of x0 are represented using the Montgomery representation as described below.

  • x1 : Array(𝔽MNT4753.q, n)

    The elements of x1 are represented using the Montgomery representation as described below.

  • y0 : Array(𝔽MNT6753.q, n)

    The elements of y0 are represented using the Montgomery representation as described below.

  • y1 : Array(𝔽MNT6753.q, n)

    The elements of y1 are represented using the Montgomery representation as described below.

Output

Expected behavior

Your implementation should use one or both of the benchmark machine's GPUs to solve this problem. The machine's specifications can be found here.

The output out_x should have out_x[i] = x0[i] * x1[i]. where * is multiplication in the field 𝔽MNT4753.q.

The output out_y should have out_y[i] = y0[i] * y1[i] where * is multiplication in the field 𝔽MNT6753.q.

Submission guidelines

Your submission will be run and evaluated as follows.

  1. The submission runner will generate a random sequence of inputs, saved to a file PATH_TO_INPUTS.

  2. Your binary will be compiled with ./build.sh. This step should produce a binary ./main.

  3. Your binary will be invoked with

        ./main compute PATH_TO_INPUTS PATH_TO_OUTPUTS

    and its runtime will be recorded. The file PATH_TO_INPUTS will contain a sequence of inputs, each of which is of the form specified in the "Input" section.

    It should create a file called "outputs" at the path PATH_TO_OUTPUTS which contains a sequence of outputs, each of which is of the form specified in the "Output" section.

Reference implementation

The output of your submitted program will be checked against the reference implementation at this repo here. The "main" file is here. The core algorithm is implemented here.

Further discussion and background

The basic operations needed for the SNARK prover algorithm are multiplication and addition of integers.

Usually when programming we're used to working with 32-bit or 64-bit integers and addition and multiplication mod 2^{32} or 2^{64}) respectively.

For the SNARK prover though, the integers involved are a lot bigger. For our purposes, the integers are 753 bits and are represented using arrays of native integers. For example, we could represent them using an array of 12 64-bit integers (since 12 \cdot 64 = 768 > 753) or an array of 24 32-bit integers (since 24 \cdot 32 = 768 > 753). And instead of computing mod 2^{753}, we'll compute mod q where q is either MNT4753.q or MNT6753.q.

Each element of such an array is called a "limb". For example, we would say that we can represent a 2^{768} bit integer using 12 64-bit limbs.

We call the set of numbers 0, \dots, q - 1 by the name \mathbb{F}_q. This set forms a field, which means we can add, multiply, and divide numbers in the set. Addition and multiplication happen mod q. It might seem weird that we can divide, but it turns out for any a, b in \mathbb{F}_q, there is a number c in \mathbb{F}_q such that (c b) \mod q = a. In this case we say a / b = c so that (a / b) b \mod q = c b \mod q = a as you'd expect.

Montgomery representation

Montgomery representation is an alternative way of representing elements of \mathbb{F}_q so that multiplication mod q can be computed more efficiently.

Let q be one of MNT4753.q or MNT6753.q and let R = 2^{768}. The Montgomery representation of the nubmer x is (x R) \mod q. So for example, when q is MNT4753.q, the number 5 is represented as (5 \cdot 2^{768}) \mod q which happens to be

15141386232259939182423724568694911114488003694957216858820448966622494022908702997737632032507442391226452946698823665470952711443326537357991482811741996884665155234620507693793230633117754640516203527639390490866666926222409

This number then is represented as a little-endian length 12 array of 64-bit integers.

In summary, we represent the number x as an array a with

a.map((ai, i) => (2**i) * ai).sum() == (x * R) % q

Let us see how multplication works in this setting. We'll use pseudocode with % for \mathrm{mod}.

Given the Montgomery representation X = (x * R) % q of x and Y = (y * R) % q of y, we want to compute the Montgomery representation of (x * y) % q, which is (x * y * R) % q.

We have

X * Y
== ((x * R) % q) * ((y * R) % q)
== (x * y * R * R) % q

So, if we had a function div_R for letting us divide by R mod q, we would be able to compute the Montgomery representation of x * y from X and Y as div_R(X * Y).

To recap, to implement multiplication mod q, we need to implement two functions:

  1. Big integer multiplication, which takes two k limb integers and returns the 2 * k-limb integer which is their product.
  2. div_R, which takes a 2 * k-limb integer Z and returns the k-limb integer equal to (Z * r) % q, where r is the inverse of R. That is, the number with (R * r) % q = 1.

We will then have

div_R(X * Y)
== div_R((x * y * R * R) % q)
== (x * y * R * R * r) % q
== (x * y * R * 1) % q
== (x * y * R) % q

which is the Montgomery representation of the product of the inputs, exactly as we wanted.

Starter code

  • This repo has a CUDA implementation of this challenge.

Other resources

  • Algorithms for big-integer multiplication and div_R (often called Montgomery reduction) are given here, where our q is called m.
  • A C++ implementation of Montgomery reduction can be found here.
  • These slides may have useful insights for squeezing out extra performance.
  • This problem is sometimes called big integer multiplication, multi-precision multiplication, or more specifically "modular multiplication". You can find lots of great resources by searching for these terms along with "GPU".