---
toc: true
mathjax: true
---

In this document we discuss your options when it comes to benchmarking code written using Cryptimeleon libraries.

There are many different kinds of possible performance metrics.
These include runtime in the form of CPU cycles or CPU time, memory usage, and network usage.
Furthermore, one may want to collect hardware-independent information such as the number of group operations or pairings.
Both of these types of metrics have their use-cases.
Counting group operations and pairings has the advantage of being hardware-independent.
To the layperson and potential user, applied metrics such as CPU time or memory usage can be more meaningful, as they demonstrate practicality better than the more abstract group operation counts.

# Collecting Applied Metrics

Applied metrics, such as runtime or memory usage, can be collected using any existing Java benchmark framework.
An example of such a framework is the Java Microbenchmark Harness (JMH).
It allows for very accurate measurements and integrates with many existing profilers.
Due to these existing options, we have decided against implementing any such capabilities ourselves.
Therefore, you will need to use one of these existing benchmark frameworks.

## Problems That Can Falsify Your Benchmarks

We now want to look at some potential problems when creating a benchmark.
We will illustrate these problems using the example of benchmarking the verification algorithm of a signature scheme.

### Lazy Evaluation

While useful for automatic optimization, the [lazy evaluation]({% link docs/lazy-eval.md %}) features of Math can create some benchmarking problems if not handled correctly.
To benchmark your verification algorithm, you will need valid signatures, and these signatures will be provided by executing the signing algorithm.
Inside the signing algorithm, group operations might be executed.
By default, group operations in Cryptimeleon Math are evaluated lazily, i.e. deferred until needed.
Therefore, group operations done during signing will only be executed once the result is needed during verification of your signature.
This will increase the runtime of your verification algorithm and therefore falsify the results.
By serializing the `Signature` object using the `getRepresentation()` method *before* running verification, you can force all remaining computations involving fields of the `Signature` object to be completed in a blocking manner.
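For example, a minimal sketch of this mitigation (assuming a craco-style signature `scheme`, a `plainText`, and a `signingKey` prepared beforehand, as in the JMH example further below):

```java
// Sign during benchmark setup; group operations inside sign() may still be pending
// due to lazy evaluation.
Signature signature = scheme.sign(plainText, signingKey);

// getRepresentation() forces all remaining computations on the signature's group elements
// to finish in a blocking manner, so they are not attributed to the verification benchmark.
Representation signatureRepr = signature.getRepresentation();
```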

### Mutability

Another problem is the mutable state of the `Signature` object.
While the value the object represents never changes, it can accumulate cached helper data that influences its performance characteristics.
When repeatedly verifying the same `Signature` object, executing the verification algorithm may trigger optimizations that change the runtime of succeeding verification runs.
An example of this problem is the storing of *precomputations*.
The `Signature` object will most likely contain group elements.
Those group elements can store precomputations which help to make future exponentiations more efficient.
The first run of your verification may create such precomputations that can then be used in future invocations of the verification algorithm.
These latter runs will then run faster than the first one, leading to skewed results.

The underlying problem here is the state cached by the `Signature` object combined with its reuse across multiple runs of the verification algorithm.
Therefore, the solution is to recreate the `Signature` object for each invocation of the verification algorithm.
This can be done using our [representation framework]({% link docs/representations.md %}).
By serializing and then deserializing it for each invocation of the verification algorithm, you obtain a fresh `Signature` object each time with the same state as before serialization.
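As a sketch (reusing `scheme` and `signatureRepr` from the previous snippet; `restoreSignature` is the scheme's deserialization method, as used in the JMH example below):

```java
// Recreate a fresh Signature object from its representation before every verification run,
// so that no cached precomputations from earlier runs carry over.
Signature signature = scheme.restoreSignature(signatureRepr);
```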

## JMH Example

As an example we look at how to use JMH to measure runtime for our implementation of the verification algorithm of the signature scheme from Section 4.2 of Pointcheval and Sanders [PS18].
Here we will also see how to implement the mitigations for the benchmarking problems discussed in the previous section.

Given the public key $$\textsf{pk} = (\tilde{g}, \tilde{X}, \tilde{Y}_1, \dots, \tilde{Y}_{r+1})$$, the message vector $$\textbf{m} = (m_1, \dots, m_r)$$, and the signature $$\sigma = (m', \sigma_1, \sigma_2)$$, the verification algorithm works as follows:
Check whether $$\sigma_1$$ is the multiplicative identity of $$\mathbb{G}_1$$ and output $$0$$ if so.
Otherwise, output $$1$$ if $$e(\sigma_1, \tilde{X} \cdot \prod_{j = 1}^r \tilde{Y}_j^{m_j} \cdot \tilde{Y}_{r+1}^{m'}) = e(\sigma_2, \tilde{g})$$ holds, and $$0$$ if not.
Here $$e$$ denotes the pairing operation.

We test the verification operation using messages of length one and of length 10.
```java
@State(Scope.Thread)
public class PS18VerifyBenchmark {

    // Test with one message and with ten messages
    @Param({"1", "10"})
    int numMessages;

    PS18SignatureScheme scheme;
    PlainText plainText;
    Representation signatureRepr;
    Representation verifyKeyRepr;
    // The signature and verification key are restored from their representations
    // in every benchmark invocation
    Signature signature;
    VerificationKey verificationKey;

    // The setup method that creates the signature and verification key
    // used by the verify benchmark
    @Setup(Level.Iteration)
    public void setup() {
        PSPublicParameters pp = new PSPublicParameters(new MclBilinearGroup());
        scheme = new PS18SignatureScheme(pp);
        SignatureKeyPair<? extends PS18VerificationKey, ? extends PS18SigningKey> keyPair =
                scheme.generateKeyPair(numMessages);
        RingElementPlainText[] messages = new RingElementPlainText[numMessages];
        for (int i = 0; i < messages.length; i++) {
            messages[i] = new RingElementPlainText(pp.getZp().getUniformlyRandomElement());
        }
        plainText = new MessageBlock(messages);
        signature = scheme.sign(plainText, keyPair.getSigningKey());
        verificationKey = keyPair.getVerificationKey();
        // Computations in sign and/or key generation may be done non-blocking.
        // To make sure they are not executed as part of the verification benchmark,
        // we force the remaining computations to finish (blocking) via getRepresentation().
        signatureRepr = signature.getRepresentation();
        verifyKeyRepr = verificationKey.getRepresentation();
    }

    // The benchmark method. Includes settings for JMH.
    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @Warmup(iterations = 3, batchSize = 1)
    @Measurement(iterations = 10, batchSize = 1)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public Boolean measureVerify() {
        // Running the signing or key generation algorithms may have stored precomputations
        // for the verification key and/or signature.
        // To reset these precomputations such that they do not make the verification
        // algorithm faster than it otherwise would be, we recreate the objects
        // without precomputations.
        signature = scheme.restoreSignature(signatureRepr);
        verificationKey = scheme.restoreVerificationKey(verifyKeyRepr);
        return scheme.verify(plainText, signature, verificationKey);
    }
}
```
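Note that restoring the signature and verification key inside `measureVerify` is a deliberate modeling choice: this benchmark corresponds to the scenario "I receive a list of previously unknown signatures and verification keys and need to verify them all". If you instead want to benchmark verifying many signatures under a single verification key that you keep in memory, you should reuse the same verification key object across invocations.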

JMH outputs the following results for the above benchmark (using an i5-4210m on Ubuntu 20.04):

|Benchmark | numMessages| Mode| Cnt | Score | Error| Units |
|-------------------|---------------------------|--------|----|-------|--------|-------|
|measureVerify | 1 | ss | 50 | 5.263 |$$\pm$$ 0.641 | ms/op |
|measureVerify | 10 | ss | 50| 8.157 |$$\pm$$ 1.506 | ms/op|

The above example is also part of our [Benchmark](https://github.com/cryptimeleon/benchmark) project.
There you can find more examples like it.

If you want to use JMH, we strongly recommend working through the [JMH Samples](https://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/) to make sure you avoid any bad practices that could potentially invalidate your benchmarks.

## Using JMH with Gradle

[Our own benchmarks](https://github.com/cryptimeleon/benchmark) use JMH.
Since JMH is made to be used with Maven, you will probably want to add a Gradle task for executing your JMH tests (if you use Gradle).
To run the JMH tests, you need to run JMH's `Main` class with the classpath set to the folder where your tests lie.
The code below gives an example Gradle task that can run JMH tests inside the `jmh` source set.
It also enables support for certain JMH parameters.
For example, using the `include` parameter you can explicitly state the test classes you want to run.

```groovy
task jmh(type: JavaExec) {
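    // ... (definition of the JMH main class, classpath, and parameter handling omitted here;
    // see the full task in the Cryptimeleon Benchmark repository linked above) ...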
    args '-rff', resultFile
}
```

For example, to execute the previous JMH example (and only it), you would run
```bash
./gradlew -q jmh -Pinclude="PS18VerifyBenchmark"
```
inside the folder of your Gradle project.

# Group Operation Counting

The collection of hardware-independent metrics such as group operations is implemented by the Cryptimeleon Math library.
The main points of interest here are the `DebugBilinearGroup` and `DebugGroup` classes.
The former allows for counting pairings, and the latter allows for counting group operations, group squarings (relevant for elliptic curves), group inversions, exponentiations, and multi-exponentiations.
It is also able to track the number of times group elements have been serialized.

## DebugGroup

The functionality of group operation counting is provided by using a special group, the `DebugGroup`.
By simply using the `DebugGroup` to perform the computations, it automatically counts the operations done within it.

*Note: Keep in mind that `DebugGroup` uses $$\mathbb{Z}_n$$ under the hood and is far faster than any secure group, so it should only be used for testing and/or counting group operations, not for other performance benchmarks. Also make sure the debug group's size is consistent with the actual group you want to count operations for; in a smaller group, for example, exponentiations take far fewer group operations, which skews the counts.*

```java
import org.cryptimeleon.math.structures.groups.debug.DebugGroup;

// Instantiate the debug group with a name and its size
DebugGroup debugGroup = new DebugGroup("DG1", 1000000);
// Get a random non-neutral element and square it
GroupElement elem = debugGroup.getUniformlyRandomNonNeutral();
elem.square().compute();

// Print the total number of squarings counted so far
System.out.println(debugGroup.getNumSquaringsTotal());
```
```
1
```

As seen above, `DebugGroup` provides the same interfaces as any other group in Math does, just with some additional features.

Whenever a group operation is performed, `DebugGroup` tracks it internally, and the collected data can be accessed via a variety of methods.
The counting is done in two modes: the `NoExpMultiExp` mode and the `Total` mode; the corresponding accessor methods end in `NoExpMultiExp` and `Total`, respectively.
Group operation metrics from the `NoExpMultiExp` mode disregard operations done inside (multi-)exponentiations, while the `Total` mode does account for them.
`NoExpMultiExp` measurements are therefore independent of the concrete (multi-)exponentiation algorithm, while `Total` measurements are more expressive in regards to actual runtime (since estimating the runtime of plain group operations is easier than that of a (multi-)exponentiation).
The `NoExpMultiExp` mode is also useful if you want to track (multi-)exponentiations only as single units and not via their underlying group operations.

Exponentiation and multi-exponentiation data can be accessed via the `getNumExps()` and `getMultiExpTermNumbers()` methods, where the latter returns an array containing the number of bases in each multi-exponentiation done.
Additionally, `resetCounters()` can be used to reset all operation counters, and `formatCounterData()` provides a printable string that summarizes all collected data.

As an example we consider the computation of $$g^a \cdot h^b$$.
The `NoExpMultiExp` mode counts this as a single multi-exponentiation with two terms.
No group operations are counted since they are all part of the multi-exponentiation.
The `Total` mode does not consider the multi-exponentiation as its own unit.
Instead, it counts the group operations, inversions, and squarings that are part of evaluating the multi-exponentiation (using a wNAF-type algorithm).
Combining these metrics therefore gives a more complete picture of the computational cost.

A more detailed (code) example is given below:

```java
DebugGroup debugGroup = new DebugGroup("DG1", 1000000);
GroupElement elem = debugGroup.getUniformlyRandomNonNeutral();
GroupElement elem2 = debugGroup.getUniformlyRandomNonNeutral();

// Computing elem^a * elem2^b is evaluated as a single multi-exponentiation with two terms
elem.pow(BigInteger.valueOf(123456)).op(elem2.pow(BigInteger.valueOf(654321))).compute();

// Print a summary of all collected counter data
System.out.println(debugGroup.formatCounterData());
```

In the printed counter data, you will see that the "Total group operation data" contains much higher numbers than the data excluding (multi-)exponentiations, since the `Total` mode also counts the group operations, squarings, and inversions performed inside the multi-exponentiation.

### Lazy Evaluation

`DebugGroup` does use lazy evaluation, meaning that you need to ensure all lazy computations have finished before retrieving the tracked results.
One way to do this is to call `computeSync()` on the group elements whose values you are actually interested in; intermediate values that only feed into further computations do not need an explicit call.
However, for your convenience, `DebugGroup` also overrides `compute()` to behave like `computeSync()` in that it blocks until the computation is done.
So make sure to always call `compute()` on every involved `DebugGroupElement` before accessing any counter data, or call `getRepresentation()` to serialize any involved objects as this also leads to a blocking computation.

### Configuring Used (Multi-)exponentiation Algorithm

`DebugGroup` makes use of efficient exponentiation and multi-exponentiation algorithms.
The exact algorithm used changes the resulting group operation counts.
To manually configure these algorithms, `DebugGroup` (and `DebugBilinearGroup`) offers setter and getter methods such as `getSelectedMultiExpAlgorithm` and `setSelectedMultiExpAlgorithm`.
Furthermore, you can configure the precomputation and exponentiation window sizes used for those algorithms.
These are the same methods as offered by `LazyGroup`.
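As a small sketch, the current selection can be inspected as follows (the concrete algorithm constants accepted by the setters are not listed here; see the `LazyGroup`/`DebugGroup` documentation for the available choices):

```java
DebugGroup debugGroup = new DebugGroup("DG1", 1000000);

// Check which multi-exponentiation algorithm is currently selected;
// the reported counts for multi-exponentiations depend on this choice.
System.out.println(debugGroup.getSelectedMultiExpAlgorithm());
```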

### Serialization Tracking
`DebugGroup` not only allows for tracking group operations, it also counts how often `getRepresentation()` has been called on elements of the group. This allows you to track serializations.
Expand All @@ -145,7 +274,7 @@ The count is accessible via `getNumRetrievedRepresentations()`.
## DebugBilinearGroup

Cryptimeleon Math also provides a `BilinearGroup` implementation that can be used for counting, the `DebugBilinearGroup` class.
It uses a simple (not secure) $$\mathbb{Z}_n$$ pairing.

In addition to the usual group operation counting done by the three `DebugGroup` instances contained in the bilinear group, `DebugBilinearGroup` also allows you to track the number of pairings performed.

```java
// ... (instantiate a DebugBilinearGroup called bilGroup, obtain group elements, and apply the pairing) ...
System.out.println(bilGroup.getNumPairings());
```
```
1
```


# References

[PS18] David Pointcheval and Olivier Sanders. “Reassessing Security of Randomizable Signatures”. In: Topics in Cryptology – CT-RSA 2018. Ed. by Nigel P. Smart. Springer International Publishing, 2018, pp. 319–338.