Commit 0c42579

Improve type safety and code quality (#51)
This PR is focused on type safety and code quality. In particular, it makes these changes:
1. The monolithic Task is now split into task phases. Methods now take their respective phases which removes the need for validation in many places.
2. Devices and Tasks are now cached in State, reducing communication with the DB.
3. Communicator is reworked - its only responsibility now is to relay protocol messages.
4. The global mutex around State is removed, allowing for concurrent request handling.
5. Async and locks are removed from tasks and moved to State.
6. Developer documentation is added in the form of an ARCHITECTURE.md document.
7. A lot of code is simplified and easier to reason about.
2 parents b437af6 + 1ba709e commit 0c42579

27 files changed

+1918
-2350
lines changed


ARCHITECTURE.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
1+
# Architecture overview
2+
This repository contains a gRPC server which coordinates multi-party threshold protocols.
3+
4+
## Terminology
5+
- A **client** is a regular client in the gRPC sense.
6+
- A **device** is a **client** with an issued certificate. In some parts of the code, it might mean one **share** of a **participant** because those used to be synonymous.
7+
- A **task** is an abstracted communication among multiple **participants** with multiple phases and a result if it completes successfully.
8+
- A **participant** is a **device** and its **shares** within a **group** or a **task**.
9+
- A **share** is a unit of computation and voting power in the protocol layer and on the client side.
10+
- A **threshold** is the minimum number of accepting **shares** needed to start a **task**. It is mostly a property of a **group**, but is also used in the context of a **task**:
11+
- The **threshold** of a **group task** is the threshold of its resulting **group**.
12+
- The **threshold** of a **threshold task** is the threshold of the **group** within which it runs.
13+
- A **voting task** is the first task phase after a **task**'s creation where the **task participants** can either accept or reject the **task**. A participant stays a participant even if it rejects the task, and it still receives the task result if the task finishes successfully.
14+
- A **declined task** is a task phase which a **voting task** enters immediately once it's impossible to gather enough accepting votes.
15+
- A **running task** is a task phase which a **voting task** enters once it gathers enough accepting votes and a **communicator** is created. It gives control to a **protocol**.
16+
- A **failed task** is a task phase which is entered upon certain failures.
17+
- A **finished task** is a task phase which a **running task** enters once its **protocol** successfully computes a result.
18+
- A **communicator** is a bridge between **devices** and a **protocol**. It handles **message** gathering, relaying, broadcasting and **protocol** result collection.
19+
- A **message** is serialized data exchanged among **active shares**. All messages pass through the server. Depending on the origin, they are categorized into **client** and **server** messages. Relayed messages are also bundled into **server messages**.
20+
- A **group task** is a task which establishes a **group**. All its **participants** need to accept in order for the task to start.
21+
- A **group** is an abstraction over a shared key and a **threshold**.
22+
- A **threshold task** is a task which needs a **group** to be created and is an umbrella term for **sign**, **sign pdf** and **decrypt tasks**. In order for a threshold task to start, at least the **group**'s **threshold** number of **participants** need to accept.
23+
- A **protocol** is an abstraction for actual multi-party threshold protocols.
24+
- An **active share** is a **share** which participates in the computation of a **protocol**.
25+
- A **protocol index** is the index of an **active share** and corresponds to the way **protocols** manage their active parties. The assignment of indices to shares is rather complicated; see [*Protocol index assignment*](#protocol-index-assignment).
26+
27+
## Module structure
28+
- `persistence` contains code related to the server persistence.
29+
- `state` contains and manages all of the server's state.
30+
- `interfaces` contains the modules `grpc` and `timer`, which define long-running services
31+
- `grpc` provides the server's gRPC endpoints, handles client registration and certificates
32+
- `timer` periodically runs checks over the state
33+
- `task_store` manages the persistence and caching of tasks
34+
- `task` contains the logic for task computation
35+
- `protocol` contains the logic for protocol computation
36+
- `communicator` defines the communicator
37+
- `error` contains definitions of error variants
38+
- `utils` contains a few helper functions
39+
40+
## Persistence
41+
Most of the server state is persisted across server restarts, but some state is deliberately ephemeral and kept only in RAM. The ephemeral state is mostly data which changes "rapidly", namely activity timestamps and messages exchanged during protocol computation.
42+
43+
Persistence is handled in the `state` module, with the exception of `task_store`, which is only used within `state`. This is to decouple the logic from bookkeeping.
44+
45+
The `persistence` module is supposed to be a "dumb" interface for communicating with the DB. In particular, it shouldn't validate data or perform complex logic.
46+
47+
## State machines and state changes
48+
Much of the actual logic can be easily modeled using state machines. We use typestates to enforce valid state transitions. For example, a running task cannot change into a voting task.
49+
50+
Functions which update some state return a state change enum, which enforces handling of all possible situations and explicitly defines the logic. For example, saving a vote in a voting task can have three outcomes:
51+
- The task is accepted and transitions into a running task
52+
- The task is declined and transitions into a declined task
53+
- The task does not have enough votes to determine an outcome and it stays as a voting task
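
The three outcomes above can be sketched as a typestate transition. This is a minimal illustration, not the repository's actual code: the names `VotingTask`, `RunningTask`, `DeclinedTask`, `VoteOutcome` and `save_vote` are hypothetical, and the sketch assumes votes are counted in shares against a single threshold.

```rust
/// Hypothetical voting phase: tracks accepting and rejecting shares.
#[derive(Debug)]
pub struct VotingTask {
    threshold: u32,
    total_shares: u32,
    accepted: u32,
    rejected: u32,
}

/// Hypothetical running phase, reachable only through an accepted vote.
#[derive(Debug)]
pub struct RunningTask {
    pub accepted_shares: u32,
}

/// Hypothetical declined phase, reachable only through a failed vote.
#[derive(Debug)]
pub struct DeclinedTask {
    pub rejected_shares: u32,
}

/// State change returned by `save_vote`. Matching on it forces the
/// caller to handle all three situations explicitly.
#[derive(Debug)]
pub enum VoteOutcome {
    Accepted(RunningTask),
    Declined(DeclinedTask),
    Undecided(VotingTask),
}

impl VotingTask {
    pub fn new(threshold: u32, total_shares: u32) -> Self {
        Self { threshold, total_shares, accepted: 0, rejected: 0 }
    }

    /// Takes `self` by value, so the old voting task cannot be used
    /// again after it has transitioned into another phase.
    pub fn save_vote(mut self, shares: u32, accept: bool) -> VoteOutcome {
        if accept {
            self.accepted += shares;
        } else {
            self.rejected += shares;
        }
        // Shares that could still end up accepting, at best.
        let reachable = self.total_shares.saturating_sub(self.rejected);
        if self.accepted >= self.threshold {
            VoteOutcome::Accepted(RunningTask { accepted_shares: self.accepted })
        } else if reachable < self.threshold {
            VoteOutcome::Declined(DeclinedTask { rejected_shares: self.rejected })
        } else {
            VoteOutcome::Undecided(self)
        }
    }
}
```

Because there is no function from `RunningTask` back to `VotingTask`, the invalid transition mentioned above simply cannot be written.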
54+
55+
## Protocol index assignment
56+
Multiple shares per device were implemented in an ad-hoc way and devices do not understand share indices. Instead, they accept a vector of *k* messages, one per active share, and they implicitly assign indices *[0..(k-1)]* to the messages. The index assignment algorithm therefore works roughly as follows:
57+
1. Gather all candidate shares sorted by their corresponding device ID. This lets us deterministically recover correct indices without persisting the index assignment.
58+
2. Assign indices *[0..(n-1)]* to the *n* sorted shares.
59+
3. For each device, get the range of indices assigned to its shares.
60+
4. Choose the active shares such that for each device, they are chosen from the start of its range.
61+
62+
For example, consider a `3-of-1,2,3` setup with devices `A`, `B`, `C`.
63+
1. Gather sorted candidate shares: `[A, B, B, C, C, C]`
64+
2. Assign indices: `{0: A, 1: B, 2: B, 3: C, 4: C, 5: C}`
65+
3. Get the index ranges per device: `{A: [0..0], B: [1..2], C: [3..5]}`
66+
4. Choose 3 active shares from range beginnings: `{1: B, 3: C, 4: C}`
67+
68+
==> The protocol indices are thus `[1, 3, 4]`.
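
The four steps can be sketched in Rust as follows. This is an illustrative sketch under stated assumptions, not the server's implementation: `DeviceShares` and `protocol_indices` are hypothetical names, and the number of active shares per device is taken as an input (comparable to the `active_shares` column of `active_task_participant`) because this document does not say how that number is chosen.

```rust
/// Hypothetical input: one entry per device taking part in the task.
pub struct DeviceShares<'a> {
    pub device_id: &'a str,
    /// All candidate shares the device holds.
    pub total: u32,
    /// How many of those shares are active in the protocol (assumed given).
    pub active: u32,
}

/// Returns the protocol indices of all active shares.
pub fn protocol_indices(devices: &[DeviceShares]) -> Vec<u32> {
    // Step 1: order devices (and thereby their shares) by device id.
    let mut sorted: Vec<&DeviceShares> = devices.iter().collect();
    sorted.sort_by(|a, b| a.device_id.cmp(b.device_id));

    let mut indices = Vec::new();
    let mut range_start = 0u32;
    for device in sorted {
        // Steps 2 and 3: the device owns the index range
        // range_start..range_start + total.
        let range = range_start..range_start + device.total;
        // Step 4: active shares are taken from the start of that range.
        indices.extend(range.take(device.active as usize));
        range_start += device.total;
    }
    indices
}
```

With the example's devices `A` (1 share, 0 active), `B` (2 shares, 1 active) and `C` (3 shares, 2 active), the function returns `[1, 3, 4]`. Since the result depends only on the sorted device ids and the share counts, it can be recomputed at any time without storing the assignment.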
69+
70+
# Guides for common changes
71+
This section provides guides for certain changes to the codebase which may be common.
72+
73+
## Adding a new protocol type
74+
Adding new protocols must be coordinated with the `meesign-crypto` repository.
75+
76+
Protocols are defined throughout several places in the codebase:
77+
1. The `proto/meesign.proto` files in this repository and in the `meesign-crypto` repository each define a `ProtocolType` enumeration. Both must be extended.
78+
2. A few trait implementations for `ProtocolType` must be extended in `persistence/enums.rs`.
79+
3. The `protocol_type` enum must be extended in the DB migrations.
80+
4. A module should be added into `protocols`, similar to other protocols defined there.
81+
5. The module needs to create a type implementing the `Protocol` trait for each of its variants, for example `<protocol>Group`, `<protocol>Sign`, ...
82+
6. The module should use constants from `meesign_crypto::protocols::<protocol>`.
83+
84+
The overall structure should reflect the way other protocols are implemented. See the `protocols/frost.rs` module for an example.
85+
86+
## Adding a new task type
87+
Adding new task types must be coordinated with the `meesign-client` repository.
88+
89+
If the task follows the usual task phases (voting, declined, running, failed, finished), then it should follow the structure of other task types already established in this repo. Otherwise, it must be handled exceptionally.
90+
91+
Here is a general process for when the new task type follows the usual task phases:
92+
1. The `proto/meesign.proto` file defines a `TaskType` enumeration. It must be extended.
93+
2. A few trait implementations for `TaskType` must be extended in `persistence/enums.rs`.
94+
3. The `task_type` enum must be extended in the DB migrations.
95+
4. The `RunningTaskContext` enum in `tasks/mod.rs` must be extended.
96+
5. A module should be added into `tasks`, similar to other tasks defined there.
97+
6. The module needs to create a type implementing the `RunningTask` trait.
98+
99+
The overall structure should reflect the way other tasks are implemented. See the `tasks/sign.rs` module for an example.

migrations/2025-08-27-171451_initial_schema/down.sql

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+DROP TABLE active_task_participant;
 DROP TABLE task_participant;
 DROP TABLE task_result;
 DROP TABLE task;

migrations/2025-08-27-171451_initial_schema/up.sql

Lines changed: 9 additions & 0 deletions
@@ -84,3 +84,12 @@ CREATE TABLE task_participant (
     "acknowledgment" boolean,
     PRIMARY KEY ("task_id", "device_id")
 );
+
+CREATE TABLE active_task_participant (
+    "task_id" uuid NOT NULL REFERENCES task("id"),
+    "device_id" bytea NOT NULL REFERENCES device("id"),
+    "active_shares" integer NOT NULL CHECK ("active_shares" > 0),
+    PRIMARY KEY ("task_id", "device_id"),
+    FOREIGN KEY ("task_id", "device_id")
+        REFERENCES task_participant("task_id", "device_id")
+);
