Skip to content

Commit ea05c93

Browse files
committed
chore(gas): add overview and history docs
1 parent 5f2275d commit ea05c93

File tree

8 files changed

+208
-13
lines changed

8 files changed

+208
-13
lines changed

docs/engine/GASOLINE.md

Lines changed: 0 additions & 11 deletions
This file was deleted.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Design Guide
2+
3+
TODO: Common workflow patterns, systems layouts of multiple workflows

docs/engine/GASOLINE/GOTCHAS.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Gotchas
2+
3+
TODO

docs/engine/GASOLINE/OPERATIONS.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Operations
2+
3+
TODO

docs/engine/GASOLINE/OVERVIEW.md

Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
# Gasoline
2+
3+
Gasoline (at engine/packages/gasoline) is the durable execution engine running most persistent things on Rivet Engine.
4+
5+
Gasoline consists of:
6+
- Workflows - Similar to the concept of actors. Can sleep (be removed from memory) when not in use
7+
- Signals - Facilitates intercommunication between workflow <-> workflow and other services (such as api) -> workflow
8+
- Messages - Ephemeral "fire-and-forget" communication between workflows -> other services
9+
- Activities - Thin wrappers around native rust functions, each can be individually retried upon failure
10+
- Operations - Thin wrappers around native rust functions. Provided for clean interop with the Gasoline ecosystem
11+
12+
## Workflows
13+
14+
Workflows consist of a series of durable steps. When a step is complete, its result is saved to database as workflow history. If a workflow encounters a step which requires waiting (i.e. a signal, or just sleeping) it will be removed from memory until that event happens or the sleep times out.
15+
16+
When a workflow is rewoken from database back into memory, all previous steps that already completed will be "replayed". This means they will not actually execute code or write anything to database, but instead use the data already in the database to mimic an actual response.
17+
18+
## Workflow Steps
19+
20+
All workflow steps are durable, meaning they are either NO-OPs or return the previously calculated result when the workflow replays.
21+
22+
These include:
23+
24+
- Signal - a durable packet of data sent to a workflow. Can have a timeout and ready batch size
25+
- Message - an ephemeral packet of data sent from a workflow to a subscriber
26+
- Activity - a thin wrapper around a bare function with automatic backoff retries
27+
- Sub workflow - publishing another workflow and/or waiting for another workflow to complete before continuing
28+
- Loop - allows running the same closure repeatedly while intelligently handling workflow history
29+
- Sleep
30+
- Removed events, version checks - advanced workflow history steps (see WORKFLOW_HISTORY.md)
31+
32+
You cannot run operations directly in the workflow body; they must be put in an activity first.
33+
34+
Some composition methods:
35+
36+
- Join - run multiple activities or closures in parallel
37+
- Closure - create a branch in workflow history
38+
39+
### Activities
40+
41+
Activities are used to run user code; they are the meat and potatoes of the workflow. They should be composed in a way such that their failure and subsequent retry does not cause any side effects. In other words, each activity should be limited to 1 "action" that, when retried, will not be clobbered by previous executions of the same activity.
42+
43+
Pseudocode (bad composition):
44+
45+
- Activity 1
46+
- Transaction 1: Insert user row into database
47+
- Transaction 2: Insert user id into group table
48+
49+
If this activity were to fail on transaction 2, it will be retried with backoff. However, because the user row already exists, retrying will result in a database error. This activity will never succeed upon retry.
50+
51+
To remedy this, either:
52+
- Combine the queries into one transaction OR
53+
- Separate the transactions into 2 activities, one for each transaction
54+
55+
### Signals
56+
57+
Signals are the only form of communication between services (anything outside of workflows) -> workflow, as well as workflow -> workflow.
58+
59+
To send a signal you need:
60+
61+
- A workflow ID OR
62+
- Tags that match an existing incomplete workflow
63+
64+
If the workflow is found, the signal is added to the workflows queue. Workflows can consume signals by using a `listen` step.
65+
66+
If a workflow completes with pending signals still in its queue, the signals are marked as "acknowledged" and essentially forgotten.
67+
68+
### Messages
69+
70+
Messages are ephemeral packets of data sent from workflows. They are intended as status updates for real time communication and not durable communication. It is ok for messages to not be consumed by any receiver.
71+
72+
Receiving a message requires subscribing to its name and a subset of its tags.
73+
74+
## Tags
75+
76+
Workflows, signals, and messages all use tags for convenience when their unique IDs are not available.
77+
78+
Signals can be sent to a workflow if you know its name and a subset of its tags.
79+
80+
Internally, it is more efficient to order signal tags in a manner of most unique to least unique:
81+
82+
- Given a workflow with tags:
83+
- namespace = foo
84+
- type = normal
85+
86+
The signal should be published with `namespace = foo` first, then `type = normal`
Lines changed: 111 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,111 @@
1+
# Workflow History
2+
3+
To provide durable execution, workflows store their steps (aka events) in a database. These are stored in order of event location.
4+
5+
## Location
6+
7+
All events have a __location__ consisting of a set of __coordinates__. Each __coordinate__ is a set of __ordinates__ which are positive integers.
8+
9+
Locations look like:
10+
11+
- `{1}` - the first event
12+
- `{1, 4}` - the fourth child of the first event
13+
- `{0.1}` - the first inserted event before the first event
14+
- `{4, 0.3.1, 0.6}` - the 6th inserted event before the first event of the parent, which is the first inserted event between 0.3 and 0.4, which is a child of the fourth event
15+
16+
This may look confusing, but it allows dynamic location assignment without prior knowledge of the entire list of steps of a workflow, as well as modifying an existing workflow which already has some history.
17+
18+
## Calculating Location
19+
20+
Location is determined both by the events before it and the events after it (if the location is for an inserted event).
21+
22+
For a new workflow with no history, location is determined by incrementing the final coordinate (which consists of a single ordinate). Location `{2}` follows `{1}`, etc.
23+
24+
Coordinates start at `1`, not `0`. This is important to allow for inserting events before location `{1}` without negative numbers.
25+
26+
### Branches
27+
28+
When a branch (used internally for steps like loops and closures) is encountered, a new coordinate is added to the current location. So events that are children of a branch at location `{1}` would start at `{1, 1}`.
29+
30+
### Inserting Events
31+
32+
If a workflow has already executed up to location `{4}` but you want to add a new activity before location `{2}`, you can use __versioned workflow steps__ to make this happen.
33+
34+
By default all steps inherit the version of the branch they come from, which for the root of the workflow is version 1.
35+
36+
If you were to add a new step before location `{2}` with version 1 (denoted as `{2}v1` or `{2} v1`), the workflow would fail when it replays with the error `HistoryDiverged`. The version of inserted events must always be higher than the version of the step that comes AFTER it.
37+
38+
When we add a new step before location `{2}` with a version 2, it will be assigned the location `{1.1}` because it is the first inserted event after location `{1}`. All subsequent new events we add will increment this final ordinate: `{1.2}`, `{1.3}`, etc.
39+
40+
If we want to add an event between `{1.1}` and `{1.2}` (which are both version 2 events), we will need to set the event's version to 3. The new event's location will be `{1.1.1}`.
41+
42+
Events inserted before the event at location `{1}` will start with a 0 (`{0.1}`). Continuing to insert events before this new event will prepend more 0's: `{0.0.1}`, `{0.0.0.1}`, etc.
43+
44+
Note that inserting can be done at any root, so an event between `{2, 11, 4}` and `{2, 11, 5}` will have the location `{2, 11, 4.1}`.
45+
46+
### Removing events
47+
48+
Removing events requires replacing the event with a `ctx.removed::<_>()` call. This is a durable step that will either:
49+
50+
- For workflows that have already executed the step that should be removed: will not manipulate the database but will skip the step when replaying.
51+
- For workflows that have not executed the step yet: will insert a `removed` event into history
52+
53+
This keeps locations consistent between the two cases.
54+
55+
### Inserting Events Conditionally Based On Version
56+
57+
Sometimes you may want to keep the history of existing workflows the same while modifying only new workflows. You can do this with `ctx.check_version(N)` where `N` is the version that will be used when the history does not exist yet (i.e. for a new workflow).
58+
59+
Given a workflow with the history:
60+
61+
- `{1}v1` activity foo
62+
- `{2}v1` activity bar
63+
- `{3}v1` sleep
64+
65+
If we want this workflow to remain the same but new workflows to execute a different activity instead of `bar` (perhaps a newer version), we can do:
66+
67+
```rust
68+
// Activity foo
69+
ctx.activity(...).await?;
70+
71+
match ctx.check_version(2).await {
72+
// The existing workflow will always match this path because the next event (activity bar) has version 1
73+
1 => {
74+
// Here we need to keep the workflow steps as expected by the history, run activity bar
75+
ctx.activity(...).await?;
76+
}
77+
// This will be `2` because that is the value of `N`
78+
_latest => {
79+
// Activity bar_fast
80+
ctx.v(2).activity(...).await?;
81+
}
82+
}
83+
84+
ctx.sleep().await
85+
```
86+
87+
Version checks are durable because if history already exists at the location they are added then they do not manipulate the database and read the version of that history event. But if the version check is at the end of the current branch of events (as in a new workflow), it will be inserted as an event itself. This means the workflow history for both workflows will look like this:
88+
89+
- Existing workflow history:
90+
- `{1}v1` activity foo
91+
- `{2}v1` activity bar
92+
- `{3}v1` sleep
93+
- New workflow history:
94+
- `{1}v1` activity foo
95+
- `{2}v2` version check
96+
- `{3}v2` activity bar_fast
97+
- `{4}v1` sleep
98+
99+
Note that you can also manipulate the existing workflow history in the `1` branch just like you would without `check_version`. We could insert a new activity after activity `bar` with a `v2` or remove activity `bar`.
100+
101+
## Loops
102+
103+
Loops structure event history with 2 nested branches:
104+
105+
A loop at location `{2}` will have each iteration on a separate branch: `{2, 1}`, `{2, 2}`, ... `{2, iteration}`.
106+
107+
Events in each iteration will be a child of the iteration branch: `{2, 2, 1}`, `{2, 2, 2}` are the first two events in the second iteration of the loop at location `{2}`.
108+
109+
Loops are often used to create state machines out of workflows. Because state machines can technically run forever based on their loop configuration, Gasoline moves all complete iteration's event history into a different place in the database known as forgotten event history.
110+
111+
Forgotten events will not be pulled from the database when the workflow is replayed. This will not cause issues for the workflow because we know which iteration is the current one and previous iterations should not influence the current history as each iteration is separate.

docs/engine/HIBERNATING_WS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,4 +22,4 @@ To facilitate state management on the runner side (specifically via RivetKit), e
2222

2323
When a client websocket closes during hibernation, this value is cleared.
2424

25-
When a runner receives a CommandStartActor message via the runner protocol, it contains information about which hiberating requests are still active.
25+
When a runner receives a CommandStartActor message via the runner protocol, it contains information about which hibernating requests are still active.

engine/packages/gasoline/src/db/kv/mod.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -968,7 +968,7 @@ impl Database for DatabaseKv {
968968
}
969969

970970
/// Returns the first incomplete workflow with the given name and tags, first meaning the one with the
971-
/// lowest id value (interpreted as u128) because its in a KV store. There is no way to get any other
971+
/// lowest id value (by internal representation) because its in a KV store. There is no way to get any other
972972
/// workflow besides the first.
973973
#[tracing::instrument(skip_all, fields(%workflow_name))]
974974
async fn find_workflow(

0 commit comments

Comments
 (0)