-
Notifications
You must be signed in to change notification settings - Fork 146
chore(gas): add overview and history docs #3879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
MasterPtato
wants to merge
1
commit into
01-13-fix_otel_enrich_http_traces
from
01-13-chore_gas_add_overview_and_history_docs
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file was deleted.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Design Guide | ||
|
|
||
| TODO: Common workflow patterns, systems layouts of multiple workflows |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Gotchas | ||
|
|
||
| TODO |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Operations | ||
|
|
||
| TODO |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| # Gasoline | ||
|
|
||
| Gasoline (at engine/packages/gasoline) is the durable execution engine running most persistent things on Rivet Engine. | ||
|
|
||
| Gasoline consists of: | ||
| - Workflows - Similar to the concept of actors. Can sleep (be removed from memory) when not in use | ||
| - Signals - Facilitates intercommunication between workflow <-> workflow and other services (such as api) -> workflow | ||
| - Messages - Ephemeral "fire-and-forget" communication between workflows -> other services | ||
| - Activities - Thin wrappers around native rust functions, each can be individually retried upon failure | ||
| - Operations - Thin wrappers around native rust functions. Provided for clean interop with the Gasoline ecosystem | ||
|
|
||
| ## Workflows | ||
|
|
||
| Workflows consist of a series of durable steps. When a step is complete, its result is saved to database as workflow history. If a workflow encounters a step which requires waiting (i.e. a signal, or just sleeping) it will be removed from memory until that event happens or the sleep times out. | ||
|
|
||
| When a workflow is rewoken from database back into memory, all previous steps that already completed will be "replayed". This means they will not actually execute code or write anything to database, but instead use the data already in the database to mimic an actual response. | ||
|
|
||
| ## Workflow Steps | ||
|
|
||
| All workflow steps are durable, meaning they are either NO-OPs or return the previously calculated result when the workflow replays. | ||
|
|
||
| These include: | ||
|
|
||
| - Signal - a durable packet of data sent to a workflow. Can have a timeout and ready batch size | ||
| - Message - an ephemeral packet of data sent from a workflow to a subscriber | ||
| - Activity - a thin wrapper around a bare function with automatic backoff retries | ||
| - Sub workflow - publishing another workflow and/or waiting for another workflow to complete before continuing | ||
| - Loop - allows running the same closure repeatedly while intelligently handling workflow history | ||
| - Sleep | ||
| - Removed events, version checks - advanced workflow history steps (see WORKFLOW_HISTORY.md) | ||
|
|
||
| You cannot run operations directly in the workflow body; they must be put in an activity first. | ||
|
|
||
| Some composition methods: | ||
|
|
||
| - Join - run multiple activities or closures in parallel | ||
| - Closure - create a branch in workflow history | ||
|
|
||
| ### Activities | ||
|
|
||
| Activities are used to run user code; they are the meat and potatoes of the workflow. They should be composed in a way such that their failure and subsequent retry does not cause any side effects. In other words, each activity should be limited to 1 "action" that, when retried, will not be clobbered by previous executions of the same activity. | ||
|
|
||
| Pseudocode (bad composition): | ||
|
|
||
| - Activity 1 | ||
| - Transaction 1: Insert user row into database | ||
| - Transaction 2: Insert user id into group table | ||
|
|
||
| If this activity were to fail on transaction 2, it will be retried with backoff. However, because the user row already exists, retrying will result in a database error. This activity will never succeed upon retry. | ||
|
|
||
| To remedy this, either: | ||
| - Combine the queries into one transaction OR | ||
| - Separate the transactions into 2 activities, one for each transaction | ||
|
|
||
| ### Signals | ||
|
|
||
| Signals are the only form of communication between services (anything outside of workflows) -> workflow, as well as workflow -> workflow. | ||
|
|
||
| To send a signal you need: | ||
|
|
||
| - A workflow ID OR | ||
| - Tags that match an existing incomplete workflow | ||
|
|
||
| If the workflow is found, the signal is added to the workflows queue. Workflows can consume signals by using a `listen` step. | ||
|
|
||
| If a workflow completes with pending signals still in its queue, the signals are marked as "acknowledged" and essentially forgotten. | ||
|
|
||
| ### Messages | ||
|
|
||
| Messages are ephemeral packets of data sent from workflows. They are intended as status updates for real time communication and not durable communication. It is ok for messages to not be consumed by any receiver. | ||
|
|
||
| Receiving a message requires subscribing to its name and a subset of its tags. | ||
|
|
||
| ## Tags | ||
|
|
||
| Workflows, signals, and messages all use tags for convenience when their unique IDs are not available. | ||
|
|
||
| Signals can be sent to a workflow if you know its name and a subset of its tags. | ||
|
|
||
| Internally, it is more efficient to order signal tags in a manner of most unique to least unique: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this Gotchas worthy?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Definitely |
||
|
|
||
| - Given a workflow with tags: | ||
| - namespace = foo | ||
| - type = normal | ||
|
|
||
| The signal should be published with `namespace = foo` first, then `type = normal` | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Workflow History | ||
|
|
||
| To provide durable execution, workflows store their steps (aka events) in a database. These are stored in order of event location. | ||
|
|
||
| ## Location | ||
|
|
||
| All events have a __location__ consisting of a set of __coordinates__. Each __coordinate__ is a set of __ordinates__ which are positive integers. | ||
|
|
||
| Locations look like: | ||
|
|
||
| - `{1}` - the first event | ||
| - `{1, 4}` - the fourth child of the first event | ||
| - `{0.1}` - the first inserted event before the first event | ||
| - `{4, 0.3.1, 0.6}` - the 6th inserted event before the first event of the parent, which is the first inserted event between 0.3 and 0.4, which is a child of the fourth event in the root | ||
|
|
||
| This may look confusing, but it allows dynamic location assignment without prior knowledge of the entire list of steps of a workflow, as well as modifying an existing workflow which already has some history. | ||
|
|
||
| ## Calculating Location | ||
|
|
||
| Location is determined both by the events before it and the events after it (if the location is for an inserted event). | ||
|
|
||
| For a new workflow with no history, location is determined by incrementing the final coordinate (which consists of a single ordinate). Location `{2}` follows `{1}`, etc. | ||
|
|
||
| Coordinates start at `1`, not `0`. This is important to allow for inserting events before location `{1}` without negative numbers. | ||
|
|
||
| ### Branches | ||
|
|
||
| When a branch (used internally for steps like loops and closures) is encountered, a new coordinate is added to the current location. So events that are children of a branch at location `{1}` would start at `{1, 1}`. | ||
|
|
||
| ### Inserting Events | ||
|
|
||
| If a workflow has already executed up to location `{4}` but you want to add a new activity before location `{2}`, you can use __versioned workflow steps__ to make this happen. | ||
|
|
||
| By default all steps inherit the version of the branch they come from, which for the root of the workflow is version 1. | ||
|
|
||
| If you were to add a new step before location `{2}` with version 1 (denoted as `{2}v1` or `{2} v1`), the workflow would fail when it replays with the error `HistoryDiverged`. The version of inserted events must always be higher than the version of the step that comes AFTER it. | ||
|
|
||
| When we add a new step before location `{2}` with a version 2, it will be assigned the location `{1.1}` because it is the first inserted event after location `{1}`. All subsequent new events we add will increment this final ordinate: `{1.2}`, `{1.3}`, etc. | ||
|
|
||
| If we want to add an event between `{1.1}` and `{1.2}` (which are both version 2 events), we will need to set the event's version to 3. The new event's location will be `{1.1.1}`. | ||
|
|
||
| Events inserted before the event at location `{1}` will start with a 0 (`{0.1}`). Continuing to insert events before this new event will prepend more 0's: `{0.0.1}`, `{0.0.0.1}`, etc. | ||
|
|
||
| Note that inserting can be done at any root, so an event between `{2, 11, 4}` and `{2, 11, 5}` will have the location `{2, 11, 4.1}`. | ||
|
|
||
| ### Removing events | ||
|
|
||
| Removing events requires replacing the event with a `ctx.removed::<_>()` call. This is a durable step that will either: | ||
|
|
||
| - For workflows that have already executed the step that should be removed: will not manipulate the database but will skip the step when replaying. | ||
| - For workflows that have not executed the step yet: will insert a `removed` event into history | ||
|
|
||
| This keeps locations consistent between the two cases. | ||
|
|
||
| ### Inserting Events Conditionally Based On Version | ||
|
|
||
| Sometimes you may want to keep the history of existing workflows the same while modifying only new workflows. You can do this with `ctx.check_version(N)` where `N` is the version that will be used when the history does not exist yet (i.e. for a new workflow). | ||
|
|
||
| Given a workflow with the history: | ||
|
|
||
| - `{1}v1` activity foo | ||
| - `{2}v1` activity bar | ||
| - `{3}v1` sleep | ||
|
|
||
| If we want this workflow to remain the same but new workflows to execute a different activity instead of `bar` (perhaps a newer version), we can do: | ||
|
|
||
| ```rust | ||
| // Activity foo | ||
| ctx.activity(...).await?; | ||
|
|
||
| match ctx.check_version(2).await { | ||
| // The existing workflow will always match this path because the next event (activity bar) has version 1 | ||
| 1 => { | ||
| // Here we need to keep the workflow steps as expected by the history, run activity bar | ||
| ctx.activity(...).await?; | ||
| } | ||
| // This will be `2` because that is the value of `N` | ||
| _latest => { | ||
| // Activity bar_fast | ||
| ctx.v(2).activity(...).await?; | ||
| } | ||
| } | ||
|
|
||
| ctx.sleep().await | ||
| ``` | ||
|
|
||
| Version checks are durable because if history already exists at the location they are added then they do not manipulate the database and read the version of that history event. But if the version check is at the end of the current branch of events (as in a new workflow), it will be inserted as an event itself. This means the workflow history for both workflows will look like this: | ||
|
|
||
| - Existing workflow history: | ||
| - `{1}v1` activity foo | ||
| - `{2}v1` activity bar | ||
| - `{3}v1` sleep | ||
| - New workflow history: | ||
| - `{1}v1` activity foo | ||
| - `{2}v2` version check | ||
| - `{3}v2` activity bar_fast | ||
| - `{4}v1` sleep | ||
|
|
||
| Note that you can also manipulate the existing workflow history in the `1` branch just like you would without `check_version`. We could insert a new activity after activity `bar` with a `v2` or remove activity `bar`. | ||
|
|
||
| ## Loops | ||
|
|
||
| Loops structure event history with 2 nested branches: | ||
|
|
||
| A loop at location `{2}` will have each iteration on a separate branch: `{2, 1}`, `{2, 2}`, ... `{2, iteration}`. | ||
|
|
||
| Events in each iteration will be a child of the iteration branch: `{2, 2, 1}`, `{2, 2, 2}` are the first two events in the second iteration of the loop at location `{2}`. | ||
|
|
||
| Loops are often used to create state machines out of workflows. Because state machines can technically run forever based on their loop configuration, Gasoline moves all complete iteration's event history into a different place in the database known as forgotten event history. | ||
|
|
||
| Forgotten events will not be pulled from the database when the workflow is replayed. This will not cause issues for the workflow because we know which iteration is the current one and previous iterations should not influence the current history as each iteration is separate. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice how it explains a bit more of how it actually works. Might be good to say blatanlty how signals are the only form of durable/guaranteed communication and they’re consumption is guaranteed ( is it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to say "only". Signal consumption is not guaranteed, the wf needs to listen to the signals to receive them