|
| 1 | +# Workflow History |
| 2 | + |
| 3 | +To provide durable execution, workflows store their steps (aka events) in a database. These are stored in order of event location. |
| 4 | + |
| 5 | +## Location |
| 6 | + |
| 7 | +All events have a __location__ consisting of an ordered list of __coordinates__. Each __coordinate__ is an ordered list of __ordinates__, which are non-negative integers written separated by dots (e.g. `0.3.1`). |
| 8 | + |
| 9 | +Locations look like: |
| 10 | + |
| 11 | +- `{1}` - the first event |
| 12 | +- `{1, 4}` - the fourth child of the first event |
| 13 | +- `{0.1}` - the first inserted event before the first event |
| 14 | +- `{4, 0.3.1, 0.6}` - read right to left: the sixth event inserted before the first child of its parent; that parent (`0.3.1`) is the first event inserted between `0.3` and `0.4`; and it is in turn a child of the fourth event |
| 15 | + |
| 16 | +This may look confusing, but it lets locations be assigned dynamically, without knowing a workflow's entire list of steps in advance, and it lets you modify an existing workflow that already has some history. |
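To make the notation concrete, here is a minimal sketch of how a location could be modeled. The `Coordinate` and `Location` types and the `loc` helper are hypothetical names for illustration, not Gasoline's actual types, and the sketch assumes that "in order of event location" means plain lexicographic ordering.

```rust
use std::fmt;

/// One coordinate: an ordered list of ordinates, e.g. `0.3.1` is `[0, 3, 1]`.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Coordinate(Vec<u32>);

/// A location: an ordered list of coordinates, outermost branch first.
/// The derived `Ord` compares lexicographically, which is an assumption
/// about how history is sorted.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Location(Vec<Coordinate>);

impl fmt::Display for Coordinate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Ordinates are joined with dots: [0, 3, 1] -> "0.3.1"
        let parts: Vec<String> = self.0.iter().map(|o| o.to_string()).collect();
        write!(f, "{}", parts.join("."))
    }
}

impl fmt::Display for Location {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Coordinates are joined with ", " and wrapped in braces
        let parts: Vec<String> = self.0.iter().map(|c| c.to_string()).collect();
        write!(f, "{{{}}}", parts.join(", "))
    }
}

/// Hypothetical convenience constructor.
fn loc(coords: Vec<Vec<u32>>) -> Location {
    Location(coords.into_iter().map(Coordinate).collect())
}
```

Under these assumptions an inserted event such as `{0.1}` sorts before `{1}`, and the children of `{1}` (for example `{1, 4}`) sort before the inserted event `{1.1}`.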
| 17 | + |
| 18 | +## Calculating Location |
| 19 | + |
| 20 | +Location is determined both by the events before it and the events after it (if the location is for an inserted event). |
| 21 | + |
| 22 | +For a new workflow with no history, location is determined by incrementing the final coordinate (which consists of a single ordinate). Location `{2}` follows `{1}`, etc. |
| 23 | + |
| 24 | +Ordinary (non-inserted) coordinates start at `1`, not `0`. This leaves `0` free for events inserted before location `{1}`, so negative numbers are never needed. |
| 25 | + |
| 26 | +### Branches |
| 27 | + |
| 28 | +When a branch (used internally for steps like loops and closures) is encountered, a new coordinate is added to the current location. So events that are children of a branch at location `{1}` would start at `{1, 1}`. |
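The two rules above (increment the final coordinate, add a coordinate when entering a branch) can be sketched as follows. This is an illustrative sketch only: it models a location as a plain `Vec<Vec<u32>>`, the function names `next_sibling` and `first_child` are hypothetical, and it covers only the no-history case where the final coordinate has a single ordinate.

```rust
/// Location of the event that follows `current` in the same branch.
/// Assumes a workflow with no inserted events, so the final coordinate
/// has exactly one ordinate. (Hypothetical helper.)
fn next_sibling(current: &Vec<Vec<u32>>) -> Vec<Vec<u32>> {
    let mut next = current.clone();
    let last = next
        .last_mut()
        .expect("a location has at least one coordinate");
    assert_eq!(last.len(), 1, "sketch only handles single-ordinate coordinates");
    last[0] += 1;
    next
}

/// Location of the first event inside a branch located at `branch`.
/// A new coordinate starting at 1 is appended. (Hypothetical helper.)
fn first_child(branch: &Vec<Vec<u32>>) -> Vec<Vec<u32>> {
    let mut child = branch.clone();
    child.push(vec![1]);
    child
}
```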
| 29 | + |
| 30 | +### Inserting Events |
| 31 | + |
| 32 | +If a workflow has already executed up to location `{4}` but you want to add a new activity before location `{2}`, you can use __versioned workflow steps__ to make this happen. |
| 33 | + |
| 34 | +By default all steps inherit the version of the branch they come from, which for the root of the workflow is version 1. |
| 35 | + |
| 36 | +If you were to add a new step before location `{2}` with version 1 (denoted as `{2}v1` or `{2} v1`), the workflow would fail when it replays with the error `HistoryDiverged`. The version of an inserted event must always be higher than the version of the step that comes after it. |
| 37 | + |
| 38 | +When we add a new step before location `{2}` with version 2, it will be assigned the location `{1.1}` because it is the first inserted event after location `{1}`. All subsequent new events we add will increment this final ordinate: `{1.2}`, `{1.3}`, etc. |
| 39 | + |
| 40 | +If we want to add an event between `{1.1}` and `{1.2}` (which are both version 2 events), we will need to set the event's version to 3. The new event's location will be `{1.1.1}`. |
| 41 | + |
| 42 | +Events inserted before the event at location `{1}` will start with a 0 (`{0.1}`). Continuing to insert events before this new event will prepend more 0's: `{0.0.1}`, `{0.0.0.1}`, etc. |
| 43 | + |
| 44 | +Note that inserting can be done at any depth, not only at the root of the workflow, so an event between `{2, 11, 4}` and `{2, 11, 5}` will have the location `{2, 11, 4.1}`. |
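The insertion examples in this section can be reproduced with a small function over a single coordinate. This is a sketch of one rule set that matches the examples above, not necessarily how Gasoline computes locations; `coordinate_between` is a hypothetical name, `None` stands for "no event precedes the insertion point", and the last fallback (appending zeros) extrapolates from the `{0.0.1}` case.

```rust
/// Pick a coordinate for an event inserted after `prev` and before `next`.
/// Both are the final coordinates of sibling events in the same branch.
/// (Hypothetical helper; assumes lexicographic ordering of ordinates.)
fn coordinate_between(prev: Option<&[u32]>, next: &[u32]) -> Vec<u32> {
    // With no previous event, start from a virtual `0` coordinate.
    let mut base: Vec<u32> = prev.map(|p| p.to_vec()).unwrap_or_else(|| vec![0]);
    assert!(base.as_slice() < next, "prev must sort before next");
    assert!(
        next.last().map_or(false, |&o| o > 0),
        "next must end in a non-zero ordinate"
    );

    // If `prev` is itself an inserted event, try incrementing its final
    // ordinate first: 1.1 -> 1.2, 0.5 -> 0.6.
    if base.len() > 1 {
        let mut bumped = base.clone();
        *bumped.last_mut().unwrap() += 1;
        if bumped.as_slice() < next {
            return bumped;
        }
    }

    // Otherwise append a new ordinate `1`: 1 -> 1.1, 1.1 -> 1.1.1.
    // If that does not sort before `next`, prepend another 0 and retry:
    // 0.1 -> 0.0.1 -> 0.0.0.1.
    loop {
        let mut candidate = base.clone();
        candidate.push(1);
        if candidate.as_slice() < next {
            return candidate;
        }
        base.push(0);
    }
}
```

For a nested location such as `{2, 11, 4}`, only the final coordinate changes, so the same function would be applied to `4` and `5` while the leading coordinates `2, 11` stay as they are.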
| 45 | + |
| 46 | +### Removing Events |
| 47 | + |
| 48 | +Removing an event requires replacing it with a `ctx.removed::<_>()` call. This is a durable step that does one of the following: |
| 49 | + |
| 50 | +- For workflows that have already executed the step being removed: it does not manipulate the database, but skips the step when replaying. |
| 51 | +- For workflows that have not executed the step yet: it inserts a `removed` event into history. |
| 52 | + |
| 53 | +This keeps locations consistent between the two cases. |
| 54 | + |
| 55 | +### Inserting Events Conditionally Based On Version |
| 56 | + |
| 57 | +Sometimes you may want to keep the history of existing workflows the same while modifying only new workflows. You can do this with `ctx.check_version(N)` where `N` is the version that will be used when the history does not exist yet (i.e. for a new workflow). |
| 58 | + |
| 59 | +Given a workflow with the history: |
| 60 | + |
| 61 | +- `{1}v1` activity foo |
| 62 | +- `{2}v1` activity bar |
| 63 | +- `{3}v1` sleep |
| 64 | + |
| 65 | +If we want this workflow to remain the same but new workflows to execute a different activity instead of `bar` (perhaps a newer version), we can do: |
| 66 | + |
| 67 | +```rust |
| 68 | +// Activity foo |
| 69 | +ctx.activity(...).await?; |
| 70 | + |
| 71 | +match ctx.check_version(2).await { |
| 72 | +    // The existing workflow will always match this path because the next event (activity bar) has version 1 |
| 73 | +    1 => { |
| 74 | +        // Keep the workflow steps exactly as the history expects: run activity bar |
| 75 | +        ctx.activity(...).await?; |
| 76 | +    } |
| 77 | +    // This will be `2` because that is the value of `N` |
| 78 | +    _latest => { |
| 79 | +        // Activity bar_fast |
| 80 | +        ctx.v(2).activity(...).await?; |
| 81 | +    } |
| 82 | +} |
| 83 | + |
| 84 | +ctx.sleep().await |
| 85 | +``` |
| 86 | + |
| 87 | +Version checks are durable. If history already exists at the location where a version check is added, the check does not manipulate the database and instead returns the version of that history event. If the version check is at the end of the current branch of events (as in a new workflow), it is inserted as an event itself. This means the histories of the two workflows will look like this: |
| 88 | + |
| 89 | +- Existing workflow history: |
| 90 | + - `{1}v1` activity foo |
| 91 | + - `{2}v1` activity bar |
| 92 | + - `{3}v1` sleep |
| 93 | +- New workflow history: |
| 94 | + - `{1}v1` activity foo |
| 95 | + - `{2}v2` version check |
| 96 | + - `{3}v2` activity bar_fast |
| 97 | + - `{4}v1` sleep |
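The two cases described above can be modeled with a small in-memory sketch. It is not Gasoline's implementation: the `Event` struct, the `check_version_sketch` function, and the use of a `Vec` index as the replay cursor are all assumptions made for illustration.

```rust
/// A simplified history event. (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
struct Event {
    version: u32,
    name: String,
}

/// Sketch of the version check behaviour:
/// - if an event already exists at the cursor, return its version and
///   leave the history untouched;
/// - otherwise record a version check event with version `latest` and
///   return `latest`.
fn check_version_sketch(history: &mut Vec<Event>, cursor: usize, latest: u32) -> u32 {
    if let Some(existing) = history.get(cursor) {
        return existing.version;
    }
    history.push(Event {
        version: latest,
        name: "version check".to_string(),
    });
    latest
}
```

Because the new workflow records the version check itself, replaying that workflow finds the `v2` event at the same position and takes the same branch again.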
| 98 | + |
| 99 | +Note that you can also manipulate the existing workflow history in the `1` branch just like you would without `check_version`. We could insert a new activity after activity `bar` with a `v2` or remove activity `bar`. |
| 100 | + |
| 101 | +## Loops |
| 102 | + |
| 103 | +Loops structure event history with 2 nested branches: |
| 104 | + |
| 105 | +A loop at location `{2}` will have each iteration on a separate branch: `{2, 1}`, `{2, 2}`, ... `{2, iteration}`. |
| 106 | + |
| 107 | +Events in each iteration will be a child of the iteration branch: `{2, 2, 1}`, `{2, 2, 2}` are the first two events in the second iteration of the loop at location `{2}`. |
| 108 | + |
| 109 | +Loops are often used to create state machines out of workflows. Because state machines can technically run forever based on their loop configuration, Gasoline moves each completed iteration's event history into a different place in the database known as forgotten event history. |
| 110 | + |
| 111 | +Forgotten events will not be pulled from the database when the workflow is replayed. This does not cause issues for the workflow: we know which iteration is the current one, and because each iteration is separate, previous iterations should not influence the current history. |
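As an illustration of which events would be forgotten, the sketch below splits a list of event locations into active and forgotten sets. It is a simplified in-memory model, not Gasoline's storage logic; `EventLocation` and `split_forgotten` are hypothetical names, and locations are plain `Vec<Vec<u32>>` values.

```rust
/// A location as nested ordinates, e.g. `{2, 1, 2}` is `[[2], [1], [2]]`.
/// (Hypothetical alias for illustration.)
type EventLocation = Vec<Vec<u32>>;

/// Split events into `(active, forgotten)` for a loop at `loop_location`.
/// Events inside iterations earlier than `current_iteration` are forgotten;
/// everything else, including the loop event itself, stays active.
fn split_forgotten(
    events: Vec<EventLocation>,
    loop_location: &[Vec<u32>],
    current_iteration: u32,
) -> (Vec<EventLocation>, Vec<EventLocation>) {
    events.into_iter().partition(|location| {
        // An event belongs to the loop if its location extends the loop's.
        let in_loop = location.len() > loop_location.len()
            && location.starts_with(loop_location);
        // The coordinate right after the loop's location is the iteration branch.
        let completed = in_loop && location[loop_location.len()][0] < current_iteration;
        !completed
    })
}
```

With a loop at `{2}` currently in its second iteration, `{2, 1, 1}` and `{2, 1, 2}` would be forgotten while `{1}` and `{2, 2, 1}` remain active.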