|
| 1 | +# Workflow: Rerun from Activity |
| 2 | + |
| 3 | +* Author(s): @joshvanl, @whitwaldo |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +This proposal details the ability to rerun a workflow from a previous point in its history. |
| 8 | +A workflow in a terminal state can be rerun from a failed activity, before the failed activity, or at any activity in the history of a successful or failed workflow. |
| 9 | + |
| 10 | +## Background |
| 11 | + |
| 12 | +It is often desirable to rerun business logic implemented inside a workflow. |
| 13 | +This could be because an activity in the workflow failed due to a transient error, because an external dependency changed or a resource is now available to an activity, or simply because some subset of the most recent activities should be rerun. |
| 14 | +Dapr should provide the functionality to rerun a workflow from any point in its history. |
| 15 | + |
| 16 | +## Related Items |
| 17 | + |
| 18 | +https://github.com/dapr/proposals/pull/79 |
| 19 | + |
| 20 | +## Design |
| 21 | + |
| 22 | +The following proto RPC and messages will be added to durabletask, exposed via each SDK. |
| 23 | +This API implements rerunning a workflow from a specific Activity in the history of the workflow. |
| 24 | +The Activity to be rerun is chosen via its associated event ID. |
| 25 | + |
| 26 | +All `Activities` are assigned an event ID in the durabletask history. |
| 27 | +While `Timers` and `RaiseEvents` tasks are also assigned an event ID in the durabletask history, the workflow cannot be rerun from these events using this API. |
| 28 | +Supporting reruns from these two event types would require a significant code refactor, and in practice users are only interested in rerunning a workflow from a specific _activity_, not a "control event". |
| 29 | + |
| 30 | +It must be the case that the workflow is in a _terminal_ state before the rerun can be executed. |
| 31 | +A workflow is in a terminal state when it has completed successfully, failed at some activity, or been forcibly terminated. |
| 32 | +Rerunning a workflow which is currently in progress does not make any practical or academic sense. |
| 33 | +Attempting to do so will return an error to the client. |
| 34 | + |
| 35 | +When rerunning a workflow, the workflow will be started from the event ID of an Activity in the history. |
| 36 | +The client may optionally give a _new_ input to the Activity from which the workflow will be rerun. |
| 37 | +The workflow history up until the event ID of the Activity will be cloned. |
| 38 | +If a new input is given, the activity will be started with that input data. |
| 39 | +If no input is given, the activity will be started with the same input it received in the original workflow. |
| 40 | + |
| 41 | +The client must give a new instance ID to use when rerunning the workflow. |
| 42 | +This preserves the history of the source workflow that is being rerun. |
| 43 | +`RerunWorkflowFromActivity` clones the source workflow history, up until the event ID, into the new instance ID. |
| 44 | + |
| 45 | +If the targeted `eventID` does not exist, or is not an Activity event, the API will return an error to the client. |
| 46 | + |
| 47 | +```proto |
| 48 | +service TaskHubSidecarService { |
| 49 | + // Rerun a Workflow from a specific event ID from an activity. |
| 50 | + rpc RerunWorkflowFromActivity(RerunWorkflowFromActivityRequest) returns (RerunWorkflowFromActivityResponse); |
| 51 | +} |
| 52 | +
|
| 53 | +// RerunWorkflowFromActivityRequest is used to rerun a workflow instance from a |
| 54 | +// specific event ID. |
| 55 | +message RerunWorkflowFromActivityRequest { |
| 56 | + // instanceID is the orchestration instance ID to rerun. |
| 57 | + string instanceID = 1; |
| 58 | +
|
| 59 | + // the event id to start the new workflow instance from. |
| 60 | + int32 eventID = 2; |
| 61 | +
|
| 62 | + // newInstanceID is the new instance ID to use for the new workflow |
| 63 | + // instance. |
| 64 | + string newInstanceID = 3; |
| 65 | +
|
| 66 | + // input can optionally be given to give the new instance a different input to |
| 67 | + // the next Activity event. |
| 68 | + google.protobuf.StringValue input = 4; |
| 69 | +} |
| 70 | +
|
| 71 | +// RerunWorkflowFromActivityResponse is the response to executing |
| 72 | +// RerunWorkflowFromActivity. |
| 73 | +message RerunWorkflowFromActivityResponse { |
| 74 | + string instanceId = 1; |
| 75 | +} |
| 76 | +``` |
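The error conditions described above (the workflow must be in a terminal state, and the event ID must exist and refer to an Activity) can be sketched as a single validation step. This is only an illustrative sketch: `Instance`, `HistoryEvent`, `EventKind`, `validateRerun`, and the error values are hypothetical names invented for this example and are not part of durabletask.

```go
package main

import (
	"errors"
	"fmt"
)

// EventKind, HistoryEvent and Instance are hypothetical stand-ins for the
// runtime state; they are NOT the real durabletask types.
type EventKind int

const (
	ActivityEvent EventKind = iota
	TimerEvent
	RaiseEvent
)

type HistoryEvent struct {
	EventID int32
	Kind    EventKind
}

type Instance struct {
	Terminal bool // completed, failed, or terminated
	History  []HistoryEvent
}

// Hypothetical error values, one per rejection reason.
var (
	ErrNotTerminal   = errors.New("workflow is not in a terminal state")
	ErrEventNotFound = errors.New("event ID not found in workflow history")
	ErrNotActivity   = errors.New("event ID does not refer to an Activity")
)

// validateRerun applies the checks from the proposal text before a rerun
// request would be accepted.
func validateRerun(inst Instance, eventID int32) error {
	if !inst.Terminal {
		return ErrNotTerminal
	}
	for _, e := range inst.History {
		if e.EventID != eventID {
			continue
		}
		if e.Kind != ActivityEvent {
			return ErrNotActivity
		}
		return nil
	}
	return ErrEventNotFound
}

func main() {
	inst := Instance{
		Terminal: true,
		History: []HistoryEvent{
			{EventID: 0, Kind: ActivityEvent},
			{EventID: 1, Kind: TimerEvent},
		},
	}
	fmt.Println(validateRerun(inst, 0)) // Activity event: accepted
	fmt.Println(validateRerun(inst, 1)) // Timer event: rejected
	fmt.Println(validateRerun(inst, 7)) // unknown event ID: rejected
}
```

Checking the terminal state first means a request against an in-progress workflow is rejected regardless of which event ID it names.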
| 77 | + |
| 78 | +The Orchestration protos will be updated to include a new `uint64 attempt` field which signals the attempt number that the current workflow is on. |
| 79 | +The attempt number starts from zero. |
| 80 | +Each rerun will increment the attempt number by one. |
| 81 | +The Orchestration protos will also include an `optional string rerunFrom` field, which will be set to the instance ID from which the workflow was rerun. |
| 82 | +If the workflow is not created from a rerun, this field will be unset. |
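As a rough illustration of the two new fields, the sketch below derives the metadata of a rerun instance from its source instance. The `Metadata` struct and `rerunMetadata` function are hypothetical names for this example only, and it assumes the attempt number is incremented relative to the source workflow's attempt, which the text above does not state explicitly.

```go
package main

import "fmt"

// Metadata is a hypothetical, simplified view of the orchestration fields
// discussed above; it is not the real proto-generated type.
type Metadata struct {
	InstanceID string
	Attempt    uint64
	RerunFrom  *string // nil when the workflow was not created from a rerun
}

// rerunMetadata builds the metadata for a rerun of src under newInstanceID.
// Assumption: the new attempt is the source attempt plus one.
func rerunMetadata(src Metadata, newInstanceID string) Metadata {
	from := src.InstanceID
	return Metadata{
		InstanceID: newInstanceID,
		Attempt:    src.Attempt + 1,
		RerunFrom:  &from,
	}
}

func main() {
	original := Metadata{InstanceID: "order-1"} // attempt 0, RerunFrom unset
	rerun := rerunMetadata(original, "order-1-rerun")
	fmt.Println(rerun.InstanceID, rerun.Attempt, *rerun.RerunFrom)
}
```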
| 83 | + |
| 84 | +### Getting Instance History |
| 85 | + |
| 86 | +As a complement to the `RerunWorkflowFromActivity` API, a new API is added to get the history of a workflow instance. |
| 87 | +Note that the API returns _all_ history events for the workflow instance, including control events which do _not_ contain an event ID. |
| 88 | +This API is intended to be used for discovering the event ID of the activity to rerun from. |
| 89 | +The actor backend will get the instance history from the state store and return it to the client, using a new workflow Actor invoke method. |
| 90 | + |
| 91 | +```proto |
| 92 | +service TaskHubSidecarService { |
| 93 | + // GetInstanceHistory retrieves the history of a workflow instance. |
| 94 | + rpc GetInstanceHistory(GetInstanceHistoryRequest) returns (GetInstanceHistoryResponse); |
| 95 | +} |
| 96 | +
|
| 103 | +// GetInstanceHistoryRequest is used to get the history of a workflow instance. |
| 104 | +message GetInstanceHistoryRequest { |
| 105 | + // instanceID is the orchestration instance ID to get the history for. |
| 106 | + string instanceID = 1; |
| 107 | +} |
| 108 | +
|
| 109 | +// GetInstanceHistoryResponse is the response to executing |
| 110 | +// GetInstanceHistory. |
| 111 | +message GetInstanceHistoryResponse { |
| 112 | + repeated HistoryEvent events = 1; |
| 113 | +} |
| 114 | +``` |
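Since the returned history contains all events, a client looking for a rerun target has to pick out the Activity events that carry an event ID. A minimal sketch of that filtering step follows. The `HistoryEvent` struct here is a simplified, hypothetical stand-in for the real message, and it assumes that events without an event ID can be recognised by a negative ID and that Activity events can be recognised by a non-empty activity name.

```go
package main

import "fmt"

// HistoryEvent is a simplified, hypothetical view of a history event; the
// real durabletask HistoryEvent message carries many more fields.
type HistoryEvent struct {
	EventID      int32  // assumed negative when the event has no event ID
	ActivityName string // assumed non-empty only for Activity events
}

// Candidate is an Activity event that a rerun could target.
type Candidate struct {
	EventID int32
	Name    string
}

// rerunCandidates keeps only the Activity events that carry an event ID,
// preserving history order.
func rerunCandidates(events []HistoryEvent) []Candidate {
	var out []Candidate
	for _, e := range events {
		if e.EventID < 0 || e.ActivityName == "" {
			continue
		}
		out = append(out, Candidate{EventID: e.EventID, Name: e.ActivityName})
	}
	return out
}

func main() {
	history := []HistoryEvent{
		{EventID: -1}, // control event without an event ID
		{EventID: 0, ActivityName: "reserve"},
		{EventID: 1}, // has an event ID but is not an Activity, e.g. a timer
		{EventID: 2, ActivityName: "charge"},
	}
	for _, c := range rerunCandidates(history) {
		fmt.Printf("eventID=%d activity=%s\n", c.EventID, c.Name)
	}
}
```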
| 115 | + |
| 116 | +### Concurrent Activities |
| 117 | + |
| 118 | +Workflow activities are often run concurrently, e.g. in fan-out patterns. |
| 119 | +This means the resulting workflow history order of execution can be non-deterministic. |
| 120 | +The durabletask history is currently a linear sequence of events. |
| 121 | +As a result, rerunning a workflow from a specific Activity which is a member of a fan-out pattern may also rerun peer fan-out activities, depending on the order in which the Activities in the fan-out group terminated. |
| 122 | +This may or may not be desirable to the user, but is a limitation of the API. |
| 123 | + |
| 124 | +Users who expect to rerun a workflow regularly from known points are advised to make use of "checkpoint" activities. |
| 125 | +These "checkpoint" activities should be a no-op, returning an output that is the same as the input. |
| 126 | +They serve as well-known activity event ID markers at the points where the user knows it will be desirable to rerun the workflow. |
| 127 | + |
| 128 | +```go |
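| 129 | +func checkpoint(ctx task.ActivityContext) (any, error) { |
| 130 | + var input int |
| 131 | + // Read the input before returning it; Go does not specify whether `input` is read before or after the call when both appear in one return statement. |
|  | + err := ctx.GetInput(&input) |
|  | + return input, err |
| 132 | +} |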
| 133 | +``` |
| 134 | + |
| 135 | +### SDK Changes |
| 136 | + |
| 137 | +This API will be exposed on all durabletask SDKs. |
| 138 | +The semantics are generally dependent on the flavour of each SDK language, however: |
| 139 | +- The `instanceID` is a required string to target for the rerun. |
| 140 | +- The `eventID` is a required int32 to target the Activity for the rerun. |
| 141 | +- The `newInstanceID` is a required string naming the new instance into which the workflow is rerun from the event ID of the Activity. |
| 142 | +- The `input` is optional and, if defined, will be used as the input to the targeted Activity when rerunning the workflow. |
| 143 | + |
| 144 | +## Completion Checklist |
| 145 | + |
| 146 | +* Implement proto & API changes to durabletask. |
| 147 | +* Update dapr workflow runtime to support the new APIs. |
| 148 | +* Update SDKs to support the new APIs. |