Commit 52a6edf

Workflow: Rerun from Activity (#80)
* Workflow: Rerun from Activity
  Signed-off-by: joshvanl <[email protected]>
* Required `newInstanceID`
  Signed-off-by: joshvanl <[email protected]>
* Adds note about rerun attempt indexing and parent instance ID
  Signed-off-by: joshvanl <[email protected]>
---------
Signed-off-by: joshvanl <[email protected]>
1 parent 0a13058 commit 52a6edf

1 file changed: +148 -0 lines changed

@@ -0,0 +1,148 @@
1+
# Workflow: Rerun from Activity
2+
3+
* Author(s): @joshvanl, @whitwaldo
4+
5+
## Overview
6+
7+
This proposal details the ability to rerun a workflow from a previous point in its history.
8+
A workflow in a terminal state can be rerun from a failed activity, before the failed activity, or at any activity in the history of a successful or failed workflow.
9+
10+
## Background
11+
12+
It is often desirable to rerun business logic implemented inside a workflow.
13+
This could be because an activity in the workflow failed due to a transient error, because an external dependency changed or a resource has become available to an activity, or simply because some subset of the most recent Activities should be rerun.
14+
Dapr should provide the functionality to rerun a workflow from any point in its history.
15+
16+
## Related Items
17+
18+
https://github.com/dapr/proposals/pull/79
19+
20+
## Design
21+
22+
The following proto RPC and messages will be added to durabletask, exposed via each SDK.
23+
This API implements rerunning a workflow from a specific Activity in the history of the workflow.
24+
The Activity to be rerun is chosen via its associated event ID.
25+
26+
All `Activities` are assigned an event ID in the durabletask history.
27+
While `Timers` and `RaiseEvents` tasks are also assigned an event ID in the durabletask history, the workflow cannot be rerun from these events using this API.
28+
Supporting reruns from these two event types would require a significant code refactor and, in practice, users are only interested in rerunning a workflow from a specific _activity_, not a "control event".
29+
30+
The workflow must be in a _terminal_ state before the rerun can be executed.
31+
That is, the workflow has completed successfully, failed at some activity, or been forcibly terminated.
32+
Rerunning a workflow which is currently in progress does not make any practical or academic sense.
33+
Attempting to do so will return an error to the client.
34+
35+
When rerunning a workflow, the workflow will be started from the event ID of an Activity in the history.
36+
The client may give a _new_ input to the Activity from which the workflow will be rerun.
37+
The workflow history up until the event ID of the Activity will be cloned.
38+
If defined, the activity will be started with the new input data.
39+
If no input is given, the activity will be started with the same input it received in the original workflow.
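
The cloning and input rules above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the durabletask implementation: the `Event` type, `ErrEventNotFound`, and `cloneForRerun` are hypothetical names, and the sketch assumes the cloned prefix stops just before the targeted Activity.

```go
package main

import "errors"

// Event is a hypothetical, simplified stand-in for a durabletask history event.
type Event struct {
	ID         int32   // event ID of the history entry
	IsActivity bool    // true for Activity events
	Input      *string // nil when no input was recorded
}

// ErrEventNotFound is a hypothetical error for an unknown Activity event ID.
var ErrEventNotFound = errors.New("activity event not found")

// cloneForRerun copies the history up to, but not including, the Activity
// with the given event ID. It also returns the input the Activity would be
// restarted with: newInput if set, otherwise the originally recorded input.
func cloneForRerun(history []Event, eventID int32, newInput *string) ([]Event, *string, error) {
	for i, ev := range history {
		if !ev.IsActivity || ev.ID != eventID {
			continue
		}
		// Copy so that the source workflow history is never mutated.
		prefix := make([]Event, i)
		copy(prefix, history[:i])

		input := ev.Input
		if newInput != nil {
			input = newInput
		}
		return prefix, input, nil
	}
	return nil, nil, ErrEventNotFound
}
```

Copying the prefix rather than slicing it keeps the source history untouched, which matches the intent of preserving the original workflow.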
40+
41+
The client must give a new instance ID to use when rerunning the workflow.
42+
Using a new instance ID preserves the history of the source workflow that is being rerun.
43+
`RerunWorkflowFromActivity` requires the new instance ID, because the workflow history up until the event ID is cloned into that new instance.
44+
45+
If the targeted `eventID` does not exist, or is not an Activity event, the API will return an error to the client.
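
The preconditions described in this section (terminal state, existing event ID, Activity event) could be checked roughly as follows. This is a minimal sketch: `Status`, `EventKind`, `Event`, `ErrNotTerminal`, and `validateRerun` are hypothetical names, and event IDs are assumed to be unique within one history.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical workflow runtime status.
type Status int

const (
	Running Status = iota
	Completed
	Failed
	Terminated
)

// EventKind is a hypothetical classification of history events.
type EventKind int

const (
	KindActivity EventKind = iota
	KindTimer
	KindRaiseEvent
)

// Event is a hypothetical, simplified history event.
type Event struct {
	ID   int32
	Kind EventKind
}

// ErrNotTerminal is returned when the source workflow is still in progress.
var ErrNotTerminal = errors.New("workflow is not in a terminal state")

// validateRerun reports whether a rerun from eventID is allowed.
func validateRerun(status Status, history []Event, eventID int32) error {
	if status != Completed && status != Failed && status != Terminated {
		return ErrNotTerminal
	}
	for _, ev := range history {
		if ev.ID != eventID {
			continue
		}
		if ev.Kind != KindActivity {
			return fmt.Errorf("event %d is not an Activity event", eventID)
		}
		return nil
	}
	return fmt.Errorf("event %d does not exist in the history", eventID)
}
```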
46+
47+
```proto
48+
service TaskHubSidecarService {
49+
// Rerun a Workflow from the event ID of a specific Activity.
50+
rpc RerunWorkflowFromActivity(RerunWorkflowFromActivityRequest) returns (RerunWorkflowFromActivityResponse);
51+
}
52+
53+
// RerunWorkflowFromActivityRequest is used to rerun a workflow instance from a
54+
// specific event ID.
55+
message RerunWorkflowFromActivityRequest {
56+
// instanceID is the orchestration instance ID to rerun.
57+
string instanceID = 1;
58+
59+
// the event id to start the new workflow instance from.
60+
int32 eventID = 2;
61+
62+
// newInstanceID is the new instance ID to use for the new workflow
63+
// instance.
64+
string newInstanceID = 3;
65+
66+
// input can optionally be given to give the new instance a different input to
67+
// the next Activity event.
68+
google.protobuf.StringValue input = 4;
69+
}
70+
71+
// RerunWorkflowFromActivityResponse is the response to executing
72+
// RerunWorkflowFromActivity.
73+
message RerunWorkflowFromActivityResponse {
74+
string instanceId = 1;
75+
}
76+
```
77+
78+
The Orchestration protos will be updated to include a new `uint64 attempt` field, which records the attempt number that the current workflow is on.
79+
The attempt number starts from zero.
80+
Each rerun will increment the attempt number by one.
81+
The Orchestration protos will also include an `optional string rerunFrom` field, which will be set to the instance ID from which the workflow was rerun.
82+
If the workflow is not created from a rerun, this field will be nil.
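
To make the bookkeeping concrete, the sketch below derives the two fields for a chain of reruns. `Meta` and `rerunMeta` are hypothetical names used only for illustration; the real fields live on the Orchestration protos.

```go
package main

// Meta is a hypothetical view of the two proposed Orchestration fields.
type Meta struct {
	InstanceID string
	Attempt    uint64  // zero for a workflow that was not created from a rerun
	RerunFrom  *string // nil unless the workflow was created from a rerun
}

// rerunMeta derives the metadata of the new instance from its source:
// the attempt number is incremented by one and RerunFrom points at the
// source instance ID.
func rerunMeta(src Meta, newInstanceID string) Meta {
	from := src.InstanceID
	return Meta{
		InstanceID: newInstanceID,
		Attempt:    src.Attempt + 1,
		RerunFrom:  &from,
	}
}
```

Under these assumptions, rerunning `order-1` as `order-1-rerun` yields attempt 1 with `rerunFrom` set to `order-1`, and rerunning that instance again yields attempt 2 with `rerunFrom` set to `order-1-rerun`.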
83+
84+
### Getting Instance History
85+
86+
As a complement to the `RerunWorkflowFromActivity` API, a new API is added to get the history of run activities for a workflow instance.
87+
Note that the API returns _all_ history events for the workflow instance, including control events which do _not_ contain an event ID.
88+
This API is intended to be used for discovering the event ID of the activity to rerun from.
89+
The actor backend will get the instance history from the state store and return it to the client, using a new workflow Actor invoke method.
90+
91+
```proto
92+
service TaskHubSidecarService {
93+
// GetInstanceHistory retrieves the history of a workflow instance.
94+
rpc GetInstanceHistory(GetInstanceHistoryRequest) returns (GetInstanceHistoryResponse);
95+
}
96+
103+
// GetInstanceHistoryRequest is used to get the history of a workflow instance.
104+
message GetInstanceHistoryRequest {
105+
// instanceID is the orchestration instance ID to get the history for.
106+
string instanceID = 1;
107+
}
108+
109+
// GetInstanceHistoryResponse is the response to executing
110+
// GetInstanceHistory.
111+
message GetInstanceHistoryResponse {
112+
repeated HistoryEvent events = 1;
113+
}
114+
```
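
A client could use the returned history to look up candidate event IDs by Activity name. The following is a hedged sketch only: `HistoryEvent` here is a hypothetical flattened struct rather than the generated proto type, and it assumes control events carry a negative event ID.

```go
package main

// HistoryEvent is a hypothetical, flattened stand-in for the proto message.
type HistoryEvent struct {
	EventID      int32  // assumed to be negative for control events
	ActivityName string // empty for events that are not Activities
}

// activityEventIDs returns the event IDs of every Activity with the given
// name, in history order, skipping control events.
func activityEventIDs(events []HistoryEvent, name string) []int32 {
	var ids []int32
	for _, ev := range events {
		if ev.EventID < 0 || ev.ActivityName != name {
			continue
		}
		ids = append(ids, ev.EventID)
	}
	return ids
}
```

When the same Activity was scheduled more than once, several IDs are returned and the caller has to choose which occurrence to rerun from.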
115+
116+
### Concurrent Activities
117+
118+
It is often the case that workflow activities are run concurrently, e.g. in fan-out patterns.
119+
This means the resulting workflow history order of execution can be non-deterministic.
120+
The durabletask history is currently a linear sequence of events.
121+
This means that rerunning a workflow from a specific Activity which is a member of a fan-out pattern may also rerun peer fan-out activities, depending on the order in which the Activities in the fan-out group terminated.
122+
This may or may not be desirable to the user, but is otherwise a limitation of the API.
123+
124+
Users who expect to rerun workflows regularly from known points should be advised to make use of "checkpoint" activities.
125+
These "checkpoint" activities should be a no-op, returning an output that is the same as the input.
126+
These checkpoint activities are useful as well-known activity event ID markers where the user knows it will be desirable to rerun the workflow regularly.
127+
128+
```go
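// checkpoint is a no-op activity that returns its input unchanged.
func checkpoint(ctx task.ActivityContext) (any, error) {
	var input int
	// Decode first: Go leaves the evaluation order of `input` and the
	// GetInput call unspecified when both appear in one return statement.
	if err := ctx.GetInput(&input); err != nil {
		return nil, err
	}
	return input, nil
}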
```
134+
135+
### SDK Changes
136+
137+
This API will be exposed on all durabletask SDKs.
138+
The semantics generally depend on the conventions of each SDK language; however:
139+
- The `instanceID` is a required string to target for the rerun.
140+
- The `eventID` is a required int32 to target the Activity for the rerun.
141+
- The `newInstanceID` is a required string; the workflow is rerun from the event ID of the Activity under this new instance ID.
142+
- The `input` is optional and, if defined, will be used as the input to the targeted Activity when rerunning the workflow.
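
As one possible shape for an SDK surface, the parameters listed above could be grouped and checked client-side before the RPC is sent. `RerunRequest` and `Validate` are hypothetical names, not part of any existing SDK, and rejecting negative event IDs is an assumption of this sketch.

```go
package main

import "errors"

// RerunRequest is a hypothetical SDK-side mirror of
// RerunWorkflowFromActivityRequest.
type RerunRequest struct {
	InstanceID    string  // required: the workflow instance to rerun
	EventID       int32   // required: the event ID of the Activity to rerun from
	NewInstanceID string  // required: the instance ID of the new workflow
	Input         *string // optional: nil keeps the Activity's original input
}

// Validate checks the required fields before the request is sent.
func (r RerunRequest) Validate() error {
	switch {
	case r.InstanceID == "":
		return errors.New("instanceID is required")
	case r.NewInstanceID == "":
		return errors.New("newInstanceID is required")
	case r.EventID < 0:
		// Assumption: negative event IDs never identify an Activity.
		return errors.New("eventID must not be negative")
	}
	return nil
}
```

Because an unset proto3 `int32` reads as zero, and zero may itself be a valid event ID, the sketch cannot detect a missing `eventID`; an SDK would need to make that parameter mandatory in its function signature instead.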
143+
144+
## Completion Checklist
145+
146+
* Implement proto & API changes to durabletask.
147+
* Update dapr workflow runtime to support the new APIs.
148+
* Update SDKs to support the new APIs.
