Edits: Go Client, Workflow Replay and Shadowing (#247)
* Add link to Codelab and grammar fixes
* Clarify shadower scan options and Shadow worker caveats
* All Github links changed to permalinks
* Embed Youtube video at the top of Codelab
* Rewrote Workflow Replayer section to document Replayer client API
---------
Co-authored-by: “Kevin” <“[email protected]”>
# **Codelab: How to Write Tests With Workflow Replayer and Shadower**
8
8
9
-
**A video companion to this Codelab is available on our YouTube channel [here](https://www.youtube.com/watch?v=LHOr0NOp0Gc).**
9
+
**A video companion to this Codelab is available on our YouTube channel:**
10
+
11
+
<iframe width="560" height="315" src="https://www.youtube.com/embed/LHOr0NOp0Gc?si=iJB0TMfS5QbxWrn7" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
10
12
11
13
This Codelab is a step-by-step guide to help you create tests for your Cadence Workflows. By the end of this guide, you will be able to build a safety net for your workflow code, ensuring that changes you make don't break existing, long-running workflows.
In the Versioning section, we mentioned that incompatible changes to workflow definition code could cause non-deterministic issues when processing workflow tasks if versioning is not done correctly. However, it may be hard for you to tell if a particular change is incompatible or not and whether versioning logic is needed. To help you identify incompatible changes and catch them before production traffic is impacted, we implemented Workflow Replayer and Workflow Shadower.
10
10
11
-
## Workflow Replayer
11
+
## Hands-On Codelab
12
+
**Ready for hands-on learning?** Follow our step-by-step [**Workflow Testing Codelab**](/docs/00-codelabs/01-workflow-tests-go-replayer-shadower.md) to build a complete testing setup from scratch.
12
13
13
-
Workflow Replayer is a testing component for replaying existing workflow histories against a workflow definition. The replaying logic is the same as the one used for processing workflow tasks, so if there's any incompatible changes in the workflow definition, the replay test will fail.
Workflow Replayer is a testing component for replaying existing workflow histories against a workflow definition. The replaying logic is the same as the one used for processing workflow tasks, so if there are any incompatible changes in the workflow definition, the replay test will fail.
18
20
19
-
Create a workflow Replayer by:
21
+
### Replay Options
20
22
21
-
```go
22
-
replayer:= worker.NewWorkflowReplayer()
23
-
```
24
-
or if custom data converter, context propagator, interceptor, etc. is used in your workflow:
23
+
Complete documentation of the replay options, including default and accepted values, can be found [here](https://github.com/cadence-workflow/cadence-go-client/blob/master/internal/workflow_replayer.go). The following sections briefly describe each option.
- **NewWorkflowReplayer()**: Default replayer constructor with standard configuration.
27
+
- **NewWorkflowReplayWithOptions(ReplayOptions)**: Advanced constructor with customizable replay configuration.
39
28
40
-
#### Step 2: Register workflow definition
29
+
#### ReplayOptions Fields
30
+
- **DataConverter**: Custom data converter interface for workflow argument/result serialization.
31
+
- **ContextPropagators**: Slice of context propagators for maintaining request context during replay.
32
+
- **WorkflowInterceptorChainFactories**: Slice of interceptor factories for workflow execution middleware.
33
+
- **Tracer**: OpenTracing tracer interface for distributed tracing support.
41
34
42
-
Next, register your workflow definitions as you normally do. Make sure workflows are registered the same way as they were when running and generating histories; otherwise the replay will not be able to find the corresponding definition.
35
+
> ⚠️ **Important:** Replay options must exactly match your production worker settings to ensure accurate replay results.
- **RegisterWorkflow(workflowFunc)**: Standard registration using Go function name as workflow type.
39
+
- **RegisterWorkflowWithOptions(workflowFunc, RegisterOptions)**: Registration with custom workflow name and additional options.
40
+
41
+
> ⚠️ **Critical:** All registration methods and options must exactly match those used during original workflow execution.
50
42
51
-
#### Step 3: Prepare workflow histories
43
+
#### Replay Methods
44
+
- **ReplayWorkflowHistory(logger, WorkflowHistory)**: Replay from pre-loaded workflow history object in memory.
45
+
- **ReplayWorkflowHistoryFromJSONFile(logger, string)**: Replay from JSON file created by `cadence workflow show --of filename.json`.
46
+
- **ReplayPartialWorkflowHistoryFromJSONFile(logger, string, int64)**: Replay partial history up to specified decision task event ID.
47
+
- **ReplayWorkflowExecution(context, WorkflowServiceClient, logger, string, WorkflowExecution)**: Fetch and replay directly from Cadence server.
52
48
53
-
Replayer can read workflow history from a local json file or fetch it directly from the Cadence server. If you would like to use the first method, you can use the following CLI command, otherwise you can skip to the next step.
49
+
#### Error Conditions
50
+
- **Non-deterministic Changes**: Workflow code modifications that alter execution flow will cause replay failures.
51
+
- **Insufficient History**: A minimum of 3 workflow events is required for meaningful replay validation.
54
52
53
+
#### Downloading History
54
+
Replayer can read workflow history from a local JSON file or fetch it directly from the Cadence server. If you would like to use the first method, you can use the following CLI command; otherwise, you can skip to the next step.
The dumped workflow history will be stored in JSON format in the file at the path you specified.
60
-
61
-
#### Step 4: Call the replay method
62
-
63
-
Once you have the workflow history or have the connection to Cadence server for fetching history, call one of the four replay methods to start the replay test.
64
-
65
-
```go
66
-
// if workflow history has been loaded into memory
If an error is returned from the replay method, there is an incompatible change in the workflow definition, and the error message will contain more information about where the non-deterministic error happens.
85
-
86
-
Note: currently an error will be returned if there are less than 3 events in the history. It is because the first 3 events in the history has nothing to do with the workflow code, so Replayer can't tell if there's a incompatible change or not.
87
-
88
-
### Sample Replay Test
89
-
90
-
This sample is also available in our samples repo at [here](https://github.com/cadence-workflow/cadence-samples/blob/master/cmd/samples/recipes/helloworld/replay_test.go#L39).
60
+
This sample is also available in our samples repo [here](https://github.com/cadence-workflow/cadence-samples/blob/6350c61d16487d3a6cf9b31e3fac6967170c71ba/cmd/samples/recipes/helloworld/replay_test.go#L18).
Workflow Replayer works well when verifying the compatibility against a small number of workflow histories. If there are lots of workflows in production need to be verified, dumping all histories manually clearly won't work. Directly fetching histories from cadence server might be a solution, but the time to replay all workflow histories might be too long for a test.
73
+
Workflow Replayer works well when verifying the compatibility against a small number of workflow histories. If there are a lot of workflows in production that need to be verified, dumping all histories manually clearly won't work. Directly fetching histories from the Cadence server might be a solution, but the time to replay all workflow histories might be too long for a test.
104
74
105
-
Workflow Shadower is built on top of Workflow Replayer to address this problem. The basic idea of shadowing is: scan workflows based on the filters you defined, fetch history for each of workflow in the scan result from Cadence server and run the replay test. It can be run either as a test to serve local development purpose or as a workflow in your worker to continuously replay production workflows.
75
+
Workflow Shadower is built on top of Workflow Replayer to address this problem. The basic idea of shadowing is: scan workflows based on the filters you defined, fetch the history of each workflow in the scan result from the Cadence server, and run the replay test. It can be run either as a test for local development purposes or as a workflow in your worker to continuously replay production workflows.
106
76
107
77
### Shadow Options
108
78
109
-
Complete documentation on shadow options which includes default values, accepted values, etc. can be found [here](https://github.com/cadence-workflow/cadence-go-client/blob/master/internal/workflow_shadower.go#L53). The following sections are just a brief description of each option.
79
+
Complete documentation of the shadow options, including default and accepted values, can be found [here](https://github.com/cadence-workflow/cadence-go-client/blob/2af19f25b056ce1039feaeabd3fb0e803d20010b/internal/workflow_shadower.go#L53). The following sections briefly describe each option.
110
80
111
-
#### Scan Filters
81
+
#### Scan Filters: Advanced Query
82
+
- **WorkflowQuery**: Use advanced visibility query syntax for complex filtering.
83
+
- **SamplingRate**: Sampling workflows from the scan result before executing the replay test.
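
To illustrate what sampling the scan result means, here is a minimal sketch. It assumes each scanned workflow is kept independently with probability equal to the sampling rate; the `sample` helper and its `next` parameter are hypothetical and are not part of the Cadence client, whose actual sampling strategy may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sample keeps each workflow ID when the next random draw is below rate.
// next is expected to return values in [0, 1); it is injected so the
// behavior can be made deterministic in tests. (Hypothetical helper.)
func sample(ids []string, rate float64, next func() float64) []string {
	kept := make([]string, 0, len(ids))
	for _, id := range ids {
		if next() < rate {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	scanned := []string{"wf-1", "wf-2", "wf-3", "wf-4", "wf-5", "wf-6"}
	// With a rate of 0.5, roughly half of the scanned workflows are replayed.
	toReplay := sample(scanned, 0.5, rand.Float64)
	fmt.Printf("replaying %d of %d scanned workflows\n", len(toReplay), len(scanned))
}
```

A rate of 1.0 keeps every scanned workflow and a rate of 0 keeps none, which is a cheap way to bound the replay load of a single shadowing run.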
112
84
113
-
- WorkflowQuery: If you are familiar with our advanced visibility query syntax, you can specify a query directly. If specified, all other scan filters must be left empty.
114
-
- WorkflowTypes: A list of workflow Type names.
115
-
- WorkflowStatus: A list of workflow status.
116
-
- WorkflowStartTimeFilter: Min and max timestamp for workflow start time.
117
-
- SamplingRate: Sampling workflows from the scan result before executing the replay test.
85
+
#### Scan Filters: Basic
86
+
- **WorkflowTypes**: A list of workflow Type names.
87
+
- **WorkflowStatus**: A list of workflow statuses. ([accepted values](https://github.com/cadence-workflow/cadence-go-client/blob/2af19f25b056ce1039feaeabd3fb0e803d20010b/internal/workflow_shadower.go#L72)) <br />*Note*: By default, an empty status list will only scan for "OPEN" workflows.
88
+
- **WorkflowStartTimeFilter**: Min and max timestamp for workflow start time.
89
+
- **SamplingRate**: Sampling workflows from the scan result before executing the replay test.
90
+
91
+
> ⚠️ **Compatibility Rule:** Use either WorkflowQuery OR the basic filters (WorkflowTypes/WorkflowStatus/WorkflowStartTimeFilter). SamplingRate works with both approaches.
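
The compatibility rule can be checked up front before a shadowing run is started. The sketch below uses a hypothetical `scanFilters` struct that only mirrors the option names listed above; it is not the client library's own type, and the flattened `MinStartTime`/`MaxStartTime` fields are an assumption made to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// scanFilters is a hypothetical stand-in for the scan-related shadow options.
type scanFilters struct {
	WorkflowQuery  string
	WorkflowTypes  []string
	WorkflowStatus []string
	MinStartTime   time.Time // assumed shape of WorkflowStartTimeFilter
	MaxStartTime   time.Time
	SamplingRate   float64
}

var errMixedFilters = errors.New("WorkflowQuery cannot be combined with basic scan filters")

// validateScanFilters enforces the either/or rule: an advanced query excludes
// the basic filters, while SamplingRate is accepted with both approaches.
func validateScanFilters(f scanFilters) error {
	hasBasic := len(f.WorkflowTypes) > 0 || len(f.WorkflowStatus) > 0 ||
		!f.MinStartTime.IsZero() || !f.MaxStartTime.IsZero()
	if f.WorkflowQuery != "" && hasBasic {
		return errMixedFilters
	}
	return nil
}

func main() {
	queryOnly := scanFilters{WorkflowQuery: `WorkflowType = "helloWorldWorkflow"`, SamplingRate: 0.5}
	mixed := scanFilters{
		WorkflowQuery: `WorkflowType = "helloWorldWorkflow"`,
		WorkflowTypes: []string{"helloWorldWorkflow"},
	}
	fmt.Println("query only:", validateScanFilters(queryOnly))
	fmt.Println("mixed:", validateScanFilters(mixed))
}
```

Validating the combination early gives a clearer failure than discovering the conflict once the scan has already started.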
118
92
119
93
#### Shadow Exit Condition
120
94
121
-
- ExpirationInterval: Shadowing will exit when the specified interval has passed.
122
-
- ShadowCount: Shadowing will exit after this number of workflow has been replayed. Note: replay maybe skipped due to errors like can't fetch history, history too short, etc. Skipped workflows won't be taken account into ShadowCount.
95
+
- **ExpirationInterval**: Shadowing will exit when the specified interval has passed.
96
+
- **ShadowCount**: Shadowing will exit after this number of workflows have been replayed. Note: replay may be skipped due to errors like cannot fetch history, history too short, etc. Skipped workflows won't be taken into account in ShadowCount.
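
The interaction between the two exit settings, and the fact that skipped workflows do not count toward ShadowCount, can be sketched as follows. The `exitCondition` struct and `shouldExit` helper are hypothetical, and treating a zero value as "not set" is an assumption of this sketch rather than documented client behavior.

```go
package main

import (
	"fmt"
	"time"
)

// exitCondition is a hypothetical mirror of the two exit settings.
// In this sketch a zero value means the setting is not used.
type exitCondition struct {
	ExpirationInterval time.Duration
	ShadowCount        int
}

// shouldExit reports whether shadowing should stop, given the elapsed time
// and the number of workflows actually replayed (skipped ones excluded).
func shouldExit(c exitCondition, elapsed time.Duration, replayed int) bool {
	if c.ExpirationInterval > 0 && elapsed >= c.ExpirationInterval {
		return true
	}
	if c.ShadowCount > 0 && replayed >= c.ShadowCount {
		return true
	}
	return false
}

func main() {
	cond := exitCondition{ExpirationInterval: time.Hour, ShadowCount: 3}
	outcomes := []string{"replayed", "skipped", "replayed", "skipped", "replayed", "replayed"}

	replayed := 0
	for i, outcome := range outcomes {
		if outcome == "replayed" {
			replayed++ // skipped workflows never advance the counter
		}
		elapsed := time.Duration(i) * time.Minute // simulated clock
		if shouldExit(cond, elapsed, replayed) {
			fmt.Printf("exit after %d workflows scanned, %d replayed\n", i+1, replayed)
			break
		}
	}
}
```

In this run the loop stops at the fifth workflow, because only three of the first five were actually replayed.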
123
97
124
98
#### Shadow Mode
125
99
126
-
- Normal: Shadowing will complete after all workflows matches WorkflowQuery (after sampling) have been replayed or when exit condition is met.
127
-
- Continuous: A new round of shadowing will be started after all workflows matches WorkflowQuery have been replayed. There will be a 5 min wait period between each round, and currently this wait period is not configurable. Shadowing will complete only when ExitCondition is met. ExitCondition must be specified when using this mode.
100
+
- **Normal**: Shadowing will complete after all workflows that match WorkflowQuery (after sampling) have been replayed or when the exit condition is met.
101
+
- **Continuous**: A new round of shadowing will be started after all workflows that match WorkflowQuery have been replayed. There will be a 5-minute wait period between each round, and currently this wait period is not configurable. Shadowing will complete only when ExitCondition is met. ExitCondition must be specified when using this mode.
128
102
129
103
#### Shadow Concurrency
130
104
131
-
- Concurrency: workflow replay concurrency. If not specified, will be default to 1. For local shadowing, an error will be returned if a value higher than 1 is specified.
105
+
- **Concurrency**: The default workflow replay concurrency is 1. Values greater than 1 only apply to a Shadowing Worker.
132
106
133
-
### Local Shadowing Test
107
+
### Sample Integration Test
134
108
135
-
Local shadowing test is similar to the replay test. First create a workflow shadower with optional shadow and replay options, then register the workflow that need to be shadowed. Finally, call the `Run` method to start the shadowing. The method will return if shadowing has finished or any non-deterministic error is found.
109
+
Local shadowing with the Workflow Shadower is similar to the replay test. First create a workflow shadower with optional shadow and replay options, then register the workflow that needs to be shadowed. Finally, call the `Run` method to start the shadowing. The method returns when shadowing has finished or a non-deterministic error is found.
136
110
137
-
Here's a simple example. The example is also available [here](https://github.com/cadence-workflow/cadence-samples/blob/master/cmd/samples/recipes/helloworld/shadow_test.go).
111
+
Here's a simple example. The example is also available [here](https://github.com/cadence-workflow/cadence-samples/blob/6350c61d16487d3a6cf9b31e3fac6967170c71ba/cmd/samples/recipes/helloworld/shadow_test.go#L21).
- **All shadow workflows are running in one Cadence system domain, and right now, every user domain can only have one shadow workflow at a time.**
164
-
- **The Cadence server used for scanning and getting workflow history will also be the Cadence server for running your shadow workflow.** Currently, there's no way to specify different Cadence servers for hosting the shadowing workflow and scanning/fetching workflow.
137
+
- **Each user domain is limited to one Shadowing Worker.**
138
+
- **Each Shadowing Worker runs a single shadowing workflow in the "cadence-shadower" domain. You must create this domain before running a Shadowing Worker.**
139
+
- **The Cadence server used for scanning and getting workflow history will also be the Cadence server for running your shadow workflow. Currently, there's no way to specify different Cadence servers for hosting the shadowing workflow and scanning/fetching workflow.**
165
140
166
-
Your worker can also be configured to run in shadow mode to run shadow tests as a workflow. This is useful if there's a number of workflows need to be replayed. Using a workflow can make sure the shadowing won't accidentally fail in the middle and the replay load can be distributed by deploying more shadow mode workers. It can also be incorporated into your deployment process to make sure there's no failed replay checks before deploying your change to production workers.
141
+
Your worker can also be configured to run in shadow mode to run shadow tests as a workflow. This is useful if there are a number of workflows that need to be replayed. Using a workflow ensures that the shadowing won't accidentally fail in the middle, and the replay load can be distributed by deploying more shadow mode workers. It can also be incorporated into your deployment process to make sure there are no failed replay checks before deploying your change to production workers.
167
142
168
143
When running in shadow mode, the normal decision, activity, and session workers will be disabled so that they won't update any production workflows. A special shadow activity worker will be started to execute activities for scanning and replaying workflows. The actual shadow workflow logic is controlled by the Cadence server, and your worker is only responsible for scanning and replaying workflows.
169
144
170
-
[Replay succeed, skipped and failed metrics](https://github.com/cadence-workflow/cadence-go-client/blob/master/internal/common/metrics/constants.go#L105) will be emitted by your worker when executing the shadow workflow and you can monitor those metrics to see if there's any incompatible changes.
145
+
[Replay succeeded, skipped, and failed metrics](https://github.com/cadence-workflow/cadence-go-client/blob/654b9a72a6abb40317387c8d97b19d882d1aaa6c/internal/common/metrics/constants.go#L108-L111) will be emitted by your worker when executing the shadow workflow, and you can monitor those metrics to see if there are any incompatible changes.
171
146
172
147
To enable shadow mode, set the `EnableShadowWorker` field in `worker.Options` to `true` and specify the `ShadowOptions`.
173
148
174
149
Registered workflows will be forwarded to the underlying WorkflowReplayer. DataConverter, WorkflowInterceptorChainFactories, ContextPropagators, and Tracer specified in the `worker.Options` will also be used as ReplayOptions. Since all shadow workflows are running in one system domain, to avoid conflict, **the actual task list name used will be `domain-tasklist`.**
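
As a small illustration of the naming rule, the sketch below derives a task list name in the `domain-tasklist` form. The `shadowTaskList` helper, the plain hyphen join, and the example domain and task list names are assumptions inferred from the sentence above, not code taken from the client.

```go
package main

import "fmt"

// shadowTaskList builds a task list name in the `domain-tasklist` form.
// Hypothetical helper: the real client may construct the name differently.
func shadowTaskList(domain, taskList string) string {
	return domain + "-" + taskList
}

func main() {
	// Two user domains that share a task list name no longer collide once
	// both shadow workflows run in the same system domain.
	fmt.Println(shadowTaskList("orders", "helloWorldGroup"))
	fmt.Println(shadowTaskList("payments", "helloWorldGroup"))
}
```

Prefixing with the user domain keeps shadow workers for different domains from polling each other's tasks.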
175
150
176
-
A sample setup can be found [here](https://github.com/cadence-workflow/cadence-samples/blob/master/cmd/samples/recipes/helloworld/main.go#L24).
151
+
### How to Set Up
152
+
A sample of this setup can be found [here](https://github.com/cadence-workflow/cadence-samples/blob/6350c61d16487d3a6cf9b31e3fac6967170c71ba/cmd/samples/recipes/helloworld/main.go#L77).