Commit 1257064

Bueller87 and “Kevin” authored
Edits: Go Client, Workflow Replay and Shadowing (#247)
* Add link to Codelab and grammar fixes
* Clarify shadower scan options and Shadow worker caveats
* All Github links changed to permalinks
* Embed Youtube video at the top of Codelab
* Rewrote Workflow Replayer section to document Replayer client API

---------

Co-authored-by: “Kevin” <“[email protected]”>
1 parent f8d9b92 commit 1257064

2 files changed, +68 -89 lines changed

docs/00-codelabs/01-workflow-tests-go-replayer-shadower.md

Lines changed: 3 additions & 1 deletion
@@ -6,7 +6,9 @@ permalink: /docs/codelabs/workflow-tests-go-replayer-shadower
 
 # **Codelab: How to Write Tests With Workflow Replayer and Shadower**
 
-**A video companion to this Codelab is available on our YouTube channel [here](https://www.youtube.com/watch?v=LHOr0NOp0Gc).**
+**A video companion to this Codelab is available on our YouTube channel:**
+
+<iframe width="560" height="315" src="https://www.youtube.com/embed/LHOr0NOp0Gc?si=iJB0TMfS5QbxWrn7" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
 
 This Codelab is a step-by-step guide to help you create tests for your Cadence Workflows. By the end of this guide, you will be able to build a safety net for your workflow code, ensuring that changes you make don't break existing, long-running workflows.
 
docs/05-go-client/19-workflow-replay-shadowing.md

Lines changed: 65 additions & 88 deletions
@@ -8,86 +8,56 @@ permalink: /docs/go-client/workflow-replay-shadowing
 
 In the Versioning section, we mentioned that incompatible changes to workflow definition code could cause non-deterministic issues when processing workflow tasks if versioning is not done correctly. However, it may be hard for you to tell if a particular change is incompatible or not and whether versioning logic is needed. To help you identify incompatible changes and catch them before production traffic is impacted, we implemented Workflow Replayer and Workflow Shadower.
 
-## Workflow Replayer
+## Hands-On Codelab
+**Ready for hands-on learning?** Follow our step-by-step [**Workflow Testing Codelab**](/docs/00-codelabs/01-workflow-tests-go-replayer-shadower.md) to build a complete testing setup from scratch.
 
-Workflow Replayer is a testing component for replaying existing workflow histories against a workflow definition. The replaying logic is the same as the one used for processing workflow tasks, so if there's any incompatible changes in the workflow definition, the replay test will fail.
+**You'll learn:** Replayer setup • Shadower integration • Breaking change detection<br/>
+**Time commitment:** 30-45 minutes
 
-### Write a Replay Test
+## Workflow Replayer
 
-#### Step 1: Create workflow replayer
+Workflow Replayer is a testing component for replaying existing workflow histories against a workflow definition. The replaying logic is the same as the one used for processing workflow tasks, so if there are any incompatible changes in the workflow definition, the replay test will fail.
 
-Create a workflow Replayer by:
+### Replay Options
 
-```go
-replayer := worker.NewWorkflowReplayer()
-```
-or if custom data converter, context propagator, interceptor, etc. is used in your workflow:
+Complete documentation on replay options, which includes default values, accepted values, etc., can be found [here](https://github.com/cadence-workflow/cadence-go-client/blob/master/internal/workflow_replayer.go). The following sections are just a brief description of each option.
 
-```go
-options := worker.ReplayOptions{
-	DataConverter: myDataConverter,
-	ContextPropagators: []workflow.ContextPropagator{
-		myContextPropagator,
-	},
-	WorkflowInterceptorChainFactories: []interceptors.WorkflowInterceptorFactory{
-		myInterceptorFactory,
-	},
-	Tracer: myTracer,
-}
-replayer := worker.NewWorkflowReplayWithOptions(options)
-```
+#### Replayer Creation
+- **NewWorkflowReplayer()**: Default replayer constructor with standard configuration.
+- **NewWorkflowReplayWithOptions(ReplayOptions)**: Advanced constructor with customizable replay configuration.
 
-#### Step 2: Register workflow definition
+#### ReplayOptions Fields
+- **DataConverter**: Custom data converter interface for workflow argument/result serialization.
+- **ContextPropagators**: Slice of context propagators for maintaining request context during replay.
+- **WorkflowInterceptorChainFactories**: Slice of interceptor factories for workflow execution middleware.
+- **Tracer**: OpenTracing tracer interface for distributed tracing support.
 
-Next, register your workflow definitions as you normally do. Make sure workflows are registered the same way as they were when running and generating histories; otherwise the replay will not be able to find the corresponding definition.
+> ⚠️ **Important:** Replay options must exactly match your production worker settings to ensure accurate replay results.
 
-```go
-replayer.RegisterWorkflow(myWorkflowFunc1)
-replayer.RegisterWorkflow(myWorkflowFunc2, workflow.RegisterOptions{
-	Name: workflowName,
-})
-```
+#### Registration Methods
+- **RegisterWorkflow(workflowFunc)**: Standard registration using Go function name as workflow type.
+- **RegisterWorkflow(workflowFunc, RegisterOptions)**: Registration with custom workflow name and additional options.
+
+> ⚠️ **Critical:** All registration methods and options must exactly match those used during original workflow execution.
 
-#### Step 3: Prepare workflow histories
+#### Replay Methods
+- **ReplayWorkflowHistory(logger, WorkflowHistory)**: Replay from pre-loaded workflow history object in memory.
+- **ReplayWorkflowHistoryFromJSONFile(logger, string)**: Replay from JSON file created by `cadence workflow show --of filename.json`.
+- **ReplayPartialWorkflowHistoryFromJSONFile(logger, string, int64)**: Replay partial history up to specified decision task event ID.
+- **ReplayWorkflowExecution(context, WorkflowServiceClient, logger, string, WorkflowExecution)**: Fetch and replay directly from Cadence server.
 
-Replayer can read workflow history from a local json file or fetch it directly from the Cadence server. If you would like to use the first method, you can use the following CLI command, otherwise you can skip to the next step.
+#### Error Conditions
+- **Non-deterministic Changes**: Workflow code modifications that alter execution flow will cause replay failures.
+- **Insufficient History**: Minimum of 3 workflow events required for meaningful replay validation.
 
+#### Downloading History
+Replayer can read workflow history from a local JSON file or fetch it directly from the Cadence server. If you would like to use the first method, use the following CLI command; otherwise, you can skip this step.
 ```bash
 cadence --do <domain> workflow show --wid <workflowID> --rid <runID> --of <output file name>
 ```
+### Sample Unit Test
 
-The dumped workflow history will be stored in the file at the path you specified in json format.
-
-#### Step 4: Call the replay method
-
-Once you have the workflow history or have the connection to Cadence server for fetching history, call one of the four replay methods to start the replay test.
-
-```go
-// if workflow history has been loaded into memory
-err := replayer.ReplayWorkflowHistory(logger, history)
-
-// if workflow history is stored in a json file
-err = replayer.ReplayWorkflowHistoryFromJSONFile(logger, jsonFileName)
-
-// if workflow history is stored in a json file and you only want to replay part of it
-// NOTE: lastEventID can't be set arbitrarily. It must be the end of of a history events batch
-// when in doubt, set to the eventID of decisionTaskStarted events.
-err = replayer.ReplayPartialWorkflowHistoryFromJSONFile(logger, jsonFileName, lastEventID)
-
-// if you want to fetch workflow history directly from cadence server
-// please check the Worker Service page for how to create a cadence service client
-err = replayer.ReplayWorkflowExecution(ctx, cadenceServiceClient, logger, domain, execution)
-```
-
-#### Step 5: Check returned error
-
-If an error is returned from the replay method, it means there's a incompatible change in the workflow definition and the error message will contain more information regarding where the non-deterministic error happens.
-
-Note: currently an error will be returned if there are less than 3 events in the history. It is because the first 3 events in the history has nothing to do with the workflow code, so Replayer can't tell if there's a incompatible change or not.
-
-### Sample Replay Test
-
-This sample is also available in our samples repo at [here](https://github.com/cadence-workflow/cadence-samples/blob/master/cmd/samples/recipes/helloworld/replay_test.go#L39).
+This sample is also available in our samples repo [here](https://github.com/cadence-workflow/cadence-samples/blob/6350c61d16487d3a6cf9b31e3fac6967170c71ba/cmd/samples/recipes/helloworld/replay_test.go#L18).
 
 ```go
 func TestReplayWorkflowHistoryFromFile(t *testing.T) {
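The replay methods listed in this hunk accept a `lastEventID` for partial replay, and replay fails on histories shorter than 3 events. The following is a minimal, stdlib-only sketch and is not part of the Cadence client. It reads a history dumped with `cadence workflow show --of <file>`, enforces the 3-event minimum and lists the `DecisionTaskStarted` event IDs. The previous revision of this page recommended those IDs as safe `lastEventID` values.

The sketch assumes the dump is a JSON array of events with `eventId` and `eventType` fields, so check those names against your own file. `historyEvent` and `partialReplayCandidates` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// historyEvent models only the fields this sketch reads. The JSON names
// "eventId" and "eventType" are an assumption about the file written by
// `cadence workflow show --of <file>`.
type historyEvent struct {
	EventID   int64  `json:"eventId"`
	EventType string `json:"eventType"`
}

// minReplayableEvents mirrors the documented rule that replay needs
// at least 3 events in the history.
const minReplayableEvents = 3

// partialReplayCandidates decodes a history dump, rejects histories that are
// too short to replay, and returns the IDs of DecisionTaskStarted events.
func partialReplayCandidates(data []byte) ([]int64, error) {
	var events []historyEvent
	if err := json.Unmarshal(data, &events); err != nil {
		return nil, fmt.Errorf("decode history: %w", err)
	}
	if len(events) < minReplayableEvents {
		return nil, fmt.Errorf("history has %d events, replay needs at least %d",
			len(events), minReplayableEvents)
	}
	var ids []int64
	for _, e := range events {
		if e.EventType == "DecisionTaskStarted" {
			ids = append(ids, e.EventID)
		}
	}
	return ids, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: candidates <history.json>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ids, err := partialReplayCandidates(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("candidate lastEventID values:", ids)
}
```

Run it with the dumped file as its only argument. Any printed ID is a candidate for the last argument of `ReplayPartialWorkflowHistoryFromJSONFile`.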
@@ -100,41 +70,45 @@ func TestReplayWorkflowHistoryFromFile(t *testing.T) {
 
 ## Workflow Shadower
 
-Workflow Replayer works well when verifying the compatibility against a small number of workflow histories. If there are lots of workflows in production need to be verified, dumping all histories manually clearly won't work. Directly fetching histories from cadence server might be a solution, but the time to replay all workflow histories might be too long for a test.
+Workflow Replayer works well when verifying the compatibility against a small number of workflow histories. If there are a lot of workflows in production that need to be verified, dumping all histories manually clearly won't work. Directly fetching histories from the Cadence server might be a solution, but the time to replay all workflow histories might be too long for a test.
 
-Workflow Shadower is built on top of Workflow Replayer to address this problem. The basic idea of shadowing is: scan workflows based on the filters you defined, fetch history for each of workflow in the scan result from Cadence server and run the replay test. It can be run either as a test to serve local development purpose or as a workflow in your worker to continuously replay production workflows.
+Workflow Shadower is built on top of Workflow Replayer to address this problem. The basic idea of shadowing is: scan workflows based on the filters you defined, fetch the history for each workflow in the scan result from the Cadence server, and run the replay test. It can be run either as a test to serve local development purposes or as a workflow in your worker to continuously replay production workflows.
 
 ### Shadow Options
 
-Complete documentation on shadow options which includes default values, accepted values, etc. can be found [here](https://github.com/cadence-workflow/cadence-go-client/blob/master/internal/workflow_shadower.go#L53). The following sections are just a brief description of each option.
+Complete documentation on shadow options, which includes default values, accepted values, etc., can be found [here](https://github.com/cadence-workflow/cadence-go-client/blob/2af19f25b056ce1039feaeabd3fb0e803d20010b/internal/workflow_shadower.go#L53). The following sections are just a brief description of each option.
 
-#### Scan Filters
+#### Scan Filters: Advanced Query
+- **WorkflowQuery**: Use advanced visibility query syntax for complex filtering.
+- **SamplingRate**: Sampling workflows from the scan result before executing the replay test.
 
-- WorkflowQuery: If you are familiar with our advanced visibility query syntax, you can specify a query directly. If specified, all other scan filters must be left empty.
-- WorkflowTypes: A list of workflow Type names.
-- WorkflowStatus: A list of workflow status.
-- WorkflowStartTimeFilter: Min and max timestamp for workflow start time.
-- SamplingRate: Sampling workflows from the scan result before executing the replay test.
+#### Scan Filters: Basic
+- **WorkflowTypes**: A list of workflow Type names.
+- **WorkflowStatus**: A list of workflow statuses. ([accepted values](https://github.com/cadence-workflow/cadence-go-client/blob/2af19f25b056ce1039feaeabd3fb0e803d20010b/internal/workflow_shadower.go#L72)) <br />*Note*: By default, an empty status list will only scan for "OPEN" workflows.
+- **WorkflowStartTimeFilter**: Min and max timestamp for workflow start time.
+- **SamplingRate**: Sampling workflows from the scan result before executing the replay test.
+
+> ⚠️ **Compatibility Rule:** Use either WorkflowQuery OR the basic filters (WorkflowTypes/WorkflowStatus/WorkflowStartTimeFilter). SamplingRate works with both approaches.
 
 #### Shadow Exit Condition
 
-- ExpirationInterval: Shadowing will exit when the specified interval has passed.
-- ShadowCount: Shadowing will exit after this number of workflow has been replayed. Note: replay maybe skipped due to errors like can't fetch history, history too short, etc. Skipped workflows won't be taken account into ShadowCount.
+- **ExpirationInterval**: Shadowing will exit when the specified interval has passed.
+- **ShadowCount**: Shadowing will exit after this number of workflows has been replayed. Note: replay may be skipped due to errors such as failing to fetch the history or the history being too short. Skipped workflows won't be counted toward ShadowCount.
 
 #### Shadow Mode
 
-- Normal: Shadowing will complete after all workflows matches WorkflowQuery (after sampling) have been replayed or when exit condition is met.
-- Continuous: A new round of shadowing will be started after all workflows matches WorkflowQuery have been replayed. There will be a 5 min wait period between each round, and currently this wait period is not configurable. Shadowing will complete only when ExitCondition is met. ExitCondition must be specified when using this mode.
+- **Normal**: Shadowing will complete after all workflows that match WorkflowQuery (after sampling) have been replayed or when the exit condition is met.
+- **Continuous**: A new round of shadowing will be started after all workflows that match WorkflowQuery have been replayed. There will be a 5-minute wait period between each round, and currently this wait period is not configurable. Shadowing will complete only when ExitCondition is met. ExitCondition must be specified when using this mode.
 
 #### Shadow Concurrency
 
-- Concurrency: workflow replay concurrency. If not specified, will be default to 1. For local shadowing, an error will be returned if a value higher than 1 is specified.
+- **Concurrency**: The default workflow replay concurrency is 1. Values greater than 1 only apply to a Shadowing Worker.
 
-### Local Shadowing Test
+### Sample Integration Test
 
-Local shadowing test is similar to the replay test. First create a workflow shadower with optional shadow and replay options, then register the workflow that need to be shadowed. Finally, call the `Run` method to start the shadowing. The method will return if shadowing has finished or any non-deterministic error is found.
+Local shadowing with the Workflow Shadower is similar to the replay test. First create a workflow shadower with optional shadow and replay options, then register the workflow that needs to be shadowed. Finally, call the `Run` method to start the shadowing. The method returns when shadowing has finished or when a non-deterministic error is found.
 
-Here's a simple example. The example is also available [here](https://github.com/cadence-workflow/cadence-samples/blob/master/cmd/samples/recipes/helloworld/shadow_test.go).
+Here's a simple example. The example is also available [here](https://github.com/cadence-workflow/cadence-samples/blob/6350c61d16487d3a6cf9b31e3fac6967170c71ba/cmd/samples/recipes/helloworld/shadow_test.go#L21).
 
 ```go
 func TestShadowWorkflow(t *testing.T) {
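The Shadow Options rules in this hunk can be summarized as a single validation step. The following is a hypothetical model written only with the Go standard library. `shadowConfig`, `shadowMode` and `normalize` are illustrative names, not the cadence-go-client `ShadowOptions` API. The sampling-rate range is deliberately left unchecked because this page does not specify it.

The sketch encodes these documented rules:

- WorkflowQuery and the basic filters are mutually exclusive.
- An empty status list defaults to open workflows.
- Continuous mode needs an exit condition.
- Concurrency defaults to 1, and higher values are reserved for a Shadowing Worker.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// shadowMode and shadowConfig are hypothetical stand-ins for the documented
// options; they are not the cadence-go-client ShadowOptions type.
type shadowMode int

const (
	modeNormal shadowMode = iota
	modeContinuous
)

type shadowConfig struct {
	WorkflowQuery      string
	WorkflowTypes      []string
	WorkflowStatus     []string
	HasStartTimeFilter bool // stands in for WorkflowStartTimeFilter
	SamplingRate       float64
	Mode               shadowMode
	ExpirationInterval time.Duration
	ShadowCount        int
	Concurrency        int
}

// normalize applies the defaults and checks the rules described under
// Shadow Options. isShadowWorker is true for a Shadowing Worker and false
// for a local shadowing test.
func normalize(cfg shadowConfig, isShadowWorker bool) (shadowConfig, error) {
	usesBasic := len(cfg.WorkflowTypes) > 0 || len(cfg.WorkflowStatus) > 0 || cfg.HasStartTimeFilter
	if cfg.WorkflowQuery != "" && usesBasic {
		return cfg, errors.New("use either WorkflowQuery or the basic filters, not both")
	}
	if cfg.WorkflowQuery == "" && len(cfg.WorkflowStatus) == 0 {
		// Documented default: an empty status list only scans open workflows.
		cfg.WorkflowStatus = []string{"OPEN"}
	}
	if cfg.Mode == modeContinuous && cfg.ExpirationInterval == 0 && cfg.ShadowCount == 0 {
		return cfg, errors.New("continuous mode requires an exit condition")
	}
	if cfg.Concurrency == 0 {
		cfg.Concurrency = 1 // documented default
	}
	if cfg.Concurrency > 1 && !isShadowWorker {
		return cfg, errors.New("concurrency above 1 only applies to a Shadowing Worker")
	}
	return cfg, nil
}

func main() {
	// "exampleWorkflow" is a hypothetical workflow type name.
	cfg, err := normalize(shadowConfig{WorkflowTypes: []string{"exampleWorkflow"}}, false)
	fmt.Println(cfg.WorkflowStatus, cfg.Concurrency, err) // prints: [OPEN] 1 <nil>
}
```

This only restates the rules documented here in one place. It is not a substitute for the client's own handling of these options.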
@@ -157,20 +131,23 @@ func TestShadowWorkflow(t *testing.T) {
 }
 ```
 
-### Shadowing Worker
+## Shadowing Worker
+
 
-NOTE:
-- **All shadow workflows are running in one Cadence system domain, and right now, every user domain can only have one shadow workflow at a time.**
-- **The Cadence server used for scanning and getting workflow history will also be the Cadence server for running your shadow workflow.** Currently, there's no way to specify different Cadence servers for hosting the shadowing workflow and scanning/fetching workflow.
+- **Each user domain is limited to one Shadowing Worker.**
+- **Each Shadowing Worker runs a single shadowing workflow in the "cadence-shadower" domain. You must create this domain before running a Shadowing Worker.**
+- **The Cadence server used for scanning and getting workflow history will also be the Cadence server for running your shadow workflow. Currently, there's no way to specify different Cadence servers for hosting the shadowing workflow and scanning/fetching workflow.**
 
-Your worker can also be configured to run in shadow mode to run shadow tests as a workflow. This is useful if there's a number of workflows need to be replayed. Using a workflow can make sure the shadowing won't accidentally fail in the middle and the replay load can be distributed by deploying more shadow mode workers. It can also be incorporated into your deployment process to make sure there's no failed replay checks before deploying your change to production workers.
+Your worker can also be configured to run in shadow mode to run shadow tests as a workflow. This is useful if there are a large number of workflows that need to be replayed. Running shadowing as a workflow makes sure it won't accidentally fail in the middle, and the replay load can be distributed by deploying more shadow mode workers. It can also be incorporated into your deployment process to make sure there are no failed replay checks before deploying your change to production workers.
 
 When running in shadow mode, the normal decision, activity, and session workers are disabled so that they won't update any production workflows. A special shadow activity worker will be started to execute activities for scanning and replaying workflows. The actual shadow workflow logic is controlled by the Cadence server, and your worker is only responsible for scanning and replaying workflows.
 
-[Replay succeed, skipped and failed metrics](https://github.com/cadence-workflow/cadence-go-client/blob/master/internal/common/metrics/constants.go#L105) will be emitted by your worker when executing the shadow workflow and you can monitor those metrics to see if there's any incompatible changes.
+[Replay succeeded, skipped, and failed metrics](https://github.com/cadence-workflow/cadence-go-client/blob/654b9a72a6abb40317387c8d97b19d882d1aaa6c/internal/common/metrics/constants.go#L108-L111) will be emitted by your worker when executing the shadow workflow, and you can monitor those metrics to see if there are any incompatible changes.
 
 To enable the shadow mode, the only change needed is setting the `EnableShadowWorker` field in `worker.Options` to `true` and then specifying the `ShadowOptions`.
 
 Registered workflows will be forwarded to the underlying WorkflowReplayer. DataConverter, WorkflowInterceptorChainFactories, ContextPropagators, and Tracer specified in the `worker.Options` will also be used as ReplayOptions. Since all shadow workflows are running in one system domain, to avoid conflict, **the actual task list name used will be `domain-tasklist`.**
 
-A sample setup can be found [here](https://github.com/cadence-workflow/cadence-samples/blob/master/cmd/samples/recipes/helloworld/main.go#L24).
+### How to Set Up
+A sample of this setup can be found [here](https://github.com/cadence-workflow/cadence-samples/blob/6350c61d16487d3a6cf9b31e3fac6967170c71ba/cmd/samples/recipes/helloworld/main.go#L77).
+
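Two Shadowing Worker behaviors described on this page can be made concrete with a small, stdlib-only sketch. The first is the `domain-tasklist` naming of the shadow task list, interpreted here as the user domain and task list joined by a hyphen. The second is a ShadowCount exit condition that ignores skipped replays while tallying the succeeded, skipped and failed outcomes that the worker reports as metrics.

`replayOutcome`, `shadowStats`, `shadowTaskList` and `countUntilExit` are hypothetical names. The real shadow workflow logic is controlled by the Cadence server, with scanning and replaying done by your worker.

```go
package main

import "fmt"

// replayOutcome is a hypothetical classification of one replay attempt,
// matching the succeeded/skipped/failed metrics described above.
type replayOutcome int

const (
	succeeded replayOutcome = iota
	skipped   // e.g. history could not be fetched or was too short
	failed    // non-deterministic change detected
)

type shadowStats struct {
	Succeeded, Skipped, Failed int
}

// shadowTaskList sketches the documented naming rule: because all shadow
// workflows run in one system domain, the task list is prefixed with the
// user domain ("domain-tasklist").
func shadowTaskList(domain, taskList string) string {
	return domain + "-" + taskList
}

// countUntilExit tallies outcomes and stops once shadowCount workflows have
// actually been replayed (succeeded or failed). Skipped workflows do not
// count toward shadowCount. shadowCount <= 0 means no count limit.
func countUntilExit(outcomes []replayOutcome, shadowCount int) shadowStats {
	var s shadowStats
	for _, o := range outcomes {
		switch o {
		case succeeded:
			s.Succeeded++
		case skipped:
			s.Skipped++
		case failed:
			s.Failed++
		}
		if shadowCount > 0 && s.Succeeded+s.Failed >= shadowCount {
			break // exit condition met: leave the loop
		}
	}
	return s
}

func main() {
	// "samples-domain" and "helloWorldGroup" are hypothetical names.
	fmt.Println(shadowTaskList("samples-domain", "helloWorldGroup")) // prints: samples-domain-helloWorldGroup
	stats := countUntilExit([]replayOutcome{succeeded, skipped, failed, succeeded}, 2)
	fmt.Printf("%+v\n", stats) // prints: {Succeeded:1 Skipped:1 Failed:1}
}
```

In the sample run, the skipped replay does not advance the count. The loop therefore stops after the failed replay, and the fourth outcome is never tallied.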