Commit 9c6399d

New docs section for canceled-context in workflows, other minor cleanups (#1134)
Canceled-context documentation was strewn about in individual function documentation, though it's a fairly important concept that needs to be understood. In particular, users are sometimes under the impression that "all things which accept a context will error", which can lead to nasty surprises like `selector.Select(ctx)`'s blocking behavior (causing a deadlocked workflow). Selector's docs have been improved in #1131, and this will hopefully make its behavior easier to learn before it is needed. And, since I noticed some other small things while writing this chunk of docs, I've rolled those changes into here as well.
1 parent 16337b2 commit 9c6399d

2 files changed: 114 additions and 21 deletions

activity/doc.go

Lines changed: 17 additions & 8 deletions
@@ -65,9 +65,14 @@ parameter is the standard Go context.
6565
The second string parameter is a custom activity-specific parameter that can be used to pass in data into the activity
6666
on start. An activity can have one or more such parameters. All parameters to an activity function must be
6767
serializable, which essentially means that params can’t be channels, functions, variadic, or unsafe pointers.
68+
Exact details will depend on your DataConverter, but by default they must work with encoding/json.Marshal (and
69+
Unmarshal on the receiving side, which has the same limitations plus generally cannot deserialize into an interface).
6870
69-
The activity declares two return values: (string, error). The string return value is used to return the result of the
70-
activity. The error return value is used to indicate an error was encountered during execution.
71+
This activity declares two return values: (string, error). The string return value is used to return the result of the
72+
activity, and can be retrieved in the workflow with this activity's Future.
73+
The error return value is used to indicate an error was encountered during execution.
74+
Results must be serializable, like parameters, but only a single result value is allowed (i.e. you cannot return
75+
(string, string, error)).
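The serialization limits described above can be tried out with encoding/json alone. The following is a minimal sketch using only the standard library, not Cadence's DataConverter itself; the Greeting, Namer, Wrapper, and roundTrip names are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Greeting is a hypothetical activity parameter. Plain exported fields
// round-trip through JSON without trouble.
type Greeting struct {
	Name  string `json:"name"`
	Count int    `json:"count"`
}

// Namer is a hypothetical non-empty interface, used to show the decode limitation.
type Namer interface{ GetName() string }

// GetName makes Greeting satisfy Namer.
func (g Greeting) GetName() string { return g.Name }

// Wrapper holds an interface-typed field: it can be encoded, but the
// receiving side cannot know which concrete type to decode into.
type Wrapper struct {
	Inner Namer `json:"inner"`
}

// roundTrip encodes in and decodes the bytes into out, roughly what a
// JSON-based converter has to do between the caller and the activity.
func roundTrip(in, out interface{}) error {
	data, err := json.Marshal(in)
	if err != nil {
		return fmt.Errorf("encode: %w", err)
	}
	if err := json.Unmarshal(data, out); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	return nil
}

func main() {
	var g Greeting
	fmt.Println("struct:", roundTrip(Greeting{Name: "cadence", Count: 2}, &g), g)

	// Channels cannot be encoded at all.
	var ch chan int
	fmt.Println("channel:", roundTrip(make(chan int), &ch))

	// Encoding succeeds, but decoding into the Namer field fails.
	var w Wrapper
	fmt.Println("interface:", roundTrip(Wrapper{Inner: Greeting{Name: "x"}}, &w))
}
```

If a parameter type fails a round trip like this, it will most likely also fail as an activity argument or result under the default converter.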
7176
7277
Implementation
7378
@@ -77,8 +82,9 @@ constructs.
7782
7883
Failing the activity
7984
80-
To mark an activity as failed, all that needs to happen is for the activity function to return an error via the error
81-
return value.
85+
To mark an activity as failed, return an error from your activity function via the error return value.
86+
Note that failed activities do not record the non-error return's value: you cannot usefully return both a
87+
value and an error; only the error will be recorded.
8288
8389
Activity Heartbeating
8490
@@ -112,10 +118,13 @@ payload containing progress information.
112118
Activity Cancellation
113119
114120
When an activity is cancelled (or its workflow execution is completed or failed) the context passed into its function
115-
is cancelled which sets its Done channel’s closed state. So an activity can use that to perform any necessary cleanup
116-
and abort its execution. Currently cancellation is delivered only to activities that call RecordHeartbeat.
121+
is cancelled which closes its Done() channel. So an activity can use that to perform any necessary cleanup
122+
and abort its execution.
117123
118-
Async/Manual Activity Completion
124+
Currently, cancellation is delivered only to activities that call RecordHeartbeat. If heartbeating is not performed,
125+
the activity will continue to run normally, but fail to record its result when it completes.
126+
127+
Async and Manual Activity Completion
119128
120129
In certain scenarios completing an activity upon completion of its function is not possible or desirable.
121130
@@ -178,7 +187,7 @@ For a full example of implementing this pattern see the Expense sample.
178187
179188
Registration
180189
181-
In order to for some workflow execution to be able to invoke an activity type, the worker process needs to be aware of
190+
In order for a workflow to be able to execute an activity type, the worker process needs to be aware of
182191
all the implementations it has access to. An activity is registered with the following call:
183192
184193
activity.Register(SimpleActivity)

workflow/doc.go

Lines changed: 97 additions & 13 deletions
@@ -139,8 +139,92 @@ Time related functions:
139139
140140
Failing a Workflow
141141
142-
To mark a workflow as failed all that needs to happen is for the workflow function to return an error via the err
143-
return value.
142+
To mark a workflow as failed, return an error from your workflow function via the err return value.
143+
Note that failed workflows do not record the non-error return's value: you cannot usefully return both a
144+
value and an error; only the error will be recorded.
145+
146+
Ending a Workflow externally
147+
148+
From inside a workflow, the only way to end it is to finish your function by returning a result or error.
149+
150+
Externally, two tools exist to stop a workflow, using the CLI or RPC client:
151+
cancellation and termination. Termination is forceful; cancellation allows a workflow to exit gracefully.
152+
153+
Workflows can also time out, based on their ExecutionStartToClose duration. A timeout behaves the same as
154+
termination (it is a hard deadline on the workflow), but a different close status and final event will be reported.
155+
156+
Terminating a Workflow
157+
158+
Terminating is roughly equivalent to using `kill -9` on a process - the workflow will be ended immediately,
159+
and no further decisions will be made. It cannot be prevented or delayed by the workflow, or by any configuration.
160+
Any in-progress decisions or activities will fail whenever they next communicate with Cadence's servers, i.e. when
161+
they complete or when they next heartbeat.
162+
163+
Because termination does not allow for any further code to be run, this also means your workflow has no
164+
chance to clean up after itself (e.g. running a cleanup Activity to adjust a database record).
165+
If you need to run additional logic when your workflow ends, use cancellation instead.
166+
167+
Canceling a Workflow
168+
169+
Canceling marks a workflow as canceled (this is a one-time, one-way operation), and immediately wakes the workflow
170+
up to process the cancellation (schedules a new decision task). When the workflow resumes after being canceled,
171+
the context that was passed into the workflow (and thus all derived contexts) will be canceled, which changes the
172+
behavior of many workflow.* functions.
173+
174+
Canceled workflow.Context behavior
175+
176+
A workflow's context can be canceled by either canceling the workflow, or calling the cancel-func returned from
177+
a workflow.WithCancel(ctx) call. Both behave identically.
178+
179+
At any time, you can convert a canceled (or could-be-canceled) context into a non-canceled context by using
180+
workflow.NewDisconnectedContext. The resulting context will ignore cancellation from the context it is derived from.
181+
Disconnected contexts like this can be created before or after a context has been canceled, and it does not matter
182+
how the cancellation occurred.
183+
Because a disconnected context is never canceled, you can use context cancellation as a way to request that
184+
some behavior be shut down, while still being able to run cleanup logic in activities or elsewhere.
185+
186+
As a general guideline, doing anything with I/O with a canceled context (e.g. executing an activity, starting a
187+
child workflow, sleeping) will fail rather than cause external changes. Detailed descriptions are available in
188+
documentation on functions that change their behavior with a canceled context; if it does not mention canceled-context
189+
behavior, its behavior does not change.
190+
For exact behavior, make sure to read the documentation on functions that you are calling.
191+
192+
As an incomplete summary, these actions will all fail immediately, and the associated error returns (possibly within
193+
a Future) will be a workflow.CanceledError:
194+
195+
- workflow.Await
196+
- workflow.Sleep
197+
- workflow.NewTimer
198+
199+
Child workflows:
200+
201+
- ExecuteChildWorkflow will synchronously fail with a CanceledError if canceled before it is called
202+
(in v0.18.4 and newer. See https://github.com/uber-go/cadence-client/pull/1138 for details.)
203+
- a running child workflow will be canceled
204+
- the child's future.Get will not complete until the child returns, and the future will contain the final result
205+
(which may be anything that was returned, not necessarily a CanceledError)
206+
207+
Activities have configurable cancellation behavior. For workflow.ExecuteActivity and workflow.ExecuteLocalActivity,
208+
see the activity package's documentation for details. In summary though:
209+
210+
- ExecuteActivity will synchronously fail with a CanceledError if canceled before it is called
211+
- the activity's future.Get will by default return a CanceledError immediately when canceled,
212+
unless ActivityOptions.WaitForCancellation is true
213+
- the activity's context will be canceled at the next heartbeat event, or not at all if that does not occur
214+
215+
And actions like these will be completely unaffected:
216+
217+
- future.Get
218+
(futures derived from the calls above may return a CanceledError, but this is not guaranteed for all futures)
219+
- selector.Select
220+
(Select is completely unaffected, similar to a native select statement. If you wish to unblock when your
221+
context is canceled, consider using an AddReceive with the context's Done() channel, as with a native select)
222+
- channel.Send, channel.Receive, and channel.ReceiveAsync
223+
(similar to native chan read/write operations, use a selector to wait for send/receive or some other action)
224+
- workflow.Go
225+
(the context argument in the callback is derived and may be canceled, but this does not stop the goroutine,
226+
nor stop new ones from being started)
227+
- workflow.GetVersion, workflow.GetLogger, workflow.GetMetricsScope, workflow.Now, many others
144228
145229
Execute Activity
146230
@@ -286,14 +370,14 @@ pattern, extra care needs to be taken to ensure the child workflow is started be
286370
Error Handling
287371
288372
Activities and child workflows can fail. You could handle errors differently based on different error cases. If the
289-
activity returns an error as errors.New() or fmt.Errorf(), those errors will be converted to error.GenericError. If the
290-
activity returns an error as error.NewCustomError("err-reason", details), that error will be converted to
291-
*error.CustomError. There are other types of errors like error.TimeoutError, error.CanceledError and error.PanicError.
373+
activity returns an error as errors.New() or fmt.Errorf(), those errors will be converted to workflow.GenericError. If the
374+
activity returns an error as workflow.NewCustomError("err-reason", details), that error will be converted to
375+
*workflow.CustomError. There are other types of errors like workflow.TimeoutError, workflow.CanceledError and workflow.PanicError.
292376
So the error handling code would look like:
293377
294378
err := workflow.ExecuteActivity(ctx, YourActivityFunc).Get(ctx, nil)
295379
switch err := err.(type) {
296-
case *error.CustomError:
380+
case *workflow.CustomError:
297381
switch err.Reason() {
298382
case "err-reason-a":
299383
// handle error-reason-a
@@ -305,7 +389,7 @@ So the error handling code would look like:
305389
default:
306390
// handle all other error reasons
307391
}
308-
case *error.GenericError:
392+
case *workflow.GenericError:
309393
switch err.Error() {
310394
case "err-msg-1":
311395
// handle error with message "err-msg-1"
@@ -314,7 +398,7 @@ So the error handling code would look like:
314398
default:
315399
// handle all other generic errors
316400
}
317-
case *error.TimeoutError:
401+
case *workflow.TimeoutError:
318402
switch err.TimeoutType() {
319403
case shared.TimeoutTypeScheduleToStart:
320404
// handle ScheduleToStart timeout
@@ -324,9 +408,9 @@ So the error handling code would look like:
324408
// handle heartbeat timeout
325409
default:
326410
}
327-
case *error.PanicError:
328-
// handle panic error
329-
case *error.CanceledError:
411+
case *workflow.PanicError:
412+
// handle panic error
413+
case *workflow.CanceledError:
330414
// handle canceled error
331415
default:
332416
// all other cases (ideally, this should not happen)
@@ -530,7 +614,7 @@ The code below implements the unit tests for the SimpleWorkflow sample.
530614
s.True(s.env.IsWorkflowCompleted())
531615
532616
s.NotNil(s.env.GetWorkflowError())
533-
_, ok := s.env.GetWorkflowError().(*error.GenericError)
617+
_, ok := s.env.GetWorkflowError().(*workflow.GenericError)
534618
s.True(ok)
535619
s.Equal("SimpleActivityFailure", s.env.GetWorkflowError().Error())
536620
}
@@ -591,7 +675,7 @@ Lets first take a look at a test that simulates a test failing via the "activity
591675
s.True(s.env.IsWorkflowCompleted())
592676
593677
s.NotNil(s.env.GetWorkflowError())
594-
_, ok := s.env.GetWorkflowError().(*error.GenericError)
678+
_, ok := s.env.GetWorkflowError().(*workflow.GenericError)
595679
s.True(ok)
596680
s.Equal("SimpleActivityFailure", s.env.GetWorkflowError().Error())
597681
}
