Skip to content

Commit f21e350

Browse files
authored
Restore race-checking tests (#1376)
Fixing the flakiness that led to #1375. The races in these tests were due to `t.Log` calls occurring after the test finishes, because the workflow (and test suite and tests and...) does not wait for goroutines to shut down. It's an annoying enough issue that I tackled it with gusto in cadence-workflow/cadence#6067 and it's probably worth porting over here too. Though the underlying "shut down and do not wait" behavior is still extremely dangerous and needs to be fixed some day.
1 parent 163f1a9 commit f21e350

File tree

1 file changed

+15
-4
lines changed

1 file changed

+15
-4
lines changed

internal/context_test.go

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -70,11 +70,11 @@ func TestContextChildParentCancelRace(t *testing.T) {
7070
}
7171
env.RegisterWorkflow(wf)
7272
env.ExecuteWorkflow(wf)
73+
assert.True(t, env.IsWorkflowCompleted())
7374
assert.NoError(t, env.GetWorkflowError())
7475
}
7576

7677
func TestContextConcurrentCancelRace(t *testing.T) {
77-
t.Skip("This test is racy and not reliable. It is disabled until we can make it reliable.")
7878
/*
7979
A race condition existed due to concurrently ending goroutines on shutdown (i.e. closing their chan without waiting
8080
on them to finish shutdown), which executed... quite a lot of non-concurrency-safe code in a concurrent way. All
@@ -83,8 +83,13 @@ func TestContextConcurrentCancelRace(t *testing.T) {
8383
Context cancellation was one identified by a customer, and it's fairly easy to test.
8484
In principle this must be safe to do - contexts are supposed to be concurrency-safe. Even if ours are not actually
8585
safe (for valid reasons), our execution model needs to ensure they *act* like it's safe.
86+
87+
This intentionally avoids `newTestWorkflowEnv` because the dangling goroutines
88+
may shut down after the test finishes -> produce a log, which fails on t.Log races.
89+
Ideally the test suite would stop goroutines too, but that isn't reliable currently either.
90+
This can be changed if we switch to the server's race-safe zaptest wrapper.
8691
*/
87-
env := newTestWorkflowEnv(t)
92+
env := (&WorkflowTestSuite{}).NewTestWorkflowEnvironment()
8893
wf := func(ctx Context) error {
8994
ctx, cancel := WithCancel(ctx)
9095
racyCancel := func(ctx Context) {
@@ -101,15 +106,20 @@ func TestContextConcurrentCancelRace(t *testing.T) {
101106
}
102107
env.RegisterWorkflow(wf)
103108
env.ExecuteWorkflow(wf)
109+
assert.True(t, env.IsWorkflowCompleted())
104110
assert.NoError(t, env.GetWorkflowError())
105111
}
106112

107113
func TestContextAddChildCancelParentRace(t *testing.T) {
108-
t.Skip("This test is racy and not reliable. It is disabled until we can make it reliable.")
109114
/*
110115
It's apparently also possible to race on adding children while propagating the cancel to children.
116+
117+
This intentionally avoids `newTestWorkflowEnv` because the dangling goroutines
118+
may shut down after the test finishes -> produce a log, which fails on t.Log races.
119+
Ideally the test suite would stop goroutines too, but that isn't reliable currently either.
120+
This can be changed if we switch to the server's race-safe zaptest wrapper.
111121
*/
112-
env := newTestWorkflowEnv(t)
122+
env := (&WorkflowTestSuite{}).NewTestWorkflowEnvironment()
113123
wf := func(ctx Context) error {
114124
ctx, cancel := WithCancel(ctx)
115125
racyCancel := func(ctx Context) {
@@ -131,6 +141,7 @@ func TestContextAddChildCancelParentRace(t *testing.T) {
131141
}
132142
env.RegisterWorkflow(wf)
133143
env.ExecuteWorkflow(wf)
144+
assert.True(t, env.IsWorkflowCompleted())
134145
assert.NoError(t, env.GetWorkflowError())
135146
}
136147

0 commit comments

Comments
 (0)