Skip to content

Commit f8b230b

Browse files
committed
MAYBE fix flaky TestPolicySetsCreate/with_vcs_policy_updated
This flake has been extremely rowdy lately. The flake signature is: - PoliciesPath is zeroed out - After logging that failure, it segfaults on the next line (nil pointer dereference) It turns out PoliciesPath is kind of a red herring — the backend erases that value if the VCSRepo is removed, as documented in the API reference. The real action is in VCSRepo, which comes back as a nil pointer after that update. And, if you modify the test to do an additional Read on the policy set to compare, it agrees with what the Update returned — the repo got nilled out for some reason. I went looking in the backend itself, and it turns out that OAuthClients have some async cleanup behavior on deletion. And the prior subtest that sets up the policy set used by the flaky test cleans up its OAuthClient when it's done, leaving the next subtest to create a new one. My working theory is that there's a race in the backend: if the Update call with the new VCSRepo values manage to slip in before the async cleanup for the old OAuthClient is done, it'll get nilled out instead of creating a new VCSRepo model. IF this guess is right, then we can avoid the flake by not trying to eagerly clean up the first OAuthClient before the rest of the subtests run. We're cleaning up the whole org at the end of the outer test block anyway, so there's no real benefit to being in a hurry, and an org can have multiple OAuthClient instances at once.
1 parent 3b62deb commit f8b230b

File tree

1 file changed

+14
-4
lines changed

1 file changed

+14
-4
lines changed

policy_set_integration_test.go

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -225,8 +225,13 @@ func TestPolicySetsCreate(t *testing.T) {
225225
t.Skip("Export a valid GITHUB_POLICY_SET_IDENTIFIER before running this test")
226226
}
227227

228-
oc, ocTestCleanup := createOAuthToken(t, client, orgTest)
229-
defer ocTestCleanup()
228+
// We are deliberately ignoring the cleanup func here, because there's a potential race condition
229+
// against the subsequent subtest -- TFC performs some async cleanup on VCS repos when deleting an
230+
// OAuthClient, and we've seen evidence that it will zero out the next test's NEW VCSRepo values if
231+
// they manage to slip in before the async stuff completes, even though the new values link it to a
232+
// new OAuthToken. Anyway, there's a deferred cleanup for orgTest in the outer scope, so the org's
233+
// dependent: destroy clause on OAuthClients will clean this up when the test as a whole ends.
234+
oc, _ := createOAuthToken(t, client, orgTest)
230235

231236
options := PolicySetCreateOptions{
232237
Name: String("vcs-policy-set"),
@@ -265,8 +270,13 @@ func TestPolicySetsCreate(t *testing.T) {
265270
t.Skip("Export a valid GITHUB_POLICY_SET_IDENTIFIER before running this test")
266271
}
267272

268-
oc, ocTestCleanup := createOAuthToken(t, client, orgTest)
269-
defer ocTestCleanup()
273+
// We are deliberately ignoring the cleanup func here, because it's not really necessary: there's a
274+
// deferred cleanup for orgTest in the outer scope, so the org's dependent: destroy clause on
275+
// OAuthClients will clean this up when the test as a whole ends. Unlike the one in the previous
276+
// subtest, there's no known race condition here because there aren't any later subtests that modify
277+
// this same policy set. But I'm being consistent with the prior case just to reduce risks from future
278+
// copypasta code.
279+
oc, _ := createOAuthToken(t, client, orgTest)
270280

271281
options := PolicySetUpdateOptions{
272282
Name: String("vcs-policy-set"),

0 commit comments

Comments
 (0)