Skip to content

Commit 7f9cf62

Browse files
committed
pkg/cvo/sync_worker_test.go: Add test for panic caused by goroutine synchronization
After the CVO has started and its worker goroutines are starting to work, a panic may occur due to a data race. Originally (at 961873d), the `SyncWorker.Start()` method was exclusively waiting for a signal on the `SyncWorker.notify` channel from the `SyncWorker.Update()` method before attempting the first application of a payload. After the `Start()` method got the signal, its required values were already initialized by the `SyncWorker.Update()` method, such as the `SyncWorker.work` field. However, the 9222fa9 introduced another section of the CVO that can signal the `SyncWorker.notify` channel. The `SyncWorker.NotifyAboutManagedResourceActivity()` method now signals the channel upon an activity on a managed resource. This signal may occur before the `SyncWorker.Update()` has finished. The signal thus starts an attempt of the `SyncWorker.Start()` method to apply a payload before the `SyncWorker.work` field could have been set by the `SyncWorker.Update()` method. This currently results in an "invalid memory address or nil pointer dereference" panic in the `SyncWork.calculateNextFrom()` method because a nil pointer is passed by the `SyncWorker.Start()` method. This panic can be seen in the following logs [1, 2]. Be sure to notice messages such as `Notify the sync worker: Cluster operator etcd changed Degraded from "False" to "True"` caused by the `NotifyAboutManagedResourceActivity` method right before a panic occurs. This commit will add a test to check whether a panic will occur when the notify channel already contains an element at a startup of the Start() method. A desired behavior for the Start() method is not to cause a panic when a notify signal is received before the Update() method has time to finish. The cancellation of the Start() method's context must remain functional. This commit will test this as well. It is possible that a nonempty notify channel at the Start() method's startup will not have time to cause a panic in the implemented test. However, such a case is improbable due to the non-complexity of the underlying code, and a three-seconds timeout should be enough time to cause such a panic. The alternative is to implement a fine-grained status of the sync worker, which would add additional code and complexity. [1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-aws-workers-rhel8/1804819848023248896/artifacts/e2e-aws-workers-rhel8/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-545d8cb5db-v2bf4_cluster-version-operator_previous.log [2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-ovirt/1805174927066664960/artifacts/e2e-ovirt/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-849c49896b-wnn8v_cluster-version-operator_previous.log
1 parent db77771 commit 7f9cf62

File tree

1 file changed

+52
-0
lines changed

1 file changed

+52
-0
lines changed

pkg/cvo/sync_worker_test.go

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -476,3 +476,55 @@ func Test_equalDigest(t *testing.T) {
476476
})
477477
}
478478
}
479+
480+
func Test_SyncWorkerShouldNotPanicDueToNotifySignalAtStartUp(t *testing.T) {
481+
o, _, _, _, shutdownFn := setupCVOTest("testdata/panic")
482+
defer shutdownFn()
483+
syncChannel := make(chan bool)
484+
485+
// Start() should not cause a panic when the notify channel already contains elements at its startup
486+
ctx, cancel := context.WithCancel(context.Background())
487+
worker := o.configSync.(*SyncWorker)
488+
worker.notify <- "Notify the sync worker: Cluster operator A changed versions"
489+
go func() {
490+
worker.Start(ctx, 1)
491+
syncChannel <- true
492+
}()
493+
494+
// A panic must not occur due to the notify signal; wait a reasonable time for a potential panic
495+
<-time.After(3 * time.Second)
496+
497+
// Shut down the sync worker and wait for the confirmation
498+
cancel()
499+
select {
500+
case <-syncChannel:
501+
case <-time.After(3 * time.Second):
502+
t.Fatal("Sync worker did not shut down in time after its context was cancelled")
503+
}
504+
}
505+
506+
func Test_SyncWorkerShouldShutDownImmediatelyAtStartUpWhenContextCancelled(t *testing.T) {
507+
o, _, _, _, shutdownFn := setupCVOTest("testdata/panic")
508+
defer shutdownFn()
509+
syncChannel := make(chan bool)
510+
511+
// Start() shuts down immediately while waiting for its initial signal
512+
// Only the context cancellation signal is received
513+
ctx, cancel := context.WithCancel(context.Background())
514+
worker := o.configSync.(*SyncWorker)
515+
go func() {
516+
worker.Start(ctx, 1)
517+
syncChannel <- true
518+
}()
519+
520+
// A panic must not occur due to the notify signal; wait a reasonable time for a potential panic
521+
<-time.After(3 * time.Second)
522+
523+
// Shut down the sync worker and wait for the confirmation
524+
cancel()
525+
select {
526+
case <-syncChannel:
527+
case <-time.After(3 * time.Second):
528+
t.Fatal("Sync worker did not shut down in time after its context was cancelled")
529+
}
530+
}

0 commit comments

Comments
 (0)