Closed
Description
This deadlock started showing up in our tests after we updated to the latest go-control-plane version, but I believe it could also happen in real-world use.
The goroutine stack traces look like this:
1 @ 0x103ca25 0x100759a 0x10072f5 0x1bb58af 0x1bb4507 0x1bbcdb9 0x1ea2695 0x1bb3ec6 0x1074121
# 0x1bb58ae github.com/kumahq/kuma/pkg/util/xds/v3.(*snapshotCache).respond+0x12e /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/util/xds/v3/cache.go:311
# 0x1bb4506 github.com/kumahq/kuma/pkg/util/xds/v3.(*snapshotCache).SetSnapshot+0x5a6 /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/util/xds/v3/cache.go:168
# 0x1bbcdb8 github.com/kumahq/kuma/pkg/kds/reconcile.(*reconciler).Reconcile+0x1f8 /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/kds/reconcile/reconciler.go:46
# 0x1ea2694 github.com/kumahq/kuma/pkg/kds/server.newSyncTracker.func1.2+0x194 /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/kds/server/components.go:93
# 0x1bb3ec5 github.com/kumahq/kuma/pkg/util/watchdog.(*SimpleWatchdog).Start+0xe5 /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/util/watchdog/watchdog.go:25
...
1 @ 0x103ca25 0x104e6c5 0x104e6ae 0x106fd67 0x107f225 0x1080990 0x1080922 0x1bb8a06 0x1bab955 0x1bad89a 0x1e9fecb 0x284e9b3 0x1074121
# 0x106fd66 sync.runtime_SemacquireMutex+0x46 /usr/local/Cellar/go/1.16.5/libexec/src/runtime/sema.go:71
# 0x107f224 sync.(*Mutex).lockSlow+0x104 /usr/local/Cellar/go/1.16.5/libexec/src/sync/mutex.go:138
# 0x108098f sync.(*Mutex).Lock+0x8f /usr/local/Cellar/go/1.16.5/libexec/src/sync/mutex.go:81
# 0x1080921 sync.(*RWMutex).Lock+0x21 /usr/local/Cellar/go/1.16.5/libexec/src/sync/rwmutex.go:111
# 0x1bb8a05 github.com/kumahq/kuma/pkg/util/xds/v3.(*snapshotCache).cancelWatch.func1+0x65 /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/util/xds/v3/cache.go:283
# 0x1bab954 github.com/envoyproxy/go-control-plane/pkg/server/sotw/v3.(*server).process+0x7b4 /Users/lobkovilya/go/src/github.com/envoyproxy/go-control-plane/pkg/server/sotw/v3/server.go:418
# 0x1bad899 github.com/envoyproxy/go-control-plane/pkg/server/sotw/v3.(*server).StreamHandler+0xb9 /Users/lobkovilya/go/src/github.com/envoyproxy/go-control-plane/pkg/server/sotw/v3/server.go:449
# 0x1e9feca github.com/kumahq/kuma/pkg/kds/server.(*server).StreamKumaResources+0x8a /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/kds/server/kds.go:30
# 0x284e9b2 github.com/kumahq/kuma/pkg/test/kds/setup.StartServer.func1+0x72 /Users/lobkovilya/go/src/github.com/Kong/kuma/pkg/test/kds/setup/server.go:60
While the first goroutine is inside SetSnapshot updating all watchers, the server goroutine receives a DiscoveryRequest and calls cancel. Both SetSnapshot and cancel call cache.mu.Lock().
The first goroutine, still inside SetSnapshot and holding cache.mu, can't finish updating the watchers because the values.responses channel is full (it has capacity 5), so its send blocks until the server goroutine reads from that channel. But the server goroutine can't read from values.responses because it is inside cancel, waiting for cache.mu to be unlocked. Neither goroutine can make progress, so they deadlock.
cc: @jpeach