mvcc: avoid double decrement of watcher gauge on close/cancel race
This occurs specifically when the watch is for a compacted revision, as there's
a possible interleaving of cancel/close that invokes the `cancelWatch` function
twice.
It's fairly difficult to provoke the race condition, but on `main` the racing
test can be observed failing with a negative gauge:
```
$ go test ./... -run TestNewWatcherCountGauge/compacted_watch,_close/cancel_race
--- FAIL: TestNewWatcherCountGauge (0.34s)
watchable_store_test.go:86: # HELP etcd_debugging_mvcc_watcher_total Total number of watchers.
# TYPE etcd_debugging_mvcc_watcher_total gauge
-etcd_debugging_mvcc_watcher_total -1
+etcd_debugging_mvcc_watcher_total 0
FAIL
FAIL go.etcd.io/etcd/server/v3/storage/mvcc 0.830s
? go.etcd.io/etcd/server/v3/storage/mvcc/testutil [no test files]
FAIL
```
The cancel function already seems designed to tolerate being invoked more than
once (that is what the existing `wa.ch == nil` check is for). The bug is that
this check comes "too late" in the `if/else if` chain: when `wa.compacted` is
true, every invocation takes the compacted branch first, so each one decrements
the counter. Moving the `wa.ch == nil` case up one position, ahead of the
`wa.compacted` case, ensures the decrement branch can be taken only once.
In fact, it seems logically more sensible to put this `wa.ch == nil` case
_first_, as a guard against the function being invoked multiple times. However,
moving it before the synced/unsynced watcher set delete calls could have a
larger, inadvertent functional impact (e.g., if cancelled watchers were never
deleted from these sets, it would presumably introduce a leak). So, out of an
abundance of caution, I've made the smallest change I think will fix my issue.
Signed-off-by: Kieran Gorman <[email protected]>