Skip to content

Commit b26eef1

Browse files
committed
rpc: avoid crash in newPeer
It was previously possible to make a new peer while the old one was in the middle of being deleted, which caused a crash due to to acquiring child metrics when they still existed. Luckily, this is easy enough to fix: just remove some premature optimization where I had tried to be too clever. Fixes cockroachdb#105335. Epic: CRDB-21710 Release note: None (bug was never shipped in a release; it's new as of cockroachdb#99191).
1 parent c10ca05 commit b26eef1

File tree

1 file changed

+9
-14
lines changed

1 file changed

+9
-14
lines changed

pkg/rpc/peer.go

Lines changed: 9 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -846,26 +846,21 @@ func (p *peer) maybeDelete(ctx context.Context, now time.Time) {
846846

847847
log.VEventf(ctx, 1, "deleting peer")
848848

849-
// Lock order: map, then peer. But here we can do better and
850-
// not hold both mutexes at the same time.
849+
// Lock order: map, then peer. We need to lock both because we want
850+
// to atomically release the metrics while removing from the map[1][2].
851851
//
852-
// Release metrics in the same critical section as p.deleted=true
853-
// to make sure the metrics are not updated after release, since that
854-
// causes the aggregate metrics to drift.
855-
//
856-
// We delete from the map first, then mark the peer as deleted. The converse
857-
// works too, but it makes for flakier tests because it's possible to see the
858-
// metrics change but the peer still being in the map.
859-
852+
// [1]: see https://github.com/cockroachdb/cockroach/issues/105335
853+
// [2]: Releasing in one critical section with p.deleted=true ensures
854+
// that the metrics are not updated after release, which would
855+
// otherwise cause the aggregate metrics to drift away from zero
856+
// permanently.
860857
p.peers.mu.Lock()
858+
defer p.peers.mu.Unlock()
861859
delete(p.peers.mu.m, p.k)
862-
p.peers.mu.Unlock()
863-
864860
p.mu.Lock()
861+
defer p.mu.Unlock()
865862
p.mu.deleted = true
866863
p.peerMetrics.release()
867-
p.mu.Unlock()
868-
869864
}
870865

871866
func launchConnStateWatcher(

0 commit comments

Comments
 (0)