104365: kvprober: special case node-is-decommissioned errors r=joshimhoff a=joshimhoff
**kvprober: special case node-is-decommissioned errors**
kvprober runs on decommissioned nodes. In CC, this is generally fine, since
automation fully takes down nodes once they reach the decommissioned state. But
there is a brief period where a node is running and in the decommissioned
state, and we see kvprober errors in metrics during this period, as shown below.
This sometimes leads to false positive kvprober pages in CC production.
‹rpc error: code = PermissionDenied desc = n1 was permanently removed from...
To be clear, the errors are not wrong per se. They are simply expected to
happen once a node is decommissioned.
This commit adds special handling for errors of the kind above, by doing a
substring match on the error string. To be exact, kvprober now logs such errors
at warning level and does not increment any error counters. This way, an
operation like decommissioning a node does not cause false positive kvprober
pages in CC production.
Fixes cockroachdb#104367
Release note: None, since kvprober is not used by customers. (It is not
documented.)
Co-authored-by: Josh Carp <[email protected]>
Co-authored-by: Josh Imhoff <[email protected]>