Commit 5d90692

decommission: retry on errors for AllocatorCheckRange
Previously, the decommission pre-check would fail for a range if evalStore.AllocatorCheckRange returned an error. However, some of these errors are transient: a throttled store, for example, is expected to stay throttled for only about 5 seconds (FailedReservationsTimeout), yet that was enough to fail the pre-check. This commit wraps AllocatorCheckRange in a retry loop that retries on any error. Alternatively, we could detect throttling errors specifically and retry only on those, but that would require string or error comparisons, which complicates the code. Since this only affects the decommission pre-check, we simply retry on all errors.
1 parent 8ee050d commit 5d90692

1 file changed: +17 -1 lines changed
pkg/server/decommission.go

Lines changed: 17 additions & 1 deletion
@@ -25,6 +25,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/util/log/logpb"
 	"github.com/cockroachdb/cockroach/pkg/util/log/severity"
 	"github.com/cockroachdb/cockroach/pkg/util/rangedesc"
+	"github.com/cockroachdb/cockroach/pkg/util/retry"
 	"github.com/cockroachdb/cockroach/pkg/util/syncutil"
 	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
 	"github.com/cockroachdb/cockroach/pkg/util/tracing/tracingpb"
@@ -237,7 +238,22 @@ func (s *topLevelServer) DecommissionPreCheck(
 			continue
 		}
 
-		action, _, recording, rErr := evalStore.AllocatorCheckRange(ctx, &desc, collectTraces, overrideStorePool)
+		// Retry on transient errors such as store throttling. Throttling
+		// typically lasts FailedReservationsTimeout (5 seconds by default).
+		var action allocatorimpl.AllocatorAction
+		var recording tracingpb.Recording
+		var rErr error
+		retryOpts := retry.Options{
+			InitialBackoff: 2 * time.Second,
+			MaxBackoff:     5 * time.Second,
+			MaxRetries:     5,
+		}
+		for r := retry.StartWithCtx(ctx, retryOpts); r.Next(); {
+			action, _, recording, rErr = evalStore.AllocatorCheckRange(ctx, &desc, collectTraces, overrideStorePool)
+			if rErr == nil {
+				break
+			}
+		}
 		rangesChecked += 1
 		actionCounts[action.String()] += 1
