Commit b299a3b

azure: configure a default TryTimeout of 60 seconds
We have seen some evidence of stuck or slow requests to Azure Blob Storage. Setting a try timeout allows us to retry these slow operations and may allow us to make forward progress if the underlying error is transient.

Operations that run into the TryTimeout will be internally retried by the cloud package. The TryTimeout can be controlled via the cloudstorage.azure.try.timeout setting. Setting it to zero disables the per-attempt timeout.

60 seconds is a relatively long timeout. The main reason it is set that long is that the timeout is applied to read operations, and CRDB sometimes performs long-lived stream reads when it is merging many SSTs during a restore.

Release note: Add a default TryTimeout of 60 seconds for Azure Blob Storage to mitigate occasional stuck operations.

Fixes: #154085
1 parent 0eccabf commit b299a3b

1 file changed: +12 -0 lines changed

pkg/cloud/azure/azure_storage.go

Lines changed: 12 additions & 0 deletions
@@ -12,6 +12,7 @@ import (
 	"net/url"
 	"path"
 	"strings"
+	"time"
 
 	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
 	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
@@ -48,6 +49,12 @@ var maxRetries = settings.RegisterIntSetting(
 	"the maximum number of retries per Azure operation",
 	10)
 
+var tryTimeout = settings.RegisterDurationSetting(
+	settings.ApplicationLevel,
+	"cloudstorage.azure.try.timeout",
+	"the timeout for individual retry attempts in Azure operations",
+	60*time.Second)
+
 // A note on Azure authentication:
 //
 // The standardized way to authenticate a third-party identity to the Azure
@@ -242,6 +249,11 @@ func makeAzureStorage(
 	// Azure SDK defaults to 3 retries, which is too low to survive the 30 second
 	// brownout in TestAzureFaultInjection.
 	opts.Retry.MaxRetries = int32(maxRetries.Get(&args.Settings.SV))
+	// We occasionally see individual requests get stuck for 10+ minutes. If the
+	// source of the stuckness is transient or applies to individual
+	// connections/requests, then starting a new request after a timeout may
+	// succeed and allow the client to make forward progress.
+	opts.Retry.TryTimeout = tryTimeout.Get(&args.Settings.SV)
 
 	var azClient *service.Client
 	switch conf.Auth {
