Description
When using an aggregator connection with many AWS accounts (270+), STS AssumeRole calls are throttled. Despite configuring max_error_retry_attempts = 25, the error reports "exceeded maximum number of attempts, 3", which is the AWS SDK default.
Root Cause
The retry configuration is applied in getClientWithMaxRetries() for API calls, but NOT in getBaseClientForAccountUncached() where credentials are loaded via config.LoadDefaultConfig().
When profiles require AssumeRole (via role_arn + source_profile), the SDK uses its default retryer (3 attempts) instead of the configured value.
The default of 3 comes from the AWS SDK Go v2:
// aws/retry/standard.go line 26
const DefaultMaxAttempts int = 3
Code Reference
service.go - getBaseClientForAccountUncached() - CURRENT (BUGGY):
cfg, err := config.LoadDefaultConfig(ctx,
    config.WithSharedConfigProfile(profile),
    config.WithRegion(region),
    // ⚠️ No WithRetryer() option passed here - uses SDK default (3 attempts)
)
service.go - getClientWithMaxRetries() - WORKS CORRECTLY:
// Retry config IS applied here, but only for API calls AFTER credentials are loaded
o.Retryer = retry.NewStandard(func(so *retry.StandardOptions) {
    so.MaxAttempts = maxRetries // This uses the configured value (e.g., 25)
})
Suggested Fix
Add config.WithRetryer() to config.LoadDefaultConfig() in getBaseClientForAccountUncached():
func getBaseClientForAccountUncached(ctx context.Context, d *plugin.QueryData, clientType string) (*AwsCommonClient, error) {
    // ... existing code ...
    // Read retry config (same logic as getClientWithMaxRetries)
    awsSpcConfig := GetConfig(d.Connection)
    maxRetries := 9 // default
    if awsSpcConfig.MaxErrorRetryAttempts != nil {
        maxRetries = *awsSpcConfig.MaxErrorRetryAttempts
    }
    // Apply the env override only if it parses; discarding the Atoi error
    // would silently set maxRetries to 0 on an empty or malformed value
    if n, err := strconv.Atoi(os.Getenv("AWS_MAX_ATTEMPTS")); err == nil {
        maxRetries = n
    }
    minRetryDelay := 25 * time.Millisecond // default
    if awsSpcConfig.MinErrorRetryDelay != nil {
        minRetryDelay = time.Duration(*awsSpcConfig.MinErrorRetryDelay) * time.Millisecond
    }
    // Load credentials WITH custom retryer
    cfg, err := config.LoadDefaultConfig(ctx,
        config.WithSharedConfigProfile(profile),
        config.WithRegion(region),
        config.WithRetryer(func() aws.Retryer { // ✅ ADD THIS
            return retry.NewStandard(func(o *retry.StandardOptions) {
                o.MaxAttempts = maxRetries
                o.MaxBackoff = 5 * time.Minute
                o.Backoff = NewExponentialJitterBackoff(minRetryDelay)
                o.RateLimiter = NoOpRateLimit{}
            })
        }),
    )
    // ... existing code ...
}
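Since the same retry settings would then be read in two places, the lookup could be factored into one helper. The following is only a sketch using the standard library: resolveMaxAttempts is a hypothetical name, not a function in the plugin, and the precedence (environment variable over connection config over default) simply mirrors the snippet above. Rejecting non-positive values is an extra assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// resolveMaxAttempts is a hypothetical helper (not part of the plugin).
// Precedence mirrors the suggested fix: env var > connection config > default.
func resolveMaxAttempts(configured *int, envValue string, fallback int) int {
	attempts := fallback
	if configured != nil {
		attempts = *configured
	}
	// Accept the override only if it is a well-formed, positive integer.
	// An empty or malformed string fails Atoi and is ignored.
	if n, err := strconv.Atoi(envValue); err == nil && n > 0 {
		attempts = n
	}
	return attempts
}

func main() {
	configured := 25 // stands in for max_error_retry_attempts
	fmt.Println(resolveMaxAttempts(&configured, os.Getenv("AWS_MAX_ATTEMPTS"), 9))
}
```

Both getClientWithMaxRetries() and getBaseClientForAccountUncached() could call such a helper so the two code paths cannot drift apart again.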
Expected Behavior
┌─────────────────────────────────┬──────────────────────────┬──────────────────────────────────────┐
│ Step │ Current │ Expected │
├─────────────────────────────────┼──────────────────────────┼──────────────────────────────────────┤
│ Credential loading (AssumeRole) │ 3 attempts (SDK default) │ Uses max_error_retry_attempts config │
├─────────────────────────────────┼──────────────────────────┼──────────────────────────────────────┤
│ API calls (ListFunctions, etc.) │ Uses config ✅ │ Uses config ✅ │
└─────────────────────────────────┴──────────────────────────┴──────────────────────────────────────┘
Environment
- Steampipe: v2.3.6
- Plugin AWS: v1.29.0
- Number of accounts in aggregator: 271
- Configuration:
connection "aws_xxx" {
    plugin = "aws"
    max_error_retry_attempts = 25
    min_error_retry_delay = 200
    profile = "xxx"
}
Error Message
operation error STS: AssumeRole, get identity: get credentials: failed to refresh cached credentials,
failed to load credentials, exceeded maximum number of attempts, 3, : You have reached maximum request limit.
Workaround Attempts (None Worked)
- Setting max_error_retry_attempts = 25 in connection config ❌
- Setting AWS_MAX_ATTEMPTS=25 environment variable ❌
- Setting retry_mode = adaptive and max_attempts = 15 in ~/.aws/config ❌
All of these settings are ignored during credential loading because the plugin only applies retry settings to API calls in getClientWithMaxRetries(), not to the initial credential loading via config.LoadDefaultConfig() in getBaseClientForAccountUncached().