-
Notifications
You must be signed in to change notification settings - Fork 734
Description
Acknowledgements
- I have searched (https://github.com/aws/aws-sdk/issues?q=is%3Aissue) for past instances of this issue
- I have verified all of my SDK modules are up-to-date (you can perform a bulk update with
go get -u github.com/aws/aws-sdk-go-v2/...)
Describe the bug
Overview
This library automatically adjusts the signing timestamp when the server clock differs from the client clock and responds with "RequestTimeTooSkewed" error. However when the clocks became synchronized again, the library does not update the offset back to the newly calculated skew (close to zero).
When we shift the client time to the future, we can see that the signing timestamps of the first attempts are in the future, then it adjust the timestamp (using negative offset) to be in the present in the retries. This works when the offset is not close to zero.
However it can't adjust the timestamp for the second try, the proper timestamp is sent only in the third attempt, see later. All subsequent requests will use the correct timestamp at the first attempt. Requests without a "retryer" will be succeeded.
When we shift back the client time to present, it always sends the first request with timestamp in the past (due to negative offset), the second attempt will use the correct timestamp. But all subsequent requests still uses timestamp in past (due to negative timestamp) at the first attempt and adjust it in the second attempt. Requests without "retryer" will fail because of the wrong timestamp.
Observer timeline:
Time on both side: 20250910050607Z
Shift the client time forward by 1 year to 20260910050607Z
Request 1
-> Attempt 1 : 20260910050607Z
<- Response 1 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 2 : 20260910050607Z
<- Response 2 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 3 : 20250910050607Z
<- Response 3 : OK
Request 2
-> Attempt 1 : 20260910050607Z
<- Response 1 : OK
Shifting the client time back to present (20250910050607Z)
Request 3
-> Attempt 1 : 20240910050607Z
<- Response 1 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 2 : 20250910050607Z
<- Response 2 : OK
Request 4 ...
-> Attempt 1 : 20240910050607Z
<- Response 1 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 2 : 20250910050607Z
<- Response 2 : OK
Analysis
I have checked (and debugged) the source code and I belive the problem is in the retry mechanism in /aws/retry/middleware.go:
// HandleFinalize utilizes the provider Retryer implementation to attempt
// retries over the next handler
func (r *Attempt) HandleFinalize(ctx context.Context, in smithymiddle.FinalizeInput, next smithymiddle.FinalizeHandler) (
out smithymiddle.FinalizeOutput, metadata smithymiddle.Metadata, err error,
) {
var attemptNum int
var attemptClockSkew time.Duration
var attemptResults AttemptResults
maxAttempts := r.retryer.MaxAttempts()
releaseRetryToken := nopRelease
retryMetrics, err := newAttemptMetrics(r.OperationMeter)
if err != nil {
return out, metadata, err
}
for {
attemptNum++
attemptInput := in
attemptInput.Request = r.requestCloner(attemptInput.Request)
// Record the metadata for the for attempt being started.
attemptCtx := setRetryMetadata(ctx, retryMetadata{
AttemptNum: attemptNum,
AttemptTime: sdk.NowTime().UTC(),
MaxAttempts: maxAttempts,
AttemptClockSkew: attemptClockSkew,
})
// Setting clock skew to be used on other context (like signing)
ctx = internalcontext.SetAttemptSkewContext(ctx, attemptClockSkew)
var attemptResult AttemptResult
...
}
First, attemptClockSkew should not be initialized to zero, it should be read from internalcontext:
attemptClockSkew := internalcontext.GetAttemptSkewContext(ctx)
Second, clock skew should be set before creating the attemptCtx, because attemptCtx is used for the request and the offset is read from that context.
// Setting clock skew to be used on other context (like signing)
ctx = internalcontext.SetAttemptSkewContext(ctx, attemptClockSkew)
// Record the metadata for the for attempt being started.
attemptCtx := setRetryMetadata(ctx, retryMetadata{
AttemptNum: attemptNum,
AttemptTime: sdk.NowTime().UTC(),
MaxAttempts: maxAttempts,
AttemptClockSkew: attemptClockSkew,
})
Note: the AttemptClockSkew in retryMetadata is used for adjusting the TTL.
The suggested fix addresses the retry case. I have not verified the non-retry case.
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
After clocks are synchronized again, the clock-skew offset should be updated with the newly calculated value and should be used at the first attempt.
Current Behavior
After clocks are synchronized again, the client uses wrong timestamps at the first attempt. Requests without retry always fails.
Reproduction Steps
- shift the client time forward by 1 hour
- send requests to the server, first will succeed after retries and the subsequent requests will succeed for the first attempt
- shif the client time back to present
- send requests to the server, they will succeed after retries but never for the first attempt
Easier way is to extend the unit test in /internal/middleware/midleware_test.go:
func TestSkewIsSetOnTheWholeClient(t *testing.T) {
defer resetDefaults(sdk.TestingUseNopSleep())
fiveMinuteSkew := func() time.Time {
return time.Now().Add(5 * time.Minute)
}
sdk.NowTime = fiveMinuteSkew
var offset atomic.Int64
offset.Store(0)
c := MockClient{
options{
HTTPClient: failRequestIfSkewed(),
Retryer: retry.NewStandard(func(s *retry.StandardOptions) {
}),
Offset: &offset,
},
}
resp, err := c.Do(context.Background())
if err != nil {
t.Errorf("Expected request to succeed on retry. Got %v and err %v", resp, err)
}
// Remove retryer so it has to succeed on first call
c.options.Retryer = nil
// same client, new request
resp, err = c.Do(context.Background())
if err != nil {
t.Errorf("Expected second request to succeed since the skew should be set on the client. Got %v and err %v", resp, err)
}
// restore time
sdk.NowTime = func() time.Time {
return time.Now()
}
// restore retryer
c.options.Retryer = retry.NewStandard(func(s *retry.StandardOptions) {
})
// same client, new request
resp, err = c.Do(context.Background())
if err != nil {
t.Errorf("Expected request to succeed on retry. Got %v and err %v", resp, err)
}
// Remove retryer so it has to succeed on first call
c.options.Retryer = nil
// same client, new request
resp, err = c.Do(context.Background())
if err != nil {
t.Errorf("Expected second request to succeed since the skew should be set on the client. Got %v and err %v", resp, err)
}
}
This extended unit test fails with the current code, but pass with the suggested fixes.
Possible Solution
- Read the prior attempt skew from context when starting the loop.
- Set the skew on the context before creating attemptCtx, so all downstream consumers (signing, TTL) see the correct value.
Additional Information/Context
No response
AWS Go SDK V2 Module Versions Used
Compiler and Version used
go1.24.0 darwin/arm64
Operating System and version
mac osx, linux (doesn't matter)