Skip to content

Clock-skew offset is not updated after time re-sync, first attempts always use wrong timestampsΒ #3187

@eattker

Description

@eattker

Acknowledgements

Describe the bug

Overview

This library automatically adjusts the signing timestamp when the server clock differs from the client clock and responds with "RequestTimeTooSkewed" error. However when the clocks became synchronized again, the library does not update the offset back to the newly calculated skew (close to zero).

When we shift the client time to the future, we can see that the signing timestamps of the first attempts are in the future, then it adjust the timestamp (using negative offset) to be in the present in the retries. This works when the offset is not close to zero.
However it can't adjust the timestamp for the second try, the proper timestamp is sent only in the third attempt, see later. All subsequent requests will use the correct timestamp at the first attempt. Requests without a "retryer" will be succeeded.

When we shift back the client time to present, it always sends the first request with timestamp in the past (due to negative offset), the second attempt will use the correct timestamp. But all subsequent requests still uses timestamp in past (due to negative timestamp) at the first attempt and adjust it in the second attempt. Requests without "retryer" will fail because of the wrong timestamp.

Observer timeline:

Time on both side: 20250910050607Z
Shift the client time forward by 1 year to 20260910050607Z

Request 1
-> Attempt 1 : 20260910050607Z
<- Response 1 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 2 : 20260910050607Z
<- Response 2 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 3 : 20250910050607Z
<- Response 3 : OK

Request 2
-> Attempt 1 : 20260910050607Z
<- Response 1 : OK

Shifting the client time back to present (20250910050607Z)

Request 3
-> Attempt 1 : 20240910050607Z
<- Response 1 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 2 : 20250910050607Z
<- Response 2 : OK

Request 4 ...
-> Attempt 1 : 20240910050607Z
<- Response 1 : RequestTimeTooSkewed, 20250910050607Z
-> Attempt 2 : 20250910050607Z
<- Response 2 : OK

Analysis

I have checked (and debugged) the source code and I belive the problem is in the retry mechanism in /aws/retry/middleware.go:

// HandleFinalize utilizes the provider Retryer implementation to attempt
// retries over the next handler
func (r *Attempt) HandleFinalize(ctx context.Context, in smithymiddle.FinalizeInput, next smithymiddle.FinalizeHandler) (
	out smithymiddle.FinalizeOutput, metadata smithymiddle.Metadata, err error,
) {
	var attemptNum int
	var attemptClockSkew time.Duration
	var attemptResults AttemptResults

	maxAttempts := r.retryer.MaxAttempts()
	releaseRetryToken := nopRelease

	retryMetrics, err := newAttemptMetrics(r.OperationMeter)
	if err != nil {
		return out, metadata, err
	}

	for {
		attemptNum++
		attemptInput := in
		attemptInput.Request = r.requestCloner(attemptInput.Request)

		// Record the metadata for the for attempt being started.
		attemptCtx := setRetryMetadata(ctx, retryMetadata{
			AttemptNum:       attemptNum,
			AttemptTime:      sdk.NowTime().UTC(),
			MaxAttempts:      maxAttempts,
			AttemptClockSkew: attemptClockSkew,
		})

		// Setting clock skew to be used on other context (like signing)
		ctx = internalcontext.SetAttemptSkewContext(ctx, attemptClockSkew)

		var attemptResult AttemptResult
                 ...
}

First, attemptClockSkew should not be initialized to zero, it should be read from internalcontext:

        attemptClockSkew := internalcontext.GetAttemptSkewContext(ctx)

Second, clock skew should be set before creating the attemptCtx, because attemptCtx is used for the request and the offset is read from that context.

                // Setting clock skew to be used on other context (like signing)
                ctx = internalcontext.SetAttemptSkewContext(ctx, attemptClockSkew)

		// Record the metadata for the for attempt being started.
		attemptCtx := setRetryMetadata(ctx, retryMetadata{
			AttemptNum:       attemptNum,
			AttemptTime:      sdk.NowTime().UTC(),
			MaxAttempts:      maxAttempts,
			AttemptClockSkew: attemptClockSkew,
		})

Note: the AttemptClockSkew in retryMetadata is used for adjusting the TTL.

The suggested fix addresses the retry case. I have not verified the non-retry case.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

After clocks are synchronized again, the clock-skew offset should be updated with the newly calculated value and should be used at the first attempt.

Current Behavior

After clocks are synchronized again, the client uses wrong timestamps at the first attempt. Requests without retry always fails.

Reproduction Steps

  1. shift the client time forward by 1 hour
  2. send requests to the server, first will succeed after retries and the subsequent requests will succeed for the first attempt
  3. shif the client time back to present
  4. send requests to the server, they will succeed after retries but never for the first attempt

Easier way is to extend the unit test in /internal/middleware/midleware_test.go:

func TestSkewIsSetOnTheWholeClient(t *testing.T) {
	defer resetDefaults(sdk.TestingUseNopSleep())
	fiveMinuteSkew := func() time.Time {
		return time.Now().Add(5 * time.Minute)
	}
	sdk.NowTime = fiveMinuteSkew
	var offset atomic.Int64
	offset.Store(0)
	c := MockClient{
		options{
			HTTPClient: failRequestIfSkewed(),
			Retryer: retry.NewStandard(func(s *retry.StandardOptions) {
			}),
			Offset: &offset,
		},
	}
	resp, err := c.Do(context.Background())
	if err != nil {
		t.Errorf("Expected request to succeed on retry. Got %v and err %v", resp, err)
	}
	// Remove retryer so it has to succeed on first call
	c.options.Retryer = nil
	// same client, new request
	resp, err = c.Do(context.Background())
	if err != nil {
		t.Errorf("Expected second request to succeed since the skew should be set on the client. Got %v and err %v", resp, err)
	}

	// restore time
	sdk.NowTime = func() time.Time {
		return time.Now()
	}
	// restore retryer
	c.options.Retryer = retry.NewStandard(func(s *retry.StandardOptions) {
	})

	// same client, new request
	resp, err = c.Do(context.Background())
	if err != nil {
		t.Errorf("Expected request to succeed on retry. Got %v and err %v", resp, err)
	}
	// Remove retryer so it has to succeed on first call
	c.options.Retryer = nil
	// same client, new request
	resp, err = c.Do(context.Background())
	if err != nil {
		t.Errorf("Expected second request to succeed since the skew should be set on the client. Got %v and err %v", resp, err)
	}
}

This extended unit test fails with the current code, but pass with the suggested fixes.

Possible Solution

  1. Read the prior attempt skew from context when starting the loop.
  2. Set the skew on the context before creating attemptCtx, so all downstream consumers (signing, TTL) see the correct value.

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

[email protected]

Compiler and Version used

go1.24.0 darwin/arm64

Operating System and version

mac osx, linux (doesn't matter)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugThis issue is a bug.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions