
S3Client Got Corrupted on a particular container due to TimeoutError causing S3 writes to fail #6344

@rishi2808-ds

Description



Describe the bug

When the AWS credentials expire, the AWS SDK fetches new credentials and caches them using the memoize method.
If this fetch fails with a TimeoutError, the memoize method caches the error. Subsequent calls then receive the cached TimeoutError instead of attempting to fetch new credentials from AWS.
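
The caching behaviour described above can be illustrated with a minimal sketch. This is not the SDK's source: `naiveMemoize`, `flakyProvider`, and the `Credentials` shape are hypothetical names used only for illustration. The sketch assumes a memoize helper that stores the provider's promise the first time it is called, so a rejected promise is stored as well.

```typescript
// Hypothetical credential shape, for illustration only.
interface Credentials {
  accessKeyId: string;
  secretAccessKey: string;
}

type Provider<T> = () => Promise<T>;

// Simplified memoize that keeps the first promise, resolved or rejected.
// This is an assumption about the shape of the logic, not the SDK's code.
function naiveMemoize<T>(provider: Provider<T>): Provider<T> {
  let result: Promise<T> | undefined;
  let hasResult = false;
  return () => {
    if (!hasResult) {
      result = provider();
      hasResult = true; // set even if the promise later rejects
    }
    return result as Promise<T>;
  };
}

let providerCalls = 0;

// Stand-in for the metadata-service fetch: always fails with a TimeoutError.
const flakyProvider: Provider<Credentials> = () => {
  providerCalls += 1;
  const error = new Error("simulated timeout");
  error.name = "TimeoutError";
  return Promise.reject(error);
};

const getCredentials = naiveMemoize(flakyProvider);

// Both calls receive the same rejected promise; the provider runs only once.
const first = getCredentials();
const second = getCredentials();
first.catch((e: Error) => console.log("first call:", e.name));
second.catch((e: Error) => console.log("second call:", e.name));
console.log("provider invoked", providerCalls, "time(s)");
```

In this sketch every later caller sees the same TimeoutError, because nothing ever resets `hasResult` after a failure.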

To reproduce this issue locally, we removed the credentials from the ~/.aws/credentials file, forcing the SDK to fall back to the fromInstanceMetadata method for obtaining credentials, which mirrors the behaviour in our remote environment.

We then explicitly threw an error within the AWS SDK and observed that while the first attempt to fetch credentials triggered an API call to the Instance Metadata Service, subsequent attempts retrieved the error from the cache instead of making fresh API calls to the Instance Metadata Service.

Below is a screenshot showing the values of the hasResult and result variables in the memoize method, verifying that the TimeoutError is indeed being cached.


image-20240731-194758

Additional logs added.

Screenshot 2024-08-01 at 12 28 34 PM

The first time it is called, we can see the added logs.

Screenshot 2024-07-31 at 9 43 12 PM (1)

Subsequent calls do not show added logs in the AWS SDK, indicating that no new API calls are being made. Instead, we continue to see TimeoutError logs, which means the error is being retrieved from the cache.

Screenshot 2024-07-31 at 7 29 08 PM (1)

SDK version number

aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.10.0

Reproduction Steps

To reproduce the issue locally:

Explicitly throw a TimeoutError in the httpRequest function located in node_modules/@aws-sdk/credential-provider-imds/dist/cjs/remoteProvider/httpRequest.js. The first call to this function throws the TimeoutError, which is then cached by memoize. On subsequent calls, the function is not invoked again; instead, the cached TimeoutError is returned.

Observed Behavior

The first attempt to fetch credentials triggered an API call to the Instance Metadata Service. Subsequent attempts retrieved the error from the cache instead of making fresh API calls to the Instance Metadata Service.

Expected Behavior

Subsequent calls should call the Instance Metadata Service to fetch credentials when a TimeoutError has been stored in the cache.

Possible Solution

A cached TimeoutError should not be reused: when one has been stored in the cache, subsequent calls should call the Instance Metadata Service again to fetch credentials.
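
One way to get this behaviour is to drop the cached value when the provider's promise rejects, so that the next call tries again. The following is a minimal sketch under that assumption, not a patch to the SDK. `memoizeWithRetryOnError`, `sometimesFails`, and the other names are hypothetical, and the sketch leaves out the expiry handling a real credential cache would need.

```typescript
type AsyncProvider<T> = () => Promise<T>;

// Caches a successful result, but forgets a rejected one so that the
// next call invokes the provider again. Hypothetical helper, not SDK code.
function memoizeWithRetryOnError<T>(provider: AsyncProvider<T>): AsyncProvider<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached !== undefined) {
      return cached;
    }
    const attempt = provider();
    cached = attempt;
    attempt.catch(() => {
      // Drop the failed attempt unless a newer one already replaced it.
      if (cached === attempt) {
        cached = undefined;
      }
    });
    return attempt;
  };
}

let attempts = 0;

// Simulated provider: the first call times out, later calls succeed.
const sometimesFails: AsyncProvider<string> = async () => {
  attempts += 1;
  if (attempts === 1) {
    const error = new Error("simulated timeout");
    error.name = "TimeoutError";
    throw error;
  }
  return "credentials-from-attempt-" + attempts;
};

const getFreshCredentials = memoizeWithRetryOnError(sometimesFails);

// Calls the memoized provider three times in sequence and records each outcome.
async function demo(): Promise<string[]> {
  const outcomes: string[] = [];
  for (let i = 0; i < 3; i++) {
    try {
      outcomes.push(await getFreshCredentials());
    } catch (e) {
      outcomes.push((e as Error).name);
    }
  }
  return outcomes;
}

const demoResult = demo();
demoResult.then((outcomes) => console.log(outcomes, "provider calls:", attempts));
```

In this sketch the first call fails with the simulated TimeoutError, the second call invokes the provider again and succeeds, and the third call reuses the cached success. Callers that arrive while an attempt is still pending share that attempt's promise, so a failure does not trigger a burst of parallel retries.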

Additional Information/Context

No response

Labels

bug (This issue is a bug.), p2 (This is a standard priority issue)
