
MIGRATION ISSUE: Socket exhaustion happening only in v3 using default configs in both v2 and v3 #7416

@marianomerlo

Description


Pre-Migration Checklist

Which JavaScript Runtime is this issue in?

Node.js (includes AWS Lambda)

AWS Lambda Usage

  • Yes, my application is running on AWS Lambda.
  • No, my application is not running on AWS Lambda.

Describe the Migration Issue

I was using v2 for almost 10 years in a production environment without any major issues. We decided to move to v3, and a certain amount of time (48-72 hours) after the deployment I start getting socket usage alerts:
@smithy/node-http-handler:WARN - socket usage at capacity=50 and XXXX additional requests are enqueued.
and the application stops working properly.

Code Comparison

This is how I am initializing my S3 v2 client:

const v2Config = {
      accessKeyId: '********', // redacted
      secretAccessKey: '********', // redacted
      region: 'us-east-1',
      endpoint: null,
      signatureVersion: 'v4',
      computeChecksums: true,
      s3ForcePathStyle: false,
      params: {
        Bucket: 'my-bucket-name'
      }
};

Note: we are actually promisifying the v2 client: bluebird.promisifyAll(new awsSdk.S3(v2Config))
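For reference, the promisified client is where the *Async calls below (e.g. uploadAsync) come from. A minimal sketch of how it is created and used, assuming the standard bluebird.promisifyAll behavior:

  import * as awsSdk from 'aws-sdk';
  import * as bluebird from 'bluebird';

  // promisifyAll adds an xxxAsync variant for every callback-style method
  // (getObjectAsync, uploadAsync, ...) on the client and its prototype chain.
  const s3: any = bluebird.promisifyAll(new awsSdk.S3(v2Config));

  // Calls then look like this (someStream is just an illustrative readable stream):
  const result = await s3.uploadAsync({ Key: 'my-key', Body: someStream });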

In order to make the migration "transparent", I created a function that builds a "v3Config" from the "v2Config" and uses it to initialize the S3 v3 client:

  const v3Config = {
    region: v2Config.region,
    credentials: {
      accessKeyId: v2Config.accessKeyId,
      secretAccessKey: v2Config.secretAccessKey
    },
    endpoint: v2Config.endpoint as string,
    forcePathStyle: !!v2Config.s3ForcePathStyle,
    retryMode: 'standard',
    requestHandler: {
      requestTimeout: v2Config.httpOptions?.timeout ?? 120000,
      // in v2, keepAlive defaults to false; in v3, it defaults to true,
      // so we set it to false to match v2
      keepAlive: v2Config.httpOptions?.agent?.keepSocketAlive ?? false
    }
  };
  if (v2Config.computeChecksums) {
    // eslint-disable-next-line no-param-reassign
    v3Config.requestChecksumCalculation = 'WHEN_SUPPORTED';
  }
  // Bucket is now provided with each command
  return v3Config;
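For what it's worth, a more explicit way to express the same intent would be to pass a concrete NodeHttpHandler with an https.Agent, which makes maxSockets visible instead of implicit. A sketch assuming the defaults discussed above (not what we currently run):

  import { S3Client } from '@aws-sdk/client-s3';
  import { NodeHttpHandler } from '@smithy/node-http-handler';
  import { Agent } from 'https';

  // Same settings as the plain-object requestHandler above, but with the
  // agent spelled out so the socket pool size is explicit.
  const requestHandler = new NodeHttpHandler({
    requestTimeout: 120000,
    httpsAgent: new Agent({
      keepAlive: false, // match the v2 default
      maxSockets: 50    // the cap the warning refers to; could be raised if needed
    })
  });

  const s3 = new S3Client({ ...v3Config, requestHandler });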

If the issue is not in the config, then of all the S3 operations I'm using, the only ones I can think of that might be causing this are getObject and/or upload.

Here is how I do them in v2:

  • Get Object V2
    const stream = s3.getObject({ Key: 'my-key' }).createReadStream();
    return stream;
  • Upload in V2
    const params = {
      Key: 'my-key',
      Body: stream,
      ContentType: file.objectType,
      Metadata: encodeMetadata(metadata),
    };

    return s3.uploadAsync(params);

Here is how I do them in v3 (see the note about consuming the Body stream after this list):

  • Get Object V3
    const params: GetObjectCommandInput = {
      Bucket: 'my-bucket-name',
      Key: 'my-key'
    };
    const command = new GetObjectCommand(params);
    const response: GetObjectCommandOutput = await s3.send(command);

    return response.Body as Readable;
  • Upload in V3
      import { Upload } from '@aws-sdk/lib-storage';
      import { S3 } from '@aws-sdk/client-s3';

      const s3 = new S3(v3Config);

      const upload = new Upload({
        client: s3,
        params: {
          Bucket: config.bucketName,
          Key: 'my-key',
          Body: stream,
          ContentType: file.objectType,
          Metadata: encodeMetadata(metadata),
          ChecksumAlgorithm: 'SHA256'
        }
      });

      return await upload.done();
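A note on the getObject path above: as far as I understand, in v3 the response Body is a stream that holds on to its socket until it is fully read or destroyed, so every caller of the returned Readable has to do one or the other. A minimal sketch of what I mean (downloadToFile is a hypothetical helper, not our actual code):

  import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
  import { Readable } from 'stream';
  import { pipeline } from 'stream/promises';
  import { createWriteStream } from 'fs';

  async function downloadToFile(s3: S3Client, key: string, filePath: string): Promise<void> {
    const response = await s3.send(
      new GetObjectCommand({ Bucket: 'my-bucket-name', Key: key })
    );
    const body = response.Body as Readable;

    // pipeline fully consumes the stream (releasing the socket back to the pool)
    // and destroys it if either side errors, so the connection is not left hanging.
    await pipeline(body, createWriteStream(filePath));
  }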

Observed Differences/Errors

As I said before, after 48-72 hours of the server running, we start getting the socket exhaustion warning: @smithy/node-http-handler:WARN - socket usage at capacity=50 and XXXX additional requests are enqueued.

If I restart the server, I get another 48-72 hours of working time until it happens again. If I switch back to v2, the application keeps working without ever needing a restart.

Additional Context

According to the v2 docs, if you don't set any custom http/https agent config, maxSockets defaults to 50 (which is the same for v3, according to the warning we are getting).

So, that's it.

My first obvious question: is there a way I can verify that the maxSockets value actually being used by v2 is 50? (Just to double-check that nothing is overriding that value, and that the issue isn't simply that I'm limiting the number of available sockets in v3 while v2's effective limit is higher, which would explain why v2 works.)
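Something like the following is what I have in mind for that check. It's a sketch: it assumes the agent, if one was ever set, is reachable through the v2 client's config.httpOptions, and otherwise falls back to Node's global https agent:

  import * as https from 'https';

  // Log what the v2 client would actually be using for connection pooling.
  const configuredAgent = (s3 as any).config?.httpOptions?.agent;
  const effectiveAgent = configuredAgent ?? https.globalAgent;

  console.log('explicit agent set:', Boolean(configuredAgent));
  // If this prints Infinity, no explicit cap was set on the agent itself.
  console.log('maxSockets:', effectiveAgent.maxSockets);
  console.log('keepAlive:', effectiveAgent.keepAlive ?? false);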

Any help is really appreciated.

Labels

needs-triage: This issue or PR still needs to be triaged.
v2-v3-inconsistency: Behavior has changed from v2 to v3, or feature is missing altogether.
