Unable to fetch files from S3 #1200

@pkasravi

Description

Describe the bug

I am trying to read many small files from S3 in parallel but am getting timeout errors. The only timeout I configure is the operation timeout, set to 25 seconds, which is far more than my workload needs (explanation below). The requests that "time out" never reach S3: their keys do not appear in the server-side logs. It seems the SDK is throttling requests, causing them to time out before they are executed.

Expected Behavior

My workload is 2369 files, each ≤ 8 MB. My environment is a p4de EC2 instance in the same account as the bucket being accessed.

Based on bandwidth alone, reading a single 8 MB file should take ≤ 1 ms. A single NIC on a p4de achieves 100 Gbps throughput, so:
8 MB × 8 / 1000 = 0.064 Gb
0.064 Gb / 100 Gbps = 0.00064 s = 0.64 ms ≈ 1 ms
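The bandwidth arithmetic above can be checked with a small helper. This is a sketch under the same assumptions as the estimate: decimal units (1 Gb = 1000 Mb), the full 100 Gbps available to one transfer, and no request latency or protocol overhead, so it gives an idealized lower bound rather than a prediction. The function name `transfer_time_ms` is hypothetical.

```rust
/// Idealized transfer time in milliseconds for `size_mb` megabytes over a link
/// of `link_gbps` gigabits per second. Assumes decimal units and ignores
/// latency and protocol overhead (lower bound only).
pub fn transfer_time_ms(size_mb: f64, link_gbps: f64) -> f64 {
    let size_gb = size_mb * 8.0 / 1000.0; // megabytes -> gigabits
    let seconds = size_gb / link_gbps; // gigabits / (gigabits per second)
    seconds * 1000.0 // seconds -> milliseconds
}
```

For example, `transfer_time_ms(8.0, 100.0)` is approximately 0.64, matching the 0.64 ms figure.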

Furthermore, a p4de has 96 cores and tokio is configured to use all of them. Assuming this means at most 96 requests are processed at a time (one per core), all 2369 requests should theoretically complete in about 25 ms:
2369 tasks / 96 cores × 1 ms = 24.677 ms ≈ 25 ms
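The same estimate can be expressed by counting whole "rounds" of requests instead of dividing and rounding: 2369 requests at 96 per round need 25 rounds, since 24 rounds cover only 2304. This sketch keeps the assumptions stated above (at most 96 requests in flight, 1 ms each). Note that with async I/O a tokio worker thread can have more than one request in flight, so this is only a back-of-the-envelope check. The names `rounds` and `estimate_ms` are hypothetical.

```rust
/// Number of sequential rounds needed when at most `concurrency` requests
/// run at once (ceiling division).
pub fn rounds(tasks: u64, concurrency: u64) -> u64 {
    assert!(concurrency > 0, "concurrency must be positive");
    (tasks + concurrency - 1) / concurrency
}

/// Idealized wall-clock time in milliseconds, assuming every request takes
/// exactly `per_request_ms` and rounds run back to back.
pub fn estimate_ms(tasks: u64, concurrency: u64, per_request_ms: f64) -> f64 {
    rounds(tasks, concurrency) as f64 * per_request_ms
}
```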

Setting the operation timeout to either 25 ms or 25 s produces timeout errors. I expect 25 s to be more than enough to never see a timeout.

Current Behavior

Error from the application layer:

```text
RuntimeError: Failed to fetch. Bucket: XXXX Key: XXXX

Caused by:
    0: dispatch failure
    1: other
    2: an error occurred while loading credentials
    3: an unexpected error occurred communicating with IMDS
    4: error trying to connect: HTTP connect timeout occurred after 1s
    5: HTTP connect timeout occurred after 1s
    6: timed out
```
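The numbered "Caused by" list is how anyhow renders the `std::error::Error::source()` chain: each entry wraps the next. In the output above the outer entries name the dispatch and credential-loading steps, and the innermost is the 1 s HTTP connect timeout to IMDS. The std-only sketch below walks such a chain outermost-first; the `Layer` type and `build_chain` helper are hypothetical, used only to build a demo chain. In application code, `anyhow::Error::chain()` yields the same sequence.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type used only to build a demo chain.
#[derive(Debug)]
pub struct Layer {
    pub msg: &'static str,
    pub source: Option<Box<Layer>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Expose the wrapped error so callers can walk the chain.
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

/// Returns the Display text of every error in the chain, outermost first.
pub fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        out.push(e.to_string());
        current = e.source();
    }
    out
}

/// Builds a chain from messages listed outermost first.
pub fn build_chain(msgs: &[&'static str]) -> Option<Layer> {
    msgs.iter().rev().fold(None, |inner, &msg| {
        Some(Layer { msg, source: inner.map(Box::new) })
    })
}
```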

The debug logs show nothing unusual before the error, except for many lines like this:

```text
DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s
```

Reproduction Steps

Client initialization

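```rust
use std::time::Duration;

use aws_config::{timeout::TimeoutConfig, BehaviorVersion, Region};
use aws_sdk_s3::Client;

pub async fn build_client() -> Client {
    // Initialize tracing before loading config so credential-provider logs are captured too.
    if cfg!(debug_assertions) {
        tracing_subscriber::fmt::init();
    }

    // Only the overall operation timeout is set; connect/read timeouts keep their defaults.
    let timeout_config = TimeoutConfig::builder()
        .operation_timeout(Duration::from_secs(25))
        .build();

    // Behavior version pinned explicitly (the original used `aws_config::from_env()`).
    let config = aws_config::defaults(BehaviorVersion::latest())
        .region(Region::new("us-west-2"))
        .timeout_config(timeout_config)
        .load()
        .await;

    Client::new(&config)
}
```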

Reading function

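```rust
use anyhow::{Context, Result};
use aws_sdk_s3::Client;

pub async fn read_from_s3(client: &Client, bucket: &str, key: &str) -> Result<Vec<u8>> {
    let mut res = client
        .get_object()
        .bucket(bucket)
        .key(key)
        .send()
        .await
        .with_context(|| format!("Failed to fetch. Bucket: {} Key: {}", bucket, key))?;

    // Stream the body chunk by chunk; propagate read errors instead of panicking.
    let mut body_bytes = Vec::new();
    while let Some(chunk) = res.body.next().await {
        let chunk = chunk
            .with_context(|| format!("Failed to read body. Bucket: {} Key: {}", bucket, key))?;
        body_bytes.extend_from_slice(&chunk);
    }
    Ok(body_bytes)
}
```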

Driver code

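```rust
// `filenames: Vec<String>`, `client: aws_sdk_s3::Client`, and `bucket: String` are
// defined earlier (not shown in the original snippet).
let mut tasks = Vec::with_capacity(filenames.len());
for filename in filenames {
    // Client is internally reference-counted, so cloning it per task is cheap.
    let client = client.clone();
    let bucket = bucket.clone();
    tasks.push(tokio::spawn(async move {
        read_from_s3(&client, &bucket, &filename).await
    }));
}
// Each element is Result<anyhow::Result<Vec<u8>>, tokio::task::JoinError>.
let results = futures::future::join_all(tasks).await;
```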
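When each spawned task returns the `anyhow::Result<Vec<u8>>` from `read_from_s3`, `join_all` over the `JoinHandle`s yields nested results: the outer `Err` means the task itself failed (panicked or was cancelled), and the inner `Err` means the request failed, e.g. with the timeout above. A generic sketch for counting each outcome across the 2369 files; `summarize` is a hypothetical helper, not part of tokio or the SDK.

```rust
/// Tallies nested results from `join_all` over spawned tasks.
/// Returns (succeeded, request failed, task failed to complete).
pub fn summarize<T, E, J>(results: &[Result<Result<T, E>, J>]) -> (usize, usize, usize) {
    let mut ok = 0;
    let mut request_err = 0;
    let mut join_err = 0;
    for r in results {
        match r {
            Ok(Ok(_)) => ok += 1,         // file fetched and read
            Ok(Err(_)) => request_err += 1, // read_from_s3 returned an error
            Err(_) => join_err += 1,      // task panicked or was cancelled
        }
    }
    (ok, request_err, join_err)
}
```

Usage: `let (ok, failed, crashed) = summarize(&results);` gives a quick count of how many fetches hit errors.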

Possible Solution

No response

Additional Information/Context

No response

Version

aws-sdk-s3 v1.47.0
aws-config v1.5.5

Environment details (OS name and version, etc.)

AL2 5.10.214-202.855.amzn2.x86_64

Logs

No response

Labels: bug
