
[TransferManager] Potential OOM when uploading large directories due to unbounded task submission #3196

@QiuYucheng2003

Description


Upcoming End-of-Support

  • I acknowledge the upcoming end-of-support for AWS SDK for Java v1 was announced, and migration to AWS SDK for Java v2 is recommended.

Describe the bug

I encountered a potential OutOfMemoryError (OOM) risk in TransferManager when attempting to upload a directory containing a very large number of files (e.g., hundreds of thousands of small files).

Analyzing the source of com.amazonaws.services.s3.transfer.TransferManager, I found that the uploadDirectory method eventually calls uploadFileList. That method iterates through all files in the provided list and immediately submits an upload task for each file to the internal executorService, without any backpressure or client-side throttling.

Code Reference: In uploadFileList (approx. line 1386 in standard releases), the code iterates and submits tasks:
// TransferManager.java, inside uploadFileList
for (File f : files) {
    if (f.isFile()) {
        // ... metadata preparation ...

        // This submits the task to the executor immediately
        uploads.add((UploadImpl) doUpload(
                new PutObjectRequest(bucketName, virtualDirectoryKeyPrefix + key, f)
                // ... args
        ));
    }
}
Because the default ExecutorService (initialized via TransferManagerUtils) uses an unbounded queue (to avoid the deadlocks mentioned in the Javadoc), the main thread produces tasks much faster than the network can consume them. As a result, hundreds of thousands of PutObjectRequest and UploadImpl objects pile up on the heap, leading to OOM.
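For context, TransferManagerUtils appears to build the default executor via Executors.newFixedThreadPool, whose workers drain an unbounded LinkedBlockingQueue, so execute() never blocks the producer. A minimal, SDK-independent sketch of that effect:

import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class UnboundedQueueDemo {
    public static void main(String[] args) {
        // newFixedThreadPool backs its 10 workers with an unbounded LinkedBlockingQueue
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(10);
        for (int i = 0; i < 1_000_000; i++) {
            // execute() enqueues and returns immediately; the producer is never blocked
            pool.execute(() -> {
                try { Thread.sleep(1000); } catch (InterruptedException ignored) {}
            });
        }
        // Nearly all 1,000,000 tasks are now sitting on the heap waiting for a worker
        System.out.println("Queued tasks: " + pool.getQueue().size());
        pool.shutdownNow();
    }
}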

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

The TransferManager.uploadDirectory method should ideally implement a client-side throttling mechanism (e.g., using a Semaphore) to limit the number of active tasks submitted to the executor.

Alternatively, if a fix is not feasible due to the maintenance status, the documentation (Javadoc) should explicitly warn users that uploadDirectory is not safe for directories containing massive numbers of files and suggest manual batching.
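For reference, a manual-batching workaround could look roughly like the sketch below, which relies only on the public uploadFileList API (the bucket name, prefix, directory, and batch size are placeholders):

import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

import java.io.File;
import java.util.Arrays;
import java.util.List;

public class BatchedDirectoryUpload {
    // Arbitrary batch size; tune to the available heap
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws InterruptedException {
        TransferManager tm = TransferManagerBuilder.standard().build();
        File dir = new File("large_test_dir"); // placeholder directory
        List<File> files = Arrays.asList(dir.listFiles(File::isFile));
        try {
            for (int i = 0; i < files.size(); i += BATCH_SIZE) {
                List<File> batch = files.subList(i, Math.min(i + BATCH_SIZE, files.size()));
                // At most BATCH_SIZE PutObjectRequest/UploadImpl instances are queued at once
                MultipleFileUpload upload = tm.uploadFileList(
                        "my-bucket-name", "prefix", dir, batch); // placeholder bucket/prefix
                upload.waitForCompletion();
            }
        } finally {
            tm.shutdownNow();
        }
    }
}

Waiting for each batch before submitting the next bounds heap usage at roughly BATCH_SIZE queued tasks, at the cost of some parallelism between batches.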

Current Behavior

When uploadDirectory is invoked on a directory with a large number of files (e.g., > 100k files), the application's heap memory usage grows rapidly.

The application eventually crashes with: java.lang.OutOfMemoryError: Java heap space

The OOM stack trace typically points to an allocation inside the uploadFileList loop (e.g., the creation of PutObjectRequest objects), and a heap dump shows the ThreadPoolExecutor's internal queue growing without bound.

Reproduction Steps

import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

import java.io.File;
import java.io.IOException;

public class OOMReproduction {
    public static void main(String[] args) throws IOException {
        // 1. Prepare a directory with a large number of empty files
        File largeDir = new File("large_test_dir");
        largeDir.mkdir();
        for (int i = 0; i < 500000; i++) {
            new File(largeDir, "test_file_" + i).createNewFile();
        }

        // 2. Initialize TransferManager (standard configuration)
        TransferManager tm = TransferManagerBuilder.standard().build();

        try {
            // 3. Trigger the unbounded queuing
            // This will likely OOM before uploads significantly progress
            tm.uploadDirectory("my-bucket-name", "prefix", largeDir, true).waitForCompletion();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            tm.shutdownNow();
        }
    }
}

Possible Solution

Since using a bounded queue for the ExecutorService itself is discouraged (as per source comments warning about deadlocks with control tasks), the recommended fix is to throttle the submission loop in uploadFileList.

Proposed Logic: Use a Semaphore initialized with a reasonable limit (e.g., 10,000). Acquire a permit before calling doUpload inside the loop, and release the permit in a completion callback once the upload finishes, whether it succeeds, fails, or is canceled.
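A caller-side sketch of the same idea, using the public upload(PutObjectRequest) API plus a ProgressListener to release permits (the class name and the 10,000 limit are illustrative; the actual fix would live inside uploadFileList):

import com.amazonaws.event.ProgressEvent;
import com.amazonaws.event.ProgressListener;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.transfer.TransferManager;

import java.io.File;
import java.util.concurrent.Semaphore;

public class ThrottledUploader {
    // Caps the number of submitted-but-unfinished uploads (limit is illustrative)
    private final Semaphore permits = new Semaphore(10_000);
    private final TransferManager tm;

    public ThrottledUploader(TransferManager tm) {
        this.tm = tm;
    }

    public void upload(String bucket, String key, File f) throws InterruptedException {
        permits.acquire(); // blocks the submitting thread once the cap is reached
        PutObjectRequest req = new PutObjectRequest(bucket, key, f)
                .withGeneralProgressListener(new ProgressListener() {
                    @Override
                    public void progressChanged(ProgressEvent e) {
                        switch (e.getEventType()) {
                            case TRANSFER_COMPLETED_EVENT:
                            case TRANSFER_FAILED_EVENT:
                            case TRANSFER_CANCELED_EVENT:
                                permits.release(); // free a slot however the transfer ends
                                break;
                            default:
                                break;
                        }
                    }
                });
        try {
            tm.upload(req);
        } catch (RuntimeException ex) {
            permits.release(); // don't leak the permit if submission itself fails
            throw ex;
        }
    }
}

Acquiring before submission bounds heap usage to roughly the permit count times the per-task footprint, while still keeping the executor's worker threads saturated.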

If a code fix is unlikely, please update the Javadoc to warn users about this limitation.

Additional Information/Context

The Javadoc in TransferManager (around line 262) explicitly advises against bounded work queues:

"It is not recommended to use a single threaded executor or a thread pool with a bounded work queue as control tasks may submit subtasks that can't complete until all sub tasks complete."

This design choice confirms that the unbounded queue is intentional, which makes external throttling necessary for recursive operations like uploadDirectory.

AWS Java SDK version used

1.12.770

JDK version used

JDK 1.8

Operating System and version

macOS
