
fix(s3): respect prefix parameter and add pagination for getKeys() #760

Open

guoyangzhen wants to merge 3 commits into unjs:main from guoyangzhen:fix/s3-prefix-pagination

Conversation


@guoyangzhen guoyangzhen commented Mar 20, 2026

Problem

Two bugs in the S3 driver:

Bug 1: prefix parameter ignored

`listObjects()` accepts a `prefix` parameter but never uses it in the fetch URL. When `getKeys()` is called with a base key, it returns ALL keys in the bucket instead of only the keys matching the prefix.

The function even logs the intended URL, but the actual fetch always hits the bare bucket URL without any query parameters.

Bug 2: no pagination

S3 ListObjectsV2 API returns max 1000 objects per response. Buckets with >1000 objects only return the first 1000 keys, silently dropping the rest.

Fix

  1. Prefix: Build the ListObjectsV2 URL with `list-type=2` and `prefix` query parameters
  2. Pagination: Loop using `NextContinuationToken` from the XML response until `IsTruncated` is false
  3. Reworked the internal listing helper to return both keys and pagination metadata

This fix also benefits `clear()`, which internally relies on the same listing helper.
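The steps above can be sketched roughly as follows. This is an illustrative TypeScript reconstruction, not the driver's actual code: `ListPage`, `listAllKeys`, and `fetchPage` are hypothetical stand-ins for the reworked helper and the signed S3 request.

```typescript
interface ListPage {
  keys: string[];
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Accumulate keys across ListObjectsV2 pages, honoring the prefix.
async function listAllKeys(
  fetchPage: (url: string) => Promise<ListPage>,
  baseURL: string,
  prefix?: string
): Promise<string[]> {
  const keys: string[] = [];
  let continuationToken: string | undefined;
  do {
    const params = new URLSearchParams({ "list-type": "2" });
    if (prefix) params.set("prefix", prefix);
    if (continuationToken) params.set("continuation-token", continuationToken);
    const page = await fetchPage(`${baseURL}?${params}`);
    keys.push(...page.keys);
    // Continue only while S3 reports more pages.
    continuationToken = page.isTruncated ? page.nextContinuationToken : undefined;
  } while (continuationToken);
  return keys;
}
```

The loop terminates once `IsTruncated` is false, so buckets of any size are listed completely.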

Summary by CodeRabbit

  • Bug Fixes

    • S3 object listing now fully supports pagination and correctly retrieves all objects across multiple pages.
    • XML responses and pagination tokens are handled more robustly to avoid truncation and decoding errors.
  • New Features

    • Bulk delete operations are more reliable and efficient: large deletes are chunked and individual deletes use bounded concurrency to reduce failures and resource spikes.

…njs#728)

Two bugs fixed:
1. listObjects() accepted a prefix parameter but never used it in the S3
   ListObjectsV2 API call. Now appends ?list-type=2&prefix=... to URL.
2. S3 returns max 1000 objects per response. Added pagination loop
   using NextContinuationToken to fetch all matching keys.
@guoyangzhen guoyangzhen requested a review from pi0 as a code owner March 20, 2026 06:49

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Updated S3 driver to use ListObjectsV2 pagination (list-type=2) with prefix and continuation-token handling, robust XML decoding/response parsing, and iterative aggregation of keys; also changed delete logic to batch deletes (1000/key bulk) and limit concurrent single deletes to 10.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| S3 driver (listing & deletion)<br>`src/drivers/s3.ts` | Replaced single-request listing with ListObjectsV2 pagination (`list-type=2`) and looped accumulation of `<Key>` values. Added `decodeXmlText` and `parseListResponse` to extract keys, `IsTruncated`, and `NextContinuationToken` (with validation). Changed delete behavior: bulk deletes split into 1000-key batches per `?delete` POST; non-bulk deletes run with bounded concurrency (10 concurrent requests). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant S3Driver
  participant S3 as "AWS S3"
  participant XML as "XML Parser"

  Client->>S3Driver: listObjects(prefix?)
  S3Driver->>S3: GET /?list-type=2&prefix=&continuation-token? (page 1)
  S3->>S3Driver: XML response (Keys, IsTruncated, NextContinuationToken)
  S3Driver->>XML: parseListResponse(response)
  XML-->>S3Driver: keys[], isTruncated, nextContinuationToken
  alt isTruncated == true
    loop fetch next pages
      S3Driver->>S3: GET /?list-type=2&continuation-token=nextContinuationToken (next page)
      S3->>S3Driver: XML response (more Keys, IsTruncated, NextContinuationToken)
      S3Driver->>XML: parseListResponse(...)
      XML-->>S3Driver: more keys..., nextContinuationToken...
    end
  end
  S3Driver-->>Client: aggregated keys[] or null

  Note over S3Driver,S3: Delete flow (batched/concurrent)
  Client->>S3Driver: delete(keys...)
  alt bulk delete (>=2 keys)
    S3Driver->>S3: POST /?delete (chunks of up to 1000 keys)
    S3->>S3Driver: delete response(s)
  else single deletes
    S3Driver->>S3: DELETE /key (up to 10 concurrent requests)
    S3->>S3Driver: delete response(s)
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately describes the main changes: fixing the prefix parameter handling and adding pagination to getKeys(), the primary bugs addressed in the changeset. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drivers/s3.ts`:
- Around line 152-155: In the S3 pagination loop (the code that updates
continuationToken using result.isTruncated and result.nextContinuationToken,
e.g., the list/pagination function in src/drivers/s3.ts), add a fail-fast check:
if result.isTruncated is true but result.nextContinuationToken is
null/undefined, throw an error (or reject) with a clear message (include
bucket/prefix context if available) instead of silently breaking the do/while
loop; keep the existing assignment of continuationToken for the normal case
(result.isTruncated ? result.nextContinuationToken : undefined) but perform the
validation immediately after receiving result and before the loop conditional is
evaluated.
- Around line 282-297: The parsed NextContinuationToken and Key values from
listBucketResult are XML-escaped; add a helper (e.g., decodeXmlText) that
replaces &lt;, &gt;, &quot;, &apos;, &amp; with their characters and apply it to
the extracted nextContinuationToken (variable nextContinuationToken) and each
key when building keys (variable keys produced from Contents matches) so the
returned keys and continuation token are decoded before being returned.
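A minimal entity decoder along the lines the review suggests might look like this; it is a sketch, and the actual helper added to the driver may differ.

```typescript
// Decode the five predefined XML entities. `&amp;` must be decoded last,
// so that a literal "&amp;lt;" in the source becomes "&lt;" and not "<".
function decodeXmlText(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}
```

Applying this to each extracted key and to the continuation token ensures that keys containing `&`, `<`, or quotes round-trip correctly through the XML response.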

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 006d0301-576a-4673-9343-d26754c41391

📥 Commits

Reviewing files that changed from the base of the PR and between fb7b8aa and c2ea3da.

📒 Files selected for processing (1)
  • src/drivers/s3.ts

…esponse values

- Throw error when IsTruncated=true but NextContinuationToken is missing
- Decode XML entities (&lt;, &gt;, &quot;, &apos;, &amp;) in keys and continuation tokens

Addresses CodeRabbit review feedback.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drivers/s3.ts`:
- Around line 129-157: The listing code (listObjects) can return >1000 keys but
clear()/deleteObjects() currently sends all keys in one DeleteObjects call or
fires unbounded Promise.all; update delete logic to chunk the keys into
<=1000-item batches when calling the S3 DeleteObjects API (keep the existing
bulk DeleteObjects path but loop over batches) and, for the per-object fallback
path, replace the unbounded Promise.all with bounded concurrency (e.g., a simple
worker pool or p-limit) to limit parallel deletes; change the implementation
inside clear() / deleteObjects() (where DeleteObjects and individual deletes are
performed) to iterate over batches and await each batch (or run each batch with
controlled concurrency) so no single request exceeds 1000 keys and provider load
is bounded.
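The batching pattern described here can be sketched as below; `MAX_BULK_DELETE`, `deleteBatch`, and `deleteOne` are illustrative stand-ins for the driver's internals, not its exact API.

```typescript
const MAX_BULK_DELETE = 1000; // S3 DeleteObjects hard limit per request

// Split keys into batches no larger than `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Bulk path: one DeleteObjects request per batch of at most 1000 keys.
async function bulkDelete(
  keys: string[],
  deleteBatch: (batch: string[]) => Promise<void>
): Promise<void> {
  for (const batch of chunk(keys, MAX_BULK_DELETE)) {
    await deleteBatch(batch);
  }
}

// Fallback path: single deletes, at most `concurrency` in flight per wave.
async function deleteEach(
  keys: string[],
  deleteOne: (key: string) => Promise<void>,
  concurrency = 10
): Promise<void> {
  for (const wave of chunk(keys, concurrency)) {
    await Promise.all(wave.map((key) => deleteOne(key)));
  }
}
```

Sequential waves of 10 are a simple bound; a worker pool or p-limit would keep 10 requests in flight continuously, at the cost of a little more code.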

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89810ecc-7344-4751-a045-1bf4e17dad20

📥 Commits

Reviewing files that changed from the base of the PR and between c2ea3da and e29b17f.

📒 Files selected for processing (1)
  • src/drivers/s3.ts

@guoyangzhen (Author)

All 3 CodeRabbit suggestions addressed:

  1. Fail-fast on truncated missing token — Added validation in parseListResponse (commit e29b17f)
  2. XML entity decoding — Added decodeXmlText() for keys and continuation tokens (commit e29b17f)
  3. Batch DeleteObjects for 1000-key limit — Chunked bulk deletes into 1000-key batches, bounded per-object fallback to 10 concurrent deletes (commit db8e897)

The pagination change means listObjects can now return arbitrarily large key sets. S3's DeleteObjects API has a hard limit of 1000 keys per request, so bulk deletes must be chunked. For the bulkDelete: false fallback path, unbounded Promise.all is replaced with sequential batches of 10 to avoid overwhelming the provider.
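The fail-fast check from item 1 can be sketched as follows; the `ParsedList` shape and the function name are illustrative, since `parseListResponse`'s real signature may differ.

```typescript
interface ParsedList {
  keys: string[];
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Reject a truncated listing that carries no continuation token, instead of
// silently stopping the pagination loop and dropping the remaining keys.
function validateListPage(page: ParsedList): ParsedList {
  if (page.isTruncated && !page.nextContinuationToken) {
    throw new Error(
      "S3 ListObjectsV2 response has IsTruncated=true but no NextContinuationToken"
    );
  }
  return page;
}
```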

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drivers/s3.ts`:
- Around line 209-219: The current bulk-delete loop in clear() sends delete
requests with deleteKeysReq(chunk) and assumes a 200 means success; instead
capture the response from awsFetch(`${baseURL}?delete`, ...) into a variable,
read its text, parse the XML using the existing
parseListResponse()/decodeXmlText() pattern, inspect the parsed document for any
<Error> entries for individual keys, and if any errors exist throw (or return) a
clear, descriptive error containing the key(s) and their error messages; update
the loop to only continue when no per-key errors are found and include
references to deleteKeysReq, awsFetch, parseListResponse, decodeXmlText, and
MAX_BULK_DELETE so you modify the correct block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e254d9d4-6615-4b36-a560-f761a5b58cf7

📥 Commits

Reviewing files that changed from the base of the PR and between e29b17f and db8e897.

📒 Files selected for processing (1)
  • src/drivers/s3.ts

Comment on lines +209 to +219

```typescript
// Chunk into batches of MAX_BULK_DELETE for S3 API limit
for (let i = 0; i < keys.length; i += MAX_BULK_DELETE) {
  const chunk = keys.slice(i, i + MAX_BULK_DELETE);
  const body = deleteKeysReq(chunk);
  await awsFetch(`${baseURL}?delete`, {
    method: "POST",
    headers: {
      "x-amz-checksum-sha256": await sha256Base64(body),
    },
    body,
  });
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Repository: unjs/unstorage

🏁 Scripts executed:

```shell
# Examine the context around the code in question
wc -l src/drivers/s3.ts

# Read the relevant section and surrounding context
cat -n src/drivers/s3.ts | head -250 | tail -100

# Search for the deleteKeysReq function to understand the request format
rg -A 10 "deleteKeysReq" src/drivers/s3.ts

# Search for awsFetch definition and how it's used
rg -B 5 -A 5 "awsFetch" src/drivers/s3.ts | head -50

# Look for any existing response handling patterns in the file
rg "\.text\(\)|\.json\(\)|parseXML|deleteErrors" src/drivers/s3.ts

# Search for XML parsing functions and response handling patterns
rg "parseListResponse|parseDeleteResponse|parseXML|<Error>|<Deleted>" src/drivers/s3.ts

# Search for decodeXmlText function mentioned in the fix
rg "decodeXmlText|decodeXml" src/drivers/s3.ts

# Get more context on parseListResponse to understand the pattern
rg -B 5 -A 15 "parseListResponse" src/drivers/s3.ts

# Check if there's any error handling import or utility
rg "createError" src/drivers/s3.ts | head -5

# Get the full parseListResponse implementation to understand the parsing pattern
rg -A 40 "^function parseListResponse" src/drivers/s3.ts

# Look for any existing DeleteObjects response handling
rg -B 5 -A 15 "DeleteObjects|deleteObjects" src/drivers/s3.ts | head -80

# Check the full awsFetch implementation to understand error handling
rg -B 5 -A 20 "const awsFetch = " src/drivers/s3.ts

# Check if there's any response parsing in the bulk delete elsewhere
rg -C 10 "bulkDelete" src/drivers/s3.ts

# Look for any tests that might show expected behavior for bulk delete errors
find . -name "*.test.*" -o -name "*.spec.*" | head -10

# Verify the actual AWS DeleteObjects response format in comments or docs
rg "DeleteResult|<Error>" src/

# Find S3 test files
find . -name "*s3*test*" -o -name "*s3*spec*"

# Look for common test patterns for S3 driver
fd "s3" test/

# Check if there's a test for S3 driver
find test -name "*.ts" | xargs grep -l "s3\|S3Driver" 2>/dev/null | head -5

# Examine the S3 test file for bulk delete test coverage
cat -n test/drivers/s3.test.ts | head -150

# Search for any bulk delete or DeleteObjects tests
rg "clear\|bulk|DeleteObjects" test/drivers/s3.test.ts
```

🌐 Web query: AWS S3 DeleteObjects API response format error handling per key

💡 Result:

The AWS S3 DeleteObjects API returns an HTTP 200 response with XML in the body containing per-key results:

- Deleted entries: for each successfully deleted object, the Key, VersionId (if versioned), DeleteMarker (boolean), and DeleteMarkerVersionId (if applicable).
- Error entries: for each failed deletion, the Key, VersionId (if provided), Code (an error code such as "NoSuchKey" or "AccessDenied"), and Message (an error description).

Two response modes exist: verbose (the default) returns both Deleted and Error entries; quiet (Quiet=true in the request) returns only Error entries. Individual key failures do not fail the entire request, so always inspect the Error entries for specific failures; SDKs may not throw exceptions for partial failures (e.g. the Java SDK returns a DeleteObjectsResponse with a getErrors list, and Boto3 returns a dict with an 'Errors' key).

Parse S3 DeleteObjects responses to detect per-key failures.

The chunking is good, but this code treats any 200 OK as success. S3's DeleteObjects API can return per-key <Error> entries in a successful response, so clear() can silently leave objects behind. Capture and parse each response to check for errors before continuing.

The codebase already has the building blocks: parseListResponse() and decodeXmlText() show the pattern for parsing S3 XML responses. Apply the same approach here—capture the response, check for <Error> entries, and throw if any are found.

Example fix:

```diff
-        await awsFetch(`${baseURL}?delete`, {
+        const res = await awsFetch(`${baseURL}?delete`, {
           method: "POST",
           headers: {
             "x-amz-checksum-sha256": await sha256Base64(body),
           },
           body,
         });
+        const xml = await res!.text();
+        const deleteErrors = [...xml.matchAll(
+          /<Error>[\s\S]*?<Key>([\s\S]*?)<\/Key>[\s\S]*?<Code>([\s\S]*?)<\/Code>/g,
+        )];
+        if (deleteErrors.length > 0) {
+          const [_, key, code] = deleteErrors[0]!;
+          throw createError(
+            DRIVER_NAME,
+            `Failed to delete "${decodeXmlText(key)}" (${code})`,
+          );
+        }
```
