
fix(s3): respect prefix parameter and add pagination for getKeys() #760

Open

guoyangzhen wants to merge 3 commits into unjs:main from guoyangzhen:fix/s3-prefix-pagination

Conversation


@guoyangzhen guoyangzhen commented Mar 20, 2026

Problem

Two bugs in the S3 driver:

Bug 1: prefix parameter ignored

`listObjects()` accepts a `prefix` parameter but never uses it in the fetch URL. When `getKeys()` is called with a base key, it returns ALL keys in the bucket instead of only the keys matching the prefix.

The function even logs the intended URL, but the actual fetch always hits the bare bucket URL without any query parameters.

Bug 2: no pagination

S3 ListObjectsV2 API returns max 1000 objects per response. Buckets with >1000 objects only return the first 1000 keys, silently dropping the rest.

Fix

  1. Prefix: Build the ListObjectsV2 URL with `list-type=2` and `prefix` query parameters
  2. Pagination: Loop using `NextContinuationToken` from the XML response until `IsTruncated` is false
  3. Reworked the internal listing helper to return both keys and pagination metadata

This fix also benefits `clear()`, which internally relies on the same listing helper.
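The steps above can be sketched roughly as follows. This is an illustrative TypeScript reconstruction, not the driver's actual code: `ListPage`, `listAllKeys`, and `fetchPage` are hypothetical stand-ins for the reworked helper and the signed S3 request.

```typescript
interface ListPage {
  keys: string[];
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Accumulate keys across ListObjectsV2 pages, honoring the prefix.
async function listAllKeys(
  fetchPage: (url: string) => Promise<ListPage>,
  baseURL: string,
  prefix?: string
): Promise<string[]> {
  const keys: string[] = [];
  let continuationToken: string | undefined;
  do {
    const params = new URLSearchParams({ "list-type": "2" });
    if (prefix) params.set("prefix", prefix);
    if (continuationToken) params.set("continuation-token", continuationToken);
    const page = await fetchPage(`${baseURL}?${params}`);
    keys.push(...page.keys);
    // Continue only while S3 reports more pages.
    continuationToken = page.isTruncated ? page.nextContinuationToken : undefined;
  } while (continuationToken);
  return keys;
}
```

The loop terminates once `IsTruncated` is false, so buckets of any size are listed completely.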

Summary by CodeRabbit

  • Bug Fixes

    • S3 object listing now fully supports pagination and correctly retrieves all objects across multiple pages.
    • XML responses and pagination tokens are handled more robustly to avoid truncation and decoding errors.
  • New Features

    • Bulk delete operations are more reliable and efficient: large deletes are chunked and individual deletes use bounded concurrency to reduce failures and resource spikes.

…njs#728)

Two bugs fixed:
1. listObjects() accepted a prefix parameter but never used it in the S3
   ListObjectsV2 API call. Now appends ?list-type=2&prefix=... to URL.
2. S3 returns max 1000 objects per response. Added pagination loop
   using NextContinuationToken to fetch all matching keys.
@guoyangzhen guoyangzhen requested a review from pi0 as a code owner March 20, 2026 06:49

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Updated S3 driver to use ListObjectsV2 pagination (list-type=2) with prefix and continuation-token handling, robust XML decoding/response parsing, and iterative aggregation of keys; also changed delete logic to batch deletes (1000/key bulk) and limit concurrent single deletes to 10.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| S3 driver (listing & deletion)<br>`src/drivers/s3.ts` | Replaced single-request listing with ListObjectsV2 pagination (`list-type=2`) and looped accumulation of `<Key>` values. Added `decodeXmlText` and `parseListResponse` to extract keys, `IsTruncated`, and `NextContinuationToken` (with validation). Changed delete behavior: bulk deletes split into 1000-key batches per `?delete` POST; non-bulk deletes run with bounded concurrency (10 concurrent requests). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant S3Driver
  participant S3 as "AWS S3"
  participant XML as "XML Parser"

  Client->>S3Driver: listObjects(prefix?)
  S3Driver->>S3: GET /?list-type=2&prefix=&continuation-token? (page 1)
  S3->>S3Driver: XML response (Keys, IsTruncated, NextContinuationToken)
  S3Driver->>XML: parseListResponse(response)
  XML-->>S3Driver: keys[], isTruncated, nextContinuationToken
  alt isTruncated == true
    loop fetch next pages
      S3Driver->>S3: GET /?list-type=2&continuation-token=nextContinuationToken (next page)
      S3->>S3Driver: XML response (more Keys, IsTruncated, NextContinuationToken)
      S3Driver->>XML: parseListResponse(...)
      XML-->>S3Driver: more keys..., nextContinuationToken...
    end
  end
  S3Driver-->>Client: aggregated keys[] or null

  Note over S3Driver,S3: Delete flow (batched/concurrent)
  Client->>S3Driver: delete(keys...)
  alt bulk delete (>=2 keys)
    S3Driver->>S3: POST /?delete (chunks of up to 1000 keys)
    S3->>S3Driver: delete response(s)
  else single deletes
    S3Driver->>S3: DELETE /key (up to 10 concurrent requests)
    S3->>S3Driver: delete response(s)
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately describes the main changes: fixing the prefix parameter handling and adding pagination to getKeys(), the primary bugs addressed in the changeset. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drivers/s3.ts`:
- Around line 152-155: In the S3 pagination loop (the code that updates
continuationToken using result.isTruncated and result.nextContinuationToken,
e.g., the list/pagination function in src/drivers/s3.ts), add a fail-fast check:
if result.isTruncated is true but result.nextContinuationToken is
null/undefined, throw an error (or reject) with a clear message (include
bucket/prefix context if available) instead of silently breaking the do/while
loop; keep the existing assignment of continuationToken for the normal case
(result.isTruncated ? result.nextContinuationToken : undefined) but perform the
validation immediately after receiving result and before the loop conditional is
evaluated.
- Around line 282-297: The parsed NextContinuationToken and Key values from
listBucketResult are XML-escaped; add a helper (e.g., decodeXmlText) that
replaces &lt;, &gt;, &quot;, &apos;, &amp; with their characters and apply it to
the extracted nextContinuationToken (variable nextContinuationToken) and each
key when building keys (variable keys produced from Contents matches) so the
returned keys and continuation token are decoded before being returned.
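A minimal entity decoder along the lines the review suggests might look like this; it is a sketch, and the actual helper added to the driver may differ.

```typescript
// Decode the five predefined XML entities. `&amp;` must be decoded last,
// so that a literal "&amp;lt;" in the source becomes "&lt;" and not "<".
function decodeXmlText(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}
```

Applying this to each extracted key and to the continuation token ensures that keys containing `&`, `<`, or quotes round-trip correctly through the XML response.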

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 006d0301-576a-4673-9343-d26754c41391

📥 Commits

Reviewing files that changed from the base of the PR and between fb7b8aa and c2ea3da.

📒 Files selected for processing (1)
  • src/drivers/s3.ts

…esponse values

- Throw error when IsTruncated=true but NextContinuationToken is missing
- Decode XML entities (&lt;, &gt;, &quot;, &apos;, &amp;) in keys and continuation tokens

Addresses CodeRabbit review feedback.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drivers/s3.ts`:
- Around line 129-157: The listing code (listObjects) can return >1000 keys but
clear()/deleteObjects() currently sends all keys in one DeleteObjects call or
fires unbounded Promise.all; update delete logic to chunk the keys into
<=1000-item batches when calling the S3 DeleteObjects API (keep the existing
bulk DeleteObjects path but loop over batches) and, for the per-object fallback
path, replace the unbounded Promise.all with bounded concurrency (e.g., a simple
worker pool or p-limit) to limit parallel deletes; change the implementation
inside clear() / deleteObjects() (where DeleteObjects and individual deletes are
performed) to iterate over batches and await each batch (or run each batch with
controlled concurrency) so no single request exceeds 1000 keys and provider load
is bounded.
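The batching pattern described here can be sketched as below; `MAX_BULK_DELETE`, `deleteBatch`, and `deleteOne` are illustrative stand-ins for the driver's internals, not its exact API.

```typescript
const MAX_BULK_DELETE = 1000; // S3 DeleteObjects hard limit per request

// Split keys into batches no larger than `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Bulk path: one DeleteObjects request per batch of at most 1000 keys.
async function bulkDelete(
  keys: string[],
  deleteBatch: (batch: string[]) => Promise<void>
): Promise<void> {
  for (const batch of chunk(keys, MAX_BULK_DELETE)) {
    await deleteBatch(batch);
  }
}

// Fallback path: single deletes, at most `concurrency` in flight per wave.
async function deleteEach(
  keys: string[],
  deleteOne: (key: string) => Promise<void>,
  concurrency = 10
): Promise<void> {
  for (const wave of chunk(keys, concurrency)) {
    await Promise.all(wave.map((key) => deleteOne(key)));
  }
}
```

Sequential waves of 10 are a simple bound; a worker pool or p-limit would keep 10 requests in flight continuously, at the cost of a little more code.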

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89810ecc-7344-4751-a045-1bf4e17dad20

📥 Commits

Reviewing files that changed from the base of the PR and between c2ea3da and e29b17f.

📒 Files selected for processing (1)
  • src/drivers/s3.ts

@guoyangzhen (Author)

All 3 CodeRabbit suggestions addressed:

  1. Fail-fast on truncated missing token — Added validation in parseListResponse (commit e29b17f)
  2. XML entity decoding — Added decodeXmlText() for keys and continuation tokens (commit e29b17f)
  3. Batch DeleteObjects for 1000-key limit — Chunked bulk deletes into 1000-key batches, bounded per-object fallback to 10 concurrent deletes (commit db8e897)

The pagination change means listObjects can now return arbitrarily large key sets. S3's DeleteObjects API has a hard limit of 1000 keys per request, so bulk deletes must be chunked. For the bulkDelete: false fallback path, unbounded Promise.all is replaced with sequential batches of 10 to avoid overwhelming the provider.
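The fail-fast check from item 1 can be sketched as follows; the `ParsedList` shape and the function name are illustrative, since `parseListResponse`'s real signature may differ.

```typescript
interface ParsedList {
  keys: string[];
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Reject a truncated listing that carries no continuation token, instead of
// silently stopping the pagination loop and dropping the remaining keys.
function validateListPage(page: ParsedList): ParsedList {
  if (page.isTruncated && !page.nextContinuationToken) {
    throw new Error(
      "S3 ListObjectsV2 response has IsTruncated=true but no NextContinuationToken"
    );
  }
  return page;
}
```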

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drivers/s3.ts`:
- Around line 209-219: The current bulk-delete loop in clear() sends delete
requests with deleteKeysReq(chunk) and assumes a 200 means success; instead
capture the response from awsFetch(`${baseURL}?delete`, ...) into a variable,
read its text, parse the XML using the existing
parseListResponse()/decodeXmlText() pattern, inspect the parsed document for any
<Error> entries for individual keys, and if any errors exist throw (or return) a
clear, descriptive error containing the key(s) and their error messages; update
the loop to only continue when no per-key errors are found and include
references to deleteKeysReq, awsFetch, parseListResponse, decodeXmlText, and
MAX_BULK_DELETE so you modify the correct block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e254d9d4-6615-4b36-a560-f761a5b58cf7

📥 Commits

Reviewing files that changed from the base of the PR and between e29b17f and db8e897.

📒 Files selected for processing (1)
  • src/drivers/s3.ts

Comment on lines +209 to +219

```typescript
// Chunk into batches of MAX_BULK_DELETE for S3 API limit
for (let i = 0; i < keys.length; i += MAX_BULK_DELETE) {
  const chunk = keys.slice(i, i + MAX_BULK_DELETE);
  const body = deleteKeysReq(chunk);
  await awsFetch(`${baseURL}?delete`, {
    method: "POST",
    headers: {
      "x-amz-checksum-sha256": await sha256Base64(body),
    },
    body,
  });
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Repository: unjs/unstorage

🏁 Scripts executed:

```shell
# Examine the context around the code in question
wc -l src/drivers/s3.ts

# Read the relevant section and surrounding context
cat -n src/drivers/s3.ts | head -250 | tail -100

# Search for the deleteKeysReq function to understand the request format
rg -A 10 "deleteKeysReq" src/drivers/s3.ts

# Search for awsFetch definition and how it's used
rg -B 5 -A 5 "awsFetch" src/drivers/s3.ts | head -50

# Look for any existing response handling patterns in the file
rg "\.text\(\)|\.json\(\)|parseXML|deleteErrors" src/drivers/s3.ts

# Search for XML parsing functions and response handling patterns
rg "parseListResponse|parseDeleteResponse|parseXML|<Error>|<Deleted>" src/drivers/s3.ts

# Search for decodeXmlText function mentioned in the fix
rg "decodeXmlText|decodeXml" src/drivers/s3.ts

# Get more context on parseListResponse to understand the pattern
rg -B 5 -A 15 "parseListResponse" src/drivers/s3.ts

# Check if there's any error handling import or utility
rg "createError" src/drivers/s3.ts | head -5

# Get the full parseListResponse implementation to understand the parsing pattern
rg -A 40 "^function parseListResponse" src/drivers/s3.ts

# Look for any existing DeleteObjects response handling
rg -B 5 -A 15 "DeleteObjects|deleteObjects" src/drivers/s3.ts | head -80

# Check the full awsFetch implementation to understand error handling
rg -B 5 -A 20 "const awsFetch = " src/drivers/s3.ts

# Check if there's any response parsing in the bulk delete elsewhere
rg -C 10 "bulkDelete" src/drivers/s3.ts

# Look for any tests that might show expected behavior for bulk delete errors
find . -name "*.test.*" -o -name "*.spec.*" | head -10

# Verify the actual AWS DeleteObjects response format in comments or docs
rg "DeleteResult|<Error>" src/

# Find S3 test files
find . -name "*s3*test*" -o -name "*s3*spec*"

# Look for common test patterns for S3 driver
fd "s3" test/

# Check if there's a test for S3 driver
find test -name "*.ts" | xargs grep -l "s3\|S3Driver" 2>/dev/null | head -5

# Examine the S3 test file for bulk delete test coverage
cat -n test/drivers/s3.test.ts | head -150

# Search for any bulk delete or DeleteObjects tests
rg "clear\|bulk|DeleteObjects" test/drivers/s3.test.ts
```

🌐 Web query: AWS S3 DeleteObjects API response format error handling per key

💡 Result:

The AWS S3 DeleteObjects API returns an HTTP 200 response with XML in the body containing per-key results:

- Deleted entries: for each successfully deleted object, the Key, VersionId (if versioned), DeleteMarker (boolean), and DeleteMarkerVersionId (if applicable).
- Error entries: for each failed deletion, the Key, VersionId (if provided), Code (an error code such as "NoSuchKey" or "AccessDenied"), and Message (an error description).

Two response modes exist: verbose (the default) returns both Deleted and Error entries; quiet (Quiet=true in the request) returns only Error entries. Individual key failures do not fail the entire request, so always inspect the Error entries for specific failures; SDKs may not throw exceptions for partial failures (e.g. the Java SDK returns a DeleteObjectsResponse with a getErrors list, and Boto3 returns a dict with an 'Errors' key).

Parse S3 DeleteObjects responses to detect per-key failures.

The chunking is good, but this code treats any 200 OK as success. S3's DeleteObjects API can return per-key <Error> entries in a successful response, so clear() can silently leave objects behind. Capture and parse each response to check for errors before continuing.

The codebase already has the building blocks: parseListResponse() and decodeXmlText() show the pattern for parsing S3 XML responses. Apply the same approach here—capture the response, check for <Error> entries, and throw if any are found.

Example fix:

```diff
-        await awsFetch(`${baseURL}?delete`, {
+        const res = await awsFetch(`${baseURL}?delete`, {
           method: "POST",
           headers: {
             "x-amz-checksum-sha256": await sha256Base64(body),
           },
           body,
         });
+        const xml = await res!.text();
+        const deleteErrors = [...xml.matchAll(
+          /<Error>[\s\S]*?<Key>([\s\S]*?)<\/Key>[\s\S]*?<Code>([\s\S]*?)<\/Code>/g,
+        )];
+        if (deleteErrors.length > 0) {
+          const [_, key, code] = deleteErrors[0]!;
+          throw createError(
+            DRIVER_NAME,
+            `Failed to delete "${decodeXmlText(key)}" (${code})`,
+          );
+        }
```
