fix(s3): respect prefix parameter and add pagination for getKeys() #760
guoyangzhen wants to merge 3 commits into unjs:main
Conversation
…njs#728) Two bugs fixed:
1. listObjects() accepted a prefix parameter but never used it in the S3 ListObjectsV2 API call. It now appends `?list-type=2&prefix=...` to the URL.
2. S3 returns at most 1000 objects per response. Added a pagination loop using NextContinuationToken to fetch all matching keys.
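The pagination loop described above can be sketched roughly like this. This is a minimal illustration, not the driver's actual code: `ListPage`, `listPage`, and `listAllKeys` are hypothetical names, and `listPage` stands in for the real signed ListObjectsV2 request.

```typescript
// Shape of one parsed ListObjectsV2 page (illustrative, not the driver's types).
interface ListPage {
  keys: string[];
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Fetch pages until S3 reports the listing is no longer truncated,
// threading NextContinuationToken between requests.
async function listAllKeys(
  listPage: (token?: string) => Promise<ListPage>
): Promise<string[]> {
  const keys: string[] = [];
  let token: string | undefined;
  do {
    const page = await listPage(token);
    keys.push(...page.keys);
    token = page.isTruncated ? page.nextContinuationToken : undefined;
  } while (token);
  return keys;
}
```

With a bucket of more than 1000 objects, the loop simply issues as many requests as S3 requires and concatenates the results.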
📝 Walkthrough
Updated the S3 driver to use ListObjectsV2 pagination (list-type=2) with prefix and continuation-token handling, robust XML decoding/response parsing, and iterative aggregation of keys; also changed delete logic to batch bulk deletes (up to 1000 keys per request) and limit concurrent single deletes to 10.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant S3Driver
    participant S3 as "AWS S3"
    participant XML as "XML Parser"
    Client->>S3Driver: listObjects(prefix?)
    S3Driver->>S3: GET /?list-type=2&prefix=&continuation-token? (page 1)
    S3->>S3Driver: XML response (Keys, IsTruncated, NextContinuationToken)
    S3Driver->>XML: parseListResponse(response)
    XML-->>S3Driver: keys[], isTruncated, nextContinuationToken
    alt isTruncated == true
        loop fetch next pages
            S3Driver->>S3: GET /?list-type=2&continuation-token=nextContinuationToken (next page)
            S3->>S3Driver: XML response (more Keys, IsTruncated, NextContinuationToken)
            S3Driver->>XML: parseListResponse(...)
            XML-->>S3Driver: more keys..., nextContinuationToken...
        end
    end
    S3Driver-->>Client: aggregated keys[] or null
    Note over S3Driver,S3: Delete flow (batched/concurrent)
    Client->>S3Driver: delete(keys...)
    alt bulk delete (>=2 keys)
        S3Driver->>S3: POST /?delete (chunks of up to 1000 keys)
        S3->>S3Driver: delete response(s)
    else single deletes
        S3Driver->>S3: DELETE /key (up to 10 concurrent requests)
        S3->>S3Driver: delete response(s)
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/drivers/s3.ts`:
- Around line 152-155: In the S3 pagination loop (the code that updates
continuationToken using result.isTruncated and result.nextContinuationToken,
e.g., the list/pagination function in src/drivers/s3.ts), add a fail-fast check:
if result.isTruncated is true but result.nextContinuationToken is
null/undefined, throw an error (or reject) with a clear message (include
bucket/prefix context if available) instead of silently breaking the do/while
loop; keep the existing assignment of continuationToken for the normal case
(result.isTruncated ? result.nextContinuationToken : undefined) but perform the
validation immediately after receiving result and before the loop conditional is
evaluated.
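A minimal sketch of that fail-fast check, under assumed names: `ParsedPage` mirrors a parsed ListObjectsV2 response, and `nextToken` and its error message are illustrative, not the driver's actual code.

```typescript
// Illustrative shape of a parsed list page.
interface ParsedPage {
  isTruncated: boolean;
  nextContinuationToken?: string;
}

// Validate the page before the loop conditional is evaluated: a truncated
// response without a continuation token means pagination cannot continue,
// so fail loudly instead of silently returning a partial listing.
function nextToken(
  page: ParsedPage,
  bucket: string,
  prefix: string
): string | undefined {
  if (page.isTruncated && !page.nextContinuationToken) {
    throw new Error(
      `S3 list response for bucket "${bucket}" (prefix "${prefix}") is truncated but missing NextContinuationToken`
    );
  }
  return page.isTruncated ? page.nextContinuationToken : undefined;
}
```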
- Around line 282-297: The parsed NextContinuationToken and Key values from
listBucketResult are XML-escaped; add a helper (e.g., decodeXmlText) that
replaces `&lt;`, `&gt;`, `&quot;`, `&apos;`, `&amp;` with their literal characters and apply it to
the extracted nextContinuationToken (variable nextContinuationToken) and each
key when building keys (variable keys produced from Contents matches) so the
returned keys and continuation token are decoded before being returned.
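One possible shape for that helper (a sketch; the driver may implement it differently). It handles only the five predefined XML entities, which is all S3's escaping of keys and tokens requires.

```typescript
// Decode the five predefined XML entities in S3-escaped text.
function decodeXmlText(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // &amp; last, so "&amp;lt;" decodes to "&lt;", not "<"
}
```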
…esponse values
- Throw error when IsTruncated=true but NextContinuationToken is missing
- Decode XML entities (`&lt;`, `&gt;`, `&quot;`, `&apos;`, `&amp;`) in keys and continuation tokens

Addresses CodeRabbit review feedback.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/drivers/s3.ts`:
- Around line 129-157: The listing code (listObjects) can return >1000 keys but
clear()/deleteObjects() currently sends all keys in one DeleteObjects call or
fires unbounded Promise.all; update delete logic to chunk the keys into
<=1000-item batches when calling the S3 DeleteObjects API (keep the existing
bulk DeleteObjects path but loop over batches) and, for the per-object fallback
path, replace the unbounded Promise.all with bounded concurrency (e.g., a simple
worker pool or p-limit) to limit parallel deletes; change the implementation
inside clear() / deleteObjects() (where DeleteObjects and individual deletes are
performed) to iterate over batches and await each batch (or run each batch with
controlled concurrency) so no single request exceeds 1000 keys and provider load
is bounded.
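The batching strategy this comment asks for can be sketched as follows. This is a minimal illustration under assumed names — `deleteAll`, `deleteEach`, `bulkDelete`, and `deleteOne` are not the driver's real API; only the 1000-key chunk size and the bounded worker-pool concurrency mirror the suggestion.

```typescript
const MAX_BULK_DELETE = 1000; // S3 DeleteObjects hard limit per request
const MAX_CONCURRENCY = 10;   // cap on parallel single-object deletes

// Bulk path: loop over <=1000-key batches, awaiting each batch.
async function deleteAll(
  keys: string[],
  bulkDelete: (chunk: string[]) => Promise<void>
): Promise<void> {
  for (let i = 0; i < keys.length; i += MAX_BULK_DELETE) {
    await bulkDelete(keys.slice(i, i + MAX_BULK_DELETE));
  }
}

// Per-object fallback: a simple worker pool instead of an unbounded Promise.all.
async function deleteEach(
  keys: string[],
  deleteOne: (key: string) => Promise<void>
): Promise<void> {
  let next = 0;
  const worker = async () => {
    while (next < keys.length) {
      const key = keys[next++]!; // next is bumped synchronously, so no duplicates
      await deleteOne(key);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(MAX_CONCURRENCY, keys.length) }, worker)
  );
}
```

A dedicated limiter such as p-limit would do the same job as the hand-rolled worker pool; the point is only that no request exceeds 1000 keys and parallelism stays bounded.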
All 3 CodeRabbit suggestions addressed:
The pagination change means
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/drivers/s3.ts`:
- Around line 209-219: The current bulk-delete loop in clear() sends delete
requests with deleteKeysReq(chunk) and assumes a 200 means success; instead
capture the response from awsFetch(`${baseURL}?delete`, ...) into a variable,
read its text, parse the XML using the existing
parseListResponse()/decodeXmlText() pattern, inspect the parsed document for any
<Error> entries for individual keys, and if any errors exist throw (or return) a
clear, descriptive error containing the key(s) and their error messages; update
the loop to only continue when no per-key errors are found and include
references to deleteKeysReq, awsFetch, parseListResponse, decodeXmlText, and
MAX_BULK_DELETE so you modify the correct block.
```ts
// Chunk into batches of MAX_BULK_DELETE for S3 API limit
for (let i = 0; i < keys.length; i += MAX_BULK_DELETE) {
  const chunk = keys.slice(i, i + MAX_BULK_DELETE);
  const body = deleteKeysReq(chunk);
  await awsFetch(`${baseURL}?delete`, {
    method: "POST",
    headers: {
      "x-amz-checksum-sha256": await sha256Base64(body),
    },
    body,
  });
}
```
🧩 Analysis chain
🌐 Web query:
AWS S3 DeleteObjects API response format error handling per key
💡 Result:
The AWS S3 DeleteObjects API returns an HTTP 200 response with XML in the body containing per-key results. The response format includes:

- Deleted array: for each successfully deleted object, contains Key, VersionId (if versioned), DeleteMarker (boolean), and DeleteMarkerVersionId (if applicable).
- Errors array: for each failed deletion, contains Key, VersionId (if provided), Code (an error code like "NoSuchKey" or "AccessDenied"), and Message (an error description).

Two response modes exist:

- Verbose (default): returns both Deleted and Errors arrays with results for all keys.
- Quiet (set Quiet=true in the request): returns only the Errors array; successful deletions are not listed in the response body.

Error handling per key: individual key failures do not fail the entire request. Check the Errors array for specific failures. Always inspect the response, as SDKs may not throw exceptions for partial failures (e.g., the Java SDK returns DeleteObjectsResponse with a getErrors list; Boto3 returns a dict with an 'Errors' key).

Example response syntax (verbose):

```xml
<DeleteResult>
  <Deleted>
    <Key>example-object</Key>
    <VersionId>QUpfdndhfd8438mnLLU-TXJh0937jS</VersionId>
  </Deleted>
  <Error>
    <Key>error-object</Key>
    <Code>NoSuchKey</Code>
    <Message>The specified key does not exist.</Message>
    <VersionId>null</VersionId>
  </Error>
</DeleteResult>
```

Sources confirm this behavior is unchanged as of latest docs (no date conflicts with 2026-03-21).
Citations:
- 1: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
- 2: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/internal/DeleteObjectsResponse.html
- 3: https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-objects.html
- 4: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/bucket/delete_objects.html
- 5: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/delete_objects.html
Parse S3 DeleteObjects responses to detect per-key failures.
The chunking is good, but this code treats any 200 OK as success. S3's DeleteObjects API can return per-key <Error> entries in a successful response, so clear() can silently leave objects behind. Capture and parse each response to check for errors before continuing.
The codebase already has the building blocks: parseListResponse() and decodeXmlText() show the pattern for parsing S3 XML responses. Apply the same approach here—capture the response, check for <Error> entries, and throw if any are found.
Example fix

```diff
- await awsFetch(`${baseURL}?delete`, {
+ const res = await awsFetch(`${baseURL}?delete`, {
    method: "POST",
    headers: {
      "x-amz-checksum-sha256": await sha256Base64(body),
    },
    body,
  });
+ const xml = await res!.text();
+ const deleteErrors = [...xml.matchAll(
+   /<Error>[\s\S]*?<Key>([\s\S]*?)<\/Key>[\s\S]*?<Code>([\s\S]*?)<\/Code>/g,
+ )];
+ if (deleteErrors.length > 0) {
+   const [_, key, code] = deleteErrors[0]!;
+   throw createError(
+     DRIVER_NAME,
+     `Failed to delete "${decodeXmlText(key)}" (${code})`,
+   );
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/drivers/s3.ts` around lines 209 - 219, The current bulk-delete loop in
clear() sends delete requests with deleteKeysReq(chunk) and assumes a 200 means
success; instead capture the response from awsFetch(`${baseURL}?delete`, ...)
into a variable, read its text, parse the XML using the existing
parseListResponse()/decodeXmlText() pattern, inspect the parsed document for any
<Error> entries for individual keys, and if any errors exist throw (or return) a
clear, descriptive error containing the key(s) and their error messages; update
the loop to only continue when no per-key errors are found and include
references to deleteKeysReq, awsFetch, parseListResponse, decodeXmlText, and
MAX_BULK_DELETE so you modify the correct block.
Problem
Two bugs in the S3 driver:
Bug 1: prefix parameter ignored
listObjects() accepts a prefix parameter but never uses it in the fetch URL. When getKeys(prefix) is called, it returns ALL keys in the bucket instead of only the keys matching the prefix.
The function even logs the intended URL, but the actual fetch always uses the bucket's base URL without any query parameters.
Bug 2: no pagination
S3 ListObjectsV2 API returns max 1000 objects per response. Buckets with >1000 objects only return the first 1000 keys, silently dropping the rest.
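Illustratively, the corrected request URL can be built like this. This is a sketch: `listUrl` and `baseURL` are assumed names, but the query parameter names (`list-type`, `prefix`, `continuation-token`) match the ListObjectsV2 API.

```typescript
// Build a ListObjectsV2 URL, adding prefix and continuation-token only
// when they are provided.
function listUrl(
  baseURL: string,
  prefix?: string,
  continuationToken?: string
): string {
  const params = new URLSearchParams({ "list-type": "2" });
  if (prefix) params.set("prefix", prefix);
  if (continuationToken) params.set("continuation-token", continuationToken);
  return `${baseURL}?${params.toString()}`;
}
```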
Fix
This fix also benefits clear(), which internally calls the listing code.