Fix S3 lookup unbounded pagination with double call #6851
Merged
pditommaso merged 3 commits into master on Feb 23, 2026
Conversation
The lookup method paginated through all objects under an S3 prefix (`maxKeys=250`) to check path existence. On prefixes with millions of objects this caused the main thread to hang for minutes parsing massive XML responses.

Observed in production: nf-schema parameter validation calls `Files.exists()` on an S3 outdir path, which triggers `S3ObjectSummaryLookup.lookup` (a minimal reproduction sketch follows below). With a large prefix like `s3://bucket/results` containing many objects from previous runs, the pagination loop iterated indefinitely.

Fix: use `maxKeys=2` and remove pagination. The `matchName` check only needs to find the exact key or its first child (`key + "/"`), which are guaranteed to appear in the first results due to S3 lexicographic ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
…refix and smaller lexico order characters than / Signed-off-by: jorgee <jorge.ejarque@seqera.io>
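To make the failure mode concrete, here is a minimal, hypothetical reproduction of the trigger; it assumes an NIO filesystem provider for the `s3://` scheme is installed on the classpath (as nf-amazon provides) and uses placeholder bucket/path names:

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class S3ExistsProbe {
    public static void main(String[] args) {
        // Resolving an s3:// URI goes through the installed S3 filesystem
        // provider; Files.exists() then reaches S3ObjectSummaryLookup.lookup.
        Path outdir = Paths.get(URI.create("s3://bucket/results"));
        // Before the fix, a prefix holding millions of keys made this call
        // paginate through all of them, hanging the main thread for minutes.
        System.out.println(Files.exists(outdir));
    }
}
```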
pditommaso (Member) approved these changes on Feb 20, 2026 and left a comment:
Well done. Considering it's a really tricky issue, I took the liberty of extending the docs/comments.
Problem
`S3ObjectSummaryLookup.lookup()` used an unbounded `while(true)` pagination loop that iterated through all objects sharing a given prefix (fetching 250 keys per page). On S3 buckets with large prefixes containing millions of objects, this caused excessive LIST API calls, high latency, and potential timeouts, just to check whether a single path exists.
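For context, a hedged reconstruction of the removed pattern, assuming AWS SDK for Java v2 builder style and simplified names (the plugin's actual code may differ):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;
import software.amazon.awssdk.services.s3.model.S3Object;

class UnboundedLookupSketch {
    // Pages through every key under the prefix, 250 at a time, until the
    // listing is exhausted: on a huge prefix this means thousands of LIST
    // calls and massive XML responses parsed on the calling thread.
    static S3Object lookup(S3Client s3, String bucket, String key) {
        String token = null;
        while (true) {
            ListObjectsV2Response page = s3.listObjectsV2(ListObjectsV2Request.builder()
                    .bucket(bucket).prefix(key).maxKeys(250)
                    .continuationToken(token).build());
            for (S3Object obj : page.contents()) {
                if (obj.key().equals(key) || obj.key().startsWith(key + "/"))
                    return obj;
            }
            if (!Boolean.TRUE.equals(page.isTruncated()))
                return null;  // prefix exhausted, path does not exist
            token = page.nextContinuationToken();
        }
    }
}
```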
Solution

Replace the unbounded loop with at most two bounded `listObjects` calls, as sketched below:

- Call 1 (`prefix(key)`, `maxKeys(2)`): covers the common cases where the exact key or its first directory child appears within the first 2 lexicographic results.
- Call 2, fallback (`prefix(key + "/")`, `maxKeys(1)`): needed because S3 lists keys in lexicographic (UTF-8 byte) order, and characters like `-` (0x2D) and `.` (0x2E) sort before `/` (0x2F). This means sibling keys such as `a-a/` and `a.txt` appear before `a/` in the listing, potentially pushing the directory child outside Call 1's result window. Call 2 searches with prefix `key/` directly, bypassing those siblings.
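The fixed flow, under the same assumptions as the sketch above (not the plugin's actual code; `matchName` is a stand-in for the check described in the description):

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.S3Object;

class BoundedLookupSketch {
    static boolean matchName(String name, String key) {
        // A listing entry proves existence if it is the key itself
        // or a child under "key/".
        return key.equals(name) || key.startsWith(name + "/");
    }

    static boolean exists(S3Client s3, String bucket, String key) {
        // Call 1: the exact key or its first child normally sorts within
        // the first two results under prefix(key).
        for (S3Object obj : s3.listObjectsV2(b -> b.bucket(bucket).prefix(key).maxKeys(2)).contents()) {
            if (matchName(key, obj.key()))
                return true;
        }
        // Call 2 (fallback): '-' (0x2D) and '.' (0x2E) sort before '/'
        // (0x2F), so siblings like "a-a/..." or "a.txt" can fill the
        // two-result window; probing prefix(key + "/") skips past them.
        return !s3.listObjectsV2(b -> b.bucket(bucket).prefix(key + "/").maxKeys(1)).contents().isEmpty();
    }
}
```

Two LIST calls bound the worst case regardless of how many keys share the prefix.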
Example of the lexicographic ordering issue

Given keys `a-a/file-3`, `a.txt`, and `a/file-1`, S3 returns them as:

```
a-a/file-3
a.txt
a/file-1
```
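For reference, the ordering is just a byte-wise string sort, reproduced by this standalone snippet (not part of the PR):

```java
import java.util.Arrays;

public class LexOrderDemo {
    public static void main(String[] args) {
        // '-' (0x2D) < '.' (0x2E) < '/' (0x2F), so siblings of "a"
        // sort before "a/" itself.
        String[] keys = { "a/file-1", "a.txt", "a-a/file-3" };
        Arrays.sort(keys);  // byte-wise order for ASCII keys
        System.out.println(String.join("\n", keys));
        // prints: a-a/file-3, a.txt, a/file-1
    }
}
```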
With `maxKeys(2)`, Call 1 only sees `a-a/file-3` and `a.txt`, neither of which matches. Call 2 with prefix `a/` finds `a/file-1`, confirming that `a` is a directory.

Alternative to #6849