pass glob prefix to driver fix: #1995 #1996

Open
dhrp wants to merge 1 commit into fsspec:master from dhrp:master
Conversation

@dhrp
dhrp commented Feb 26, 2026

This adds support in fsspec for passing a prefix to supported backends.

It extracts the literal stem between the last / and the first wildcard character (*, ?, [) and passes it as prefix= in the kwargs forwarded to _find. Backends that understand prefix= (gcsfs, adlfs, s3fs*) use it to filter the listing server-side via the storage API's ?prefix= parameter. Backends that don't understand it receive it in kwargs and are expected** to silently ignore it, so there is no behaviour change for them.

*s3fs needs a fix so that it does not break and can use the prefix; this PR should be considered blocked until that support is added. Otherwise it may break implementations.
**expected: every backend I could find does so, except s3fs, but I'm not sure what I might be missing.
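
For illustration, the stem extraction described above could look roughly like the sketch below. This is not the code from this PR: `split_glob_prefix` is a hypothetical helper name, and the wildcard set is assumed to be exactly `*`, `?` and `[` as listed in the description.

```python
import re

# Assumption: a wildcard starts at the first "*", "?" or "[" in the pattern.
_WILDCARD = re.compile(r"[*?\[]")


def split_glob_prefix(pattern):
    """Hypothetical helper: split a glob pattern into (root, prefix).

    root   -- directory part before the last "/" that precedes any wildcard
    prefix -- literal stem between that "/" and the first wildcard
    """
    match = _WILDCARD.search(pattern)
    if match is None:
        # No wildcard at all: nothing to pass as a prefix.
        return pattern, ""
    literal = pattern[: match.start()]
    root, _, prefix = literal.rpartition("/")
    return root, prefix


root, prefix = split_glob_prefix("bucket/data/2024-*.csv")
# The prefix would travel alongside the other kwargs forwarded to _find.
find_kwargs = {"prefix": prefix}
```

A pattern whose wildcard directly follows a `/`, such as `bucket/data/*.csv`, yields an empty prefix, so nothing extra would be sent to the backend in that case.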

closes #1995

To Do

@martindurant
Member

Please check the failures in the "downstream" CI job. These use withdirs or maxdepth, so we may not have enough coverage in the tests here.

Question: why are those two kwargs problematic in the presence of a prefix?

@dhrp
Author

dhrp commented Mar 12, 2026

Question: why are those two kwargs problematic in the presence of a prefix?

Good question.

I've opened a fix PR for s3fs, and while working on it I learned that naively combining prefix with maxdepth gives you a listing of all files at all depths, which is then filtered on the Python side. That would be painful, and it may be what this guard was trying to prevent.

Both the downstream test and the s3fs test seem to fail on the same guard in s3fs, the one I now hope to resolve in fsspec/s3fs#1014. That PR should be reviewed (and merged) first.
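
To make the prefix+maxdepth concern concrete, here is a toy model in plain Python. It is not s3fs code: `naive_prefix_maxdepth` and the in-memory key list are hypothetical, and it only assumes that a flat prefix listing returns every matching key regardless of depth.

```python
def naive_prefix_maxdepth(keys, root, prefix, maxdepth):
    """Toy model: flat listing by key prefix, then a depth filter in Python."""
    key_prefix = f"{root}/{prefix}"
    # Step 1: what a flat prefix listing would return, i.e. keys at every depth.
    listed = [k for k in keys if k.startswith(key_prefix)]
    # Step 2: the depth limit is applied only after everything was listed.
    base_depth = root.count("/") + 1
    kept = [k for k in listed if k.count("/") - base_depth < maxdepth]
    return listed, kept


keys = [
    "bucket/data/2024-a.csv",
    "bucket/data/2024-dir/z.csv",
    "bucket/data/2024-dir/x/y.csv",
    "bucket/data/2023-b.csv",
]
listed, kept = naive_prefix_maxdepth(keys, "bucket/data", "2024-", maxdepth=1)
```

In this model three keys are transferred but only one survives the maxdepth filter, which is the wasted work described above.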

Development

Successfully merging this pull request may close these issues.

_glob performs full directory/bucket scans - unnecessarily