WIP: Paginated list support to allow substring list prefix #545
Currently, obstore only supports listing by path segments. So if you pass a prefix into `list_with_delimiter` or `list`, that will be assumed to be a full path segment. This means that it's currently impossible to efficiently perform the desired query from #494.

`object_store` supports substring-based prefix listing in its `PaginatedListStore` API. So if I use that and provide my own pagination -> stream conversion, then I should be able to essentially match the current `list` API.

However, `PaginatedListStore` is only implemented for S3, Azure, and GCS. It's not implemented for HTTPStore or LocalStore, because those don't have a concept of pagination. See apache/arrow-rs-object-store#388.

This means that to support ...
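
To make the segment versus substring distinction concrete, here is a minimal sketch in plain Python. It is a simplified model, not the actual matching code in `object_store` or obstore; the helper names and the sample keys are made up for illustration.

```python
def matches_segment_prefix(key: str, prefix: str) -> bool:
    """Simplified model of segment-based matching: the prefix is treated as
    a complete path segment, so the key must continue with a '/'."""
    return key.startswith(prefix.rstrip("/") + "/")


def matches_substring_prefix(key: str, prefix: str) -> bool:
    """Substring-based matching: a plain string prefix, with no requirement
    that it ends on a segment boundary."""
    return key.startswith(prefix)


# Hypothetical object keys, for illustration only.
keys = [
    "data/file_2024_01.parquet",
    "data/file_2024_02.parquet",
    "data/other.parquet",
]

# "data/file_2024" is not a full path segment, so segment matching finds nothing.
segment_hits = [k for k in keys if matches_segment_prefix(k, "data/file_2024")]

# Substring matching finds both 2024 files.
substring_hits = [k for k in keys if matches_substring_prefix(k, "data/file_2024")]
```

With segment-based matching, the only way to get the 2024 files is to list everything under `data/` and filter on the client, which is the inefficiency this PR is trying to avoid.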
... or, better idea, in `obstore.list` we:

- ... `Arc<dyn ObjectStore>`, we have essentially an enum of the different stores
- ... `PaginatedListStore`, to support efficient querying of substring prefix
- ... `ObjectStore::list`, so that we never materialize the entire stream
- ... `PaginatedListStore`
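
As a rough illustration of the pagination -> stream conversion and the per-store fallback, here is a sketch in plain Python. Everything in it is hypothetical: `fetch_page`, `FakePaginatedStore`, `FakeLocalStore`, and `list_keys` are stand-ins invented for this example and are not part of obstore or `object_store`. The real change would live on the Rust side.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page-fetch signature: (prefix, token) -> (keys, next_token).
FetchPage = Callable[[str, Optional[str]], Tuple[List[str], Optional[str]]]


def paginated_to_stream(fetch_page: FetchPage, prefix: str) -> Iterator[str]:
    """Yield keys lazily; a new page is requested only after the consumer
    has drained the previous one, so the full listing is never materialized."""
    token: Optional[str] = None
    while True:
        keys, token = fetch_page(prefix, token)
        yield from keys
        if token is None:
            return


class FakePaginatedStore:
    """In-memory stand-in for a store with substring-prefix pagination."""

    def __init__(self, keys: List[str], page_size: int = 2) -> None:
        self.keys = sorted(keys)
        self.page_size = page_size
        self.requests = 0  # counts page fetches, to show laziness

    def fetch_page(self, prefix: str, token: Optional[str]):
        self.requests += 1
        matching = [k for k in self.keys if k.startswith(prefix)]
        start = int(token) if token is not None else 0
        end = start + self.page_size
        next_token = str(end) if end < len(matching) else None
        return matching[start:end], next_token


class FakeLocalStore:
    """Stand-in for a store without pagination: segment-based listing only."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = sorted(keys)

    def list(self, prefix: str) -> Iterator[str]:
        segment = prefix.rstrip("/") + "/"
        return (k for k in self.keys if k.startswith(segment))


def list_keys(store, prefix: str) -> Iterator[str]:
    """Use the paginated path when the store offers it, else fall back."""
    fetch_page = getattr(store, "fetch_page", None)
    if fetch_page is not None:
        return paginated_to_stream(fetch_page, prefix)
    return store.list(prefix)


keys = ["data/file_2024_01", "data/file_2024_02", "data/file_2024_03", "data/other"]

paged = FakePaginatedStore(keys, page_size=2)
stream = list_keys(paged, "data/file_2024")
first = next(stream)   # triggers exactly one page fetch
rest = list(stream)    # drains the remaining pages

local_keys = list(list_keys(FakeLocalStore(keys), "data"))
```

The attribute check in `list_keys` plays the role that the store enum would play in Rust: it decides per store whether the paginated path is available, and both branches return a lazy iterator so callers see the same interface either way.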

For now, as a first pass, we'll only use this to improve `obstore.list`, while not touching `list_with_delimiter`. Later we can explore making that return type a stream as well.

Closes #494