Commit 29f74a9

RFC-6209: Glob Support (#6209)
* init rfc
* fix id
* rm some problems
* update via review
* update rfc
* fix struct name
* add mod
1 parent a07881a commit 29f74a9

2 files changed

+136
-0
lines changed

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
- Proposal Name: `glob_support`
- Start Date: 2025-05-21
- RFC PR: [apache/opendal#6209](https://github.com/apache/opendal/pull/6209)
- Tracking Issue: [apache/opendal#6210](https://github.com/apache/opendal/issues/6210)

# Summary

Add support for matching file paths against Unix shell-style patterns (globs) in OpenDAL.

# Motivation

Glob patterns are a widely used way to filter files by path. They provide a simple, intuitive syntax for matching multiple files with similar paths. Adding glob support to OpenDAL would let users filter and process files that match a pattern without implementing the matching themselves.

Currently, users who want to filter objects by pattern have to list all objects and then apply filters manually, which is verbose and unintuitive. Native glob support would make this common operation more convenient and, where a service can evaluate the pattern itself, more efficient.

# Guide-level explanation

With glob support, users can match files against patterns directly. The API would be exposed as an option on the `list_with` and `lister_with` methods, so that only entries matching the provided glob pattern are returned.

For example:

```rust
use futures::TryStreamExt;

// Get all `.jpg` files in the media directory and its subdirectories
let entries: Vec<Entry> = op.list_with("media/").glob("**/*.jpg").await?;

// Process entries
for entry in entries {
    do_something(&entry);
}

// Or use a lister for streaming access
let mut lister = op.lister_with("media/").glob("**/*.jpg").await?;

// The lister is a stream of `Result<Entry>`, so `try_next` is used to propagate errors
while let Some(entry) = lister.try_next().await? {
    do_something(&entry);
}
```

The glob syntax would support common patterns like:

- `*` - Match any sequence of non-separator characters
- `?` - Match any single non-separator character
- `**` - Match any sequence of characters including separators
- `{a,b}` - Match either of the alternatives `a` or `b`
- `[ab]` - Match either of the characters `a` or `b`
- `[a-z]` - Match any character in the range `a` to `z`

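The RFC does not say how `{a,b}` alternation would be implemented. One common approach, shown here only as a sketch, is to expand each brace group into plain patterns before matching. The function name `expand_braces` is hypothetical, and the sketch assumes brace groups are not nested; an unbalanced `{` is left as a literal.

```rust
/// Expand non-nested `{a,b}` groups into a list of brace-free patterns.
/// Hypothetical helper for illustration; not part of the proposed API.
fn expand_braces(pattern: &str) -> Vec<String> {
    // No opening brace: nothing to expand.
    let Some(open) = pattern.find('{') else {
        return vec![pattern.to_string()];
    };
    // Unbalanced brace: keep the pattern unchanged in this sketch.
    let Some(close_rel) = pattern[open..].find('}') else {
        return vec![pattern.to_string()];
    };
    let close = open + close_rel;
    let prefix = &pattern[..open];
    let suffix = &pattern[close + 1..];

    let mut expanded = Vec::new();
    for alternative in pattern[open + 1..close].split(',') {
        // Recurse on the suffix so later brace groups are expanded too.
        for tail in expand_braces(suffix) {
            expanded.push(format!("{prefix}{alternative}{tail}"));
        }
    }
    expanded
}
```

With this approach, a path matches the original pattern if it matches any of the expanded patterns, so the core matcher only needs to understand `*`, `?`, `**`, and character classes.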
The API would be integrated into the existing builder pattern.

# Reference-level explanation

The implementation would involve:

1. Implementing pattern-matching logic for glob expressions. This can be a simplified version focusing on common use cases like `*`, `?`, and `**`.

2. Modifying the `FunctionLister` and `FutureLister` to accept a glob pattern and filter entries accordingly.

The `GlobMatcher` struct would be an internal implementation detail that encapsulates the parsed glob pattern and the matching logic.

```rust
// This is an internal implementation detail, not exposed in the public API
struct GlobMatcher {
    // internal representation of the pattern
}

impl GlobMatcher {
    fn new(pattern: &str) -> Result<Self> {
        // Parse the pattern string
        // ...
    }

    fn matches(&self, path: &str) -> bool {
        // Perform the matching logic
        // ...
    }
}
```

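To make the skeleton above more concrete, here is a minimal sketch of how such a matcher could work for the simplified subset (`*`, `?`, `**`). It is an illustration under stated assumptions, not the RFC's implementation: the `Token` enum, the `String` error type, the treatment of `**/` as "zero or more whole directories", and the rejection of `[...]` and `{...}` are all choices made for this example.

```rust
/// One parsed element of a simplified glob pattern (illustrative representation).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Literal(char),
    AnyChar,    // `?`: one character other than `/`
    AnySegment, // `*`: zero or more characters other than `/`
    AnyDirs,    // `**/`: zero or more whole directory components
    AnyPath,    // bare `**`: zero or more characters, including `/`
}

#[derive(Debug)]
struct GlobMatcher {
    tokens: Vec<Token>,
}

impl GlobMatcher {
    /// Parse `pattern`; `[...]` and `{...}` are rejected in this sketch.
    fn new(pattern: &str) -> Result<Self, String> {
        let mut tokens = Vec::new();
        let mut chars = pattern.chars().peekable();
        while let Some(c) = chars.next() {
            match c {
                '?' => tokens.push(Token::AnyChar),
                '*' if chars.peek() == Some(&'*') => {
                    chars.next(); // consume the second `*`
                    if chars.peek() == Some(&'/') {
                        chars.next(); // fold the separator into `**/`
                        tokens.push(Token::AnyDirs);
                    } else {
                        tokens.push(Token::AnyPath);
                    }
                }
                '*' => tokens.push(Token::AnySegment),
                '[' | ']' | '{' | '}' => {
                    return Err(format!("unsupported glob character: {c}"));
                }
                other => tokens.push(Token::Literal(other)),
            }
        }
        Ok(Self { tokens })
    }

    fn matches(&self, path: &str) -> bool {
        let chars: Vec<char> = path.chars().collect();
        match_tokens(&self.tokens, &chars)
    }
}

/// Backtracking matcher: try every amount of input a wildcard may consume.
fn match_tokens(tokens: &[Token], path: &[char]) -> bool {
    let Some((first, rest)) = tokens.split_first() else {
        return path.is_empty();
    };
    match first {
        Token::Literal(c) => path.first() == Some(c) && match_tokens(rest, &path[1..]),
        Token::AnyChar => {
            matches!(path.first(), Some(&c) if c != '/') && match_tokens(rest, &path[1..])
        }
        Token::AnySegment => {
            // `*` may consume characters only up to the next separator.
            let limit = path.iter().position(|&c| c == '/').unwrap_or(path.len());
            (0..=limit).any(|n| match_tokens(rest, &path[n..]))
        }
        Token::AnyDirs => {
            // Skip nothing, or skip up to and including any `/`.
            match_tokens(rest, path)
                || (0..path.len()).any(|i| path[i] == '/' && match_tokens(rest, &path[i + 1..]))
        }
        Token::AnyPath => (0..=path.len()).any(|n| match_tokens(rest, &path[n..])),
    }
}
```

For example, `GlobMatcher::new("**/*.jpg")` would accept both `c.jpg` and `a/b/c.jpg`, while `*.jpg` would reject `a/c.jpg` because `*` does not cross a separator. Backtracking keeps the code short but can be slow on patterns with many wildcards; a production matcher would likely need a bounded algorithm.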
The implementation would be built on top of the existing listing capabilities. Pattern matching will primarily occur client-side. However, for services with native glob/pattern support (e.g., GCS `matchGlob`, Redis `SCAN` with `MATCH`), OpenDAL will delegate the pattern matching to the service where possible to improve efficiency.

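The choice between server-side delegation and client-side filtering described above could be sketched roughly as follows. Everything here is hypothetical: `GlobCapability`, its `native_glob` flag, `GlobPlan`, `plan_glob`, and `filter_paths` are names invented for this example and do not exist in OpenDAL. The sketch also skips the translation of a glob into a service's own pattern syntax.

```rust
/// Hypothetical per-service capability flag; not an existing OpenDAL field.
#[derive(Debug, Clone, Copy)]
struct GlobCapability {
    native_glob: bool,
}

/// Where the pattern gets evaluated for a given list request.
#[derive(Debug, PartialEq)]
enum GlobPlan {
    /// Forward the pattern to the service along with the list request.
    ServerSide(String),
    /// List normally and filter the returned paths locally.
    ClientSide(String),
}

fn plan_glob(cap: GlobCapability, pattern: &str) -> GlobPlan {
    if cap.native_glob {
        // A real implementation would translate the pattern to the
        // service's own syntax here; this sketch passes it through.
        GlobPlan::ServerSide(pattern.to_string())
    } else {
        GlobPlan::ClientSide(pattern.to_string())
    }
}

/// Client-side fallback: keep only the listed paths accepted by `matches`.
fn filter_paths<'a>(
    paths: impl IntoIterator<Item = &'a str>,
    matches: impl Fn(&str) -> bool,
) -> Vec<&'a str> {
    paths.into_iter().filter(|p| matches(*p)).collect()
}
```

Keeping the decision in one place would make it easier to guarantee that every service ends up with the same observable results, whichever side evaluates the pattern.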
# Drawbacks

- While the API surface change is minimized by integrating with the existing builder pattern, it still introduces a new concept (glob patterns) for users to learn.
- Implementing server-side delegation adds complexity, as OpenDAL needs to identify services with native support and translate glob patterns to their specific syntax.
- For services without native glob support, client-side matching still requires listing all potentially relevant entries first, which might be inefficient for very large directories or complex patterns.
- Ensuring consistent behavior between client-side and various server-side implementations of glob matching can be challenging.

# Rationale and alternatives

This design integrates glob filtering into the existing builder pattern API, providing a natural extension to current functionality. We will implement our own pattern matching logic, focusing on commonly used glob syntax (`*`, `?`, `**`) and patterns such as `*.parquet`, to avoid the complexity of full-featured glob libraries designed for local file systems. This approach allows for a lean implementation tailored to object storage path matching.

Where services offer native pattern matching capabilities, OpenDAL will delegate to them to take advantage of server-side efficiency. For other services, client-side filtering will be applied after listing entries.

Alternatives considered:

1. Not implementing this feature and letting users implement filtering manually.
   - This puts the burden on users and leads to repetitive code.
   - Users might implement inefficient or buggy filtering.

2. Relying entirely on an external glob library.
   - Most glob libraries include complex logic for local file systems (e.g., directory traversal, symlink handling) which is not needed for OpenDAL's path matching.
   - This can introduce unnecessary dependencies and overhead.

3. Implementing server-side filtering *only* for services that support it, without a client-side fallback.
   - This would lead to inconsistent feature availability across services.
   - A client-side fallback ensures glob functionality is universally available.

4. Adding a more general filtering API instead of specifically glob patterns.
   - While potentially more flexible, this would be more complex to design and implement.
   - Glob patterns are a well-understood and widely used standard for this type of path matching, covering the majority of use cases.

Not providing a unified glob capability means users continue to write verbose code for a common operation, or face inconsistencies if trying to leverage service-specific features directly. OpenDAL aims to provide a consistent and ergonomic interface for such common tasks.

# Prior art

Many file system and storage APIs provide glob or similar pattern matching capabilities:

- The [glob](https://crates.io/crates/glob) crate in Rust
- Python's [glob](https://docs.python.org/3/library/glob.html) module
- Node.js [glob](https://www.npmjs.com/package/glob) package
- Unix shells like bash with built-in glob support

Most implementations provide similar syntax, though there are some variations. We should align with the conventions established in the Rust ecosystem.

# Unresolved questions

None

# Future possibilities

- As more services add native support for glob filtering, we could push the filtering to the server side for them as well.
- We could extend the API to support more advanced pattern matching, such as regex.

core/src/docs/rfcs/mod.rs

Lines changed: 4 additions & 0 deletions
@@ -269,6 +269,10 @@ pub mod rfc_5871_read_returns_metadata {}
 #[doc = include_str!("6189_remove_native_blocking.md")]
 pub mod rfc_6189_remove_native_blocking {}

+/// Glob support
+#[doc = include_str!("6209_glob_support.md")]
+pub mod rfc_6209_glob_support {}
+
 /// Options API
 #[doc = include_str!("6213_options_api.md")]
 pub mod rfc_6213_options_api {}
