Commit da4899b
## Filtered recursive walk
Expanding a recursive `**` segment entails walking the entire directory
tree, and so any subsequent pattern segments (except special segments) can
be evaluated by filtering the expanded paths through a regex. For example,
`glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with
`os.scandir()`, and then filters paths through a regex based on "`**/*.py`,
with no further filesystem access needed.
This fixes an issue where `glob()` could return duplicate results.
## Tracking path existence
We store a flag alongside each path indicating whether the path is
guaranteed to exist. As we process the pattern:
- Certain special pattern segments (`""`, `"."` and `".."`) leave the flag
unchanged
- Literal pattern segments (e.g. `foo/bar`) set the flag to false
- Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because
children are found via `os.scandir()`)
- Recursive pattern segments (e.g. `**`) leave the flag unchanged for the
root path, and set it to true for descendants discovered via
`os.scandir()`.
If the flag is false at the end, we call `lstat()` on each path to filter
out missing paths.
## Minor speed-ups
- Exclude paths that don't match a non-terminal non-recursive wildcard
pattern _prior_ to calling `is_dir()`.
- Use a stack rather than recursion to implement recursive wildcards.
- This fixes a recursion error when globbing deep trees.
- Pre-compile regular expressions and pre-join literal pattern segments.
- Convert to/from `bytes` (a minor use-case) in `iglob()` rather than
supporting `bytes` throughout. This particularly simplifies the code
needed to handle relative bytes paths with `dir_fd`.
- Avoid calling `os.path.join()`; instead we keep paths in a normalized
form and append trailing slashes when needed.
- Avoid calling `os.path.normcase()`; instead we use case-insensitive regex
matching.
## Implementation notes
Much of this functionality is already present in pathlib's implementation
of globbing. The specific additions we make are:
1. Support for `dir_fd`
2. Support for `include_hidden`
3. Support for generating paths relative to `root_dir`
This unifies the implementations of globbing in the `glob` and `pathlib`
modules.
Co-authored-by: Pieter Eendebak <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
1 parent b545450 commit da4899b
File tree
7 files changed
+240
-229
lines changed- Doc
- library
- whatsnew
- Lib
- pathlib
- test
- Misc/NEWS.d/next/Library
7 files changed
+240
-229
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | 78 | | |
83 | 79 | | |
84 | 80 | | |
| |||
88 | 84 | | |
89 | 85 | | |
90 | 86 | | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
91 | 92 | | |
92 | 93 | | |
93 | 94 | | |
| |||
98 | 99 | | |
99 | 100 | | |
100 | 101 | | |
101 | | - | |
102 | | - | |
103 | | - | |
104 | | - | |
105 | 102 | | |
106 | 103 | | |
107 | 104 | | |
| |||
111 | 108 | | |
112 | 109 | | |
113 | 110 | | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
114 | 116 | | |
115 | 117 | | |
116 | 118 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
968 | 968 | | |
969 | 969 | | |
970 | 970 | | |
| 971 | + | |
| 972 | + | |
| 973 | + | |
| 974 | + | |
| 975 | + | |
| 976 | + | |
| 977 | + | |
| 978 | + | |
971 | 979 | | |
972 | 980 | | |
973 | 981 | | |
| |||
0 commit comments