Skip to content

Commit 5ce4156

Browse files
authored
Optimizations around directory traversing for sync exclusion (#3702)
## Changes <!-- Brief summary of your changes that is easy to understand --> 1. Do not error out on a file that is removed / renamed while the path is being traversed 2. Do not traverse path if exclusion directive is empty ## Why <!-- Why are these changes needed? Provide the context that the reviewer might be missing. For example, were there any decisions behind the change that are not reflected in the code itself? --> 1. I stumbled upon this error when working on local cache implementation testing the case when a local cache dir is inside a `.databricks` folder within a bundle. `recursiveListFiles` has a classic [TOCTOU (Time-of-Check-Time-of-Use)](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) race condition - `WalkDir` collects directory entries first, but `d.Info()` makes a fresh stat() syscall. If the file was renamed/moved/deleted in between, the stat call fails because it's looking for a path that no longer exists. This change proposes to **skip the inner `stat` call**, so that the method will not throw an error on an non-existing file 2. While looking into the `sync` behavior I noticed that it traverses the bundle directory looking for files to include/exclude, even in case when no files are configured to be included/excluded - the tree traversal in this case is not necessary. ## Tests <!-- How have you tested the changes? --> Existing tests must pass <!-- If your PR needs to be included in the release notes for next release, add a separate entry in NEXT_CHANGELOG.md as part of your PR. -->
1 parent 5e1d01e commit 5ce4156

File tree

2 files changed

+9
-7
lines changed

2 files changed

+9
-7
lines changed

libs/fileset/fileset.go

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,10 @@ func New(root vfs.Path, args ...[]string) *FileSet {
5757
}
5858
}
5959

60+
func Empty() *FileSet {
61+
return &FileSet{}
62+
}
63+
6064
// Ignorer returns the [FileSet]'s current ignorer.
6165
func (w *FileSet) Ignorer() Ignorer {
6266
return w.ignore
@@ -92,13 +96,8 @@ func (w *FileSet) recursiveListFiles(path string, seen map[string]struct{}) (out
9296
return err
9397
}
9498

95-
info, err := d.Info()
96-
if err != nil {
97-
return err
98-
}
99-
10099
switch {
101-
case info.Mode().IsDir():
100+
case d.IsDir():
102101
ign, err := w.ignore.IgnoreDirectory(name)
103102
if err != nil {
104103
return fmt.Errorf("cannot check if %s should be ignored: %w", name, err)
@@ -107,7 +106,7 @@ func (w *FileSet) recursiveListFiles(path string, seen map[string]struct{}) (out
107106
return fs.SkipDir
108107
}
109108

110-
case info.Mode().IsRegular():
109+
case d.Type().IsRegular():
111110
ign, err := w.ignore.IgnoreFile(name)
112111
if err != nil {
113112
return fmt.Errorf("cannot check if %s should be ignored: %w", name, err)

libs/fileset/glob.go

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,9 @@ import (
77
)
88

99
func NewGlobSet(root vfs.Path, includes []string) (*FileSet, error) {
10+
if len(includes) == 0 {
11+
return Empty(), nil
12+
}
1013
for k := range includes {
1114
includes[k] = path.Clean(includes[k])
1215
}

0 commit comments

Comments
 (0)