-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Fix/issue 7921 search files includes files ignored by nested gitignore #8445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/issue 7921 search files includes files ignored by nested gitignore #8445
Conversation
pull latest
| } | ||
|
|
||
| // Convert real path to relative for ignore checking | ||
| const relativePath = path.relative(this.cwd, realPath).toPosix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expression path.relative(this.cwd, realPath).toPosix() uses a non-standard method. Consider using a standard alternative (e.g. path.posix.relative(this.cwd, realPath) or replacing backslashes) to ensure cross-platform consistency.
| const relativePath = path.relative(this.cwd, realPath).toPosix() | |
| const relativePath = path.posix.relative(this.cwd, realPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comment.
| regex, | ||
| filePattern, | ||
| cline.rooIgnoreController, | ||
| cline.gitIgnoreController, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] The issue requests an option to include ignored files only when explicitly asked. There’s no include_ignored parameter; consider adding an optional flag and threading it into regexSearchFiles to bypass controllers when requested.
|
I'm going to have to test this on a windows machine. Should have that started tomorrow. |
|
I'm going to write a few more tests tomorrow, mainly to confirm that all matching pattern in subdirectories are recursively filtered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments.
| } | ||
| } | ||
|
|
||
| // Always ignore .gitignore files themselves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] Recursive discovery is never used, so nested .gitignore files outside the hardcoded locations won't be honored. findGitignoreFilesRecursively() is defined but never invoked, and discoverAndLoadGitignoreFiles() only checks a fixed set (root/src/lib/test/tests). This undermines the stated behavior and the issue’s requirement to respect nested .gitignore across the tree. Consider invoking recursive discovery and loading newly found files before finalizing patterns.
| // Always ignore .gitignore files themselves | |
| // Also discover arbitrary nested .gitignore files across the workspace | |
| await this.findGitignoreFilesRecursively(this.cwd) | |
| // Load any files discovered by recursion that weren't loaded yet | |
| for (const p of this.gitignoreFiles) { | |
| if (!this.gitignoreContents.has(p)) { | |
| await this.loadGitignoreFile(p) | |
| } | |
| } | |
| // Always ignore .gitignore files themselves | |
| this.ignoreInstance.add(".gitignore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments.
| } | ||
|
|
||
| // Always ignore .gitignore files themselves | ||
| this.ignoreInstance.add(".gitignore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] Only the root-level .gitignore is ignored; nested .gitignore files are not. Use a glob to ignore at all depths so .gitignore files themselves are consistently excluded throughout the tree.
| this.ignoreInstance.add(".gitignore") | |
| // Always ignore .gitignore files at any depth | |
| this.ignoreInstance.add("**/.gitignore") |
| if (reloadTimeout) { | ||
| clearTimeout(reloadTimeout) | ||
| } | ||
| reloadTimeout = setTimeout(reloadCallback, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P3] Debounced reload can fire after dispose, leading to callbacks running on disposed state. Consider storing the timeout on the instance and clearing it in dispose(), or guarding reloadCallback when disposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments.
| ? results.filter((result) => rooIgnoreController.validateAccess(result.file)) | ||
| : results | ||
| // Filter results using both controllers if provided | ||
| const filteredResults = results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P3] Two-pass filtering iterates the array twice and calls validateAccess twice per result. Combine predicates to reduce traversal and overhead.
| const filteredResults = results | |
| const filteredResults = results.filter((result) => { | |
| const file = result.file | |
| return (gitIgnoreController?.validateAccess(file) ?? true) && (rooIgnoreController?.validateAccess(file) ?? true) | |
| }) |
| await this.findGitignoreFilesRecursively(this.cwd) | ||
|
|
||
| // Load any files discovered by recursion that weren't loaded yet | ||
| for (const p of this.gitignoreFiles) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P2] Potential duplicates in gitignoreFiles if a path is discovered via both the predefined list and recursion. De-duplicate before loading to avoid redundant add() calls and reduce churn.
| for (const p of this.gitignoreFiles) { | |
| // Load any files discovered by recursion that weren't loaded yet | |
| // De-duplicate discovered paths to avoid redundant loads | |
| this.gitignoreFiles = Array.from(new Set(this.gitignoreFiles)) | |
| for (const p of this.gitignoreFiles) { | |
| if (!this.gitignoreContents.has(p)) { | |
| await this.loadGitignoreFile(p) | |
| } | |
| } |
|
@daniel-lxs can you please review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments.
|
|
||
| if (relativeDir) { | ||
| // For nested .gitignore files, we need to create patterns that match files within that directory | ||
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] Escaped literals are misparsed. Lines beginning with an escaped '#' (e.g., '#foo') are treated as comments and dropped; and lines beginning with an escaped '!' (e.g., '!keep.txt') are treated as negations. Gitignore semantics: '#' and '!' at the start denote literal '#' and '!' patterns. Parse comments only when the first non-whitespace char is an unescaped '#', and unescape a leading '#' or '!' before further processing.
| if (trimmed.startsWith("/")) { | ||
| // Absolute patterns (starting with /) are relative to the .gitignore location | ||
| return [normalizedRelativeDir + trimmed] | ||
| } else if (trimmed.startsWith("!")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] Negation handling doesn't account for escaped '!'. A pattern like '!keep.txt' should be treated as a literal starting with '!', not a re-include. Add a guard before the negation branch to detect and strip a leading backslash before '!' so it is treated as a literal pattern.
|
... I'm going to revert the changes. Already spent a ton of time trying to get those windows tests to pass before. Idk if the overhead of an additional pass really adds up to anything in reality anyway. Caching the file search is more than enough to offset I would think. Let me add some additional testing coverage for the gitignore semantics. That's important |
…traversal" add additional test coverage for escaped literals This reverts commit 0b34253.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments.
|
|
||
| if (relativeDir) { | ||
| // For nested .gitignore files, we need to create patterns that match files within that directory | ||
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Comment detection trims content but checks startsWith on the untrimmed line. Lines like ' # comment' will be treated as patterns and then fed to ignore(), which is not intended and can create bogus entries. Use the trimmed string for the comment check (escaped # still passes through as intended).
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) | |
| const lines = content | |
| .split(/\r?\n/) | |
| .filter((line) => { | |
| const t = line.trim(); | |
| return t.length > 0 && !t.startsWith("#"); | |
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roomote-agent we need test coverage for this at this point. Can you confirm that the tests don't cover this?
| } | ||
|
|
||
| // Always ignore .gitignore files themselves | ||
| this.ignoreInstance.add(".gitignore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Only the root .gitignore is excluded from results with this pattern. Nested .gitignore files will still be considered unless we also ignore '**/.gitignore'. This can surface .gitignore files from subdirectories in search results.
| this.ignoreInstance.add(".gitignore") | |
| this.ignoreInstance.add([".gitignore", "**/.gitignore"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this important? idk if this matters. Sometimes I want roo to edit .gitignore files
| const entries = await fs.readdir(dirPath, { withFileTypes: true }) | ||
| const subdirs = entries | ||
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".")) | ||
| .map((entry) => path.join(dirPath, entry.name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Recursive discovery traverses large directories like node_modules, dist, build, etc., which can severely impact performance on large workspaces. Consider skipping common heavy directories during traversal.
| const entries = await fs.readdir(dirPath, { withFileTypes: true }) | |
| const subdirs = entries | |
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".")) | |
| .map((entry) => path.join(dirPath, entry.name)) | |
| const entries = await fs.readdir(dirPath, { withFileTypes: true }) | |
| const SKIP_DIRS = new Set(["node_modules", "dist", "build", "out", "coverage", ".git"]) | |
| const subdirs = entries | |
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".") && !SKIP_DIRS.has(entry.name)) | |
| .map((entry) => path.join(dirPath, entry.name)) |
|
Thanks for the tag — acknowledged. I pulled the PR context and I am auditing test coverage now. From an initial scan, we have unit tests for GitIgnoreController and RooIgnoreController, but I do not see an integration test that exercises the search_files tool end to end to prove nested .gitignore exclusions. I will add targeted tests that: 1) set up nested .gitignore with ignore/negation patterns, 2) run search_files from repo root, and 3) assert ignored paths are excluded across platforms. I will push updates on this branch shortly. |
…ested .gitignore filtering and mock fs/promises
|
Added end-to-end test coverage for nested .gitignore filtering in the search_files path.\n\nWhat I added:\n- Test: src/services/ripgrep/tests/regex-search-files.gitignore.integration.spec.ts\n- It simulates ripgrep JSON output and initializes a real GitIgnoreController with a nested src/.gitignore that ignores ".tmp" and re-allows "!keep.tmp".\n\nCurrent result:\n- The new test fails: regexSearchFiles still includes a match in src/ignored.tmp. This demonstrates a gap in end-to-end filtering even though unit tests for GitIgnoreController/RooIgnoreController exist.\n\nNotes:\n- The test mocks fs/promises, child_process.spawn, and VSCode env, and forces getBinPath to succeed. It asserts controller behavior explicitly before calling regexSearchFiles.\n- Branch updated: fix/issue-7921-search_files-includes-files-ignored-by-nested-gitignore (pushed).\n\nPlan:\n- Audit GitIgnoreController pattern normalization/ordering and BaseIgnoreController path handling to ensure "src/.tmp" and "src/**/*.tmp" patterns properly exclude nested files, with negations applied in correct precedence.\n- Expand tests for escaped literals (\# and \!) and negations in nested .gitignore.\n- Ensure regexSearchFiles consistently applies both controllers before formatting results.\n\nI will push a fix that makes this test green next. |
…ludes-files-ignored-by-nested-gitignore test(ripgrep): add integration test for regexSearchFiles to enforce n…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments for concrete fixes.
| : results | ||
| // Filter results using both controllers if provided | ||
| const filteredResults = results | ||
| .filter((result) => gitIgnoreController?.validateAccess(result.file) ?? true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1: Potential path base mismatch. validateAccess() resolves against cwd, but ripgrep paths are relative to the search root. When directoryPath is a subfolder, results like ignored.tmp (relative to directoryPath) will be resolved as ${cwd}/ignored.tmp instead of ${directoryPath}/ignored.tmp. Consider resolving parsed.data.path.text against directoryPath when constructing each file result so ignore checks are performed on absolute paths anchored to the actual search root.
| // Get all subdirectories | ||
| const entries = await fs.readdir(dirPath, { withFileTypes: true }) | ||
| const subdirs = entries | ||
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Hidden directories are skipped during discovery. This prevents picking up .gitignore files in hidden folders (e.g. .github/.gitignore, .vscode/.gitignore). Include hidden directories and let ignore patterns control exclusions.
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".")) | |
| const subdirs = entries | |
| .filter((entry) => entry.isDirectory()) | |
| .map((entry) => path.join(dirPath, entry.name)) |
| // Get all subdirectories | ||
| const entries = await fs.readdir(dirPath, { withFileTypes: true }) | ||
| const subdirs = entries | ||
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P2: Same discovery issue in the recursive branch: skipping dot-directories can miss nested .gitignore files under hidden folders.
| .filter((entry) => entry.isDirectory() && !entry.name.startsWith(".")) | |
| const subdirs = entries | |
| .filter((entry) => entry.isDirectory()) | |
| .map((entry) => path.join(dirPath, entry.name)) |
|
|
||
| if (relativeDir) { | ||
| // For nested .gitignore files, we need to create patterns that match files within that directory | ||
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P3: Comment detection uses !line.startsWith('#') on the untrimmed line. Lines with leading spaces before # become patterns, and escaped \# should be treated as a literal #. Trim leading whitespace and only treat as a comment if the first non-space char is an unescaped #.
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) | |
| const lines = content.split(/\r?\n/).filter((line) => { | |
| const s = line.trimStart(); | |
| return s && !(s.startsWith('#') && !s.startsWith('\\#')); | |
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some issues that need attention. See inline comments.
| } | ||
|
|
||
| // Convert real path to relative for ignore checking | ||
| const relativePath = path.relative(this.cwd, realPath).toPosix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P0] String.prototype.toPosix() does not exist and will throw at runtime. Normalize separators explicitly before passing to the ignore matcher.
| const relativePath = path.relative(this.cwd, realPath).toPosix() | |
| const relativePath = path.relative(this.cwd, realPath).split(path.sep).join("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we tested this and it's already in prod in the rooIgnoreController line 108
const relativePath = path.relative(this.cwd, realPath).toPosix()
|
|
||
| if (relativeDir) { | ||
| // For nested .gitignore files, we need to create patterns that match files within that directory | ||
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] Escaped '#' patterns (\#) are valid gitignore entries (literal '#') and should not be dropped as comments. The current filter removes them.
| const lines = content.split(/\r?\n/).filter((line) => line.trim() && !line.startsWith("#")) | |
| const lines = content | |
| .split(/\r?\n/) | |
| .map((l) => l.trim()) | |
| .filter((l) => l && !(l.startsWith("#") && !l.startsWith("\\#"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is test coverage for this and they all pass.
|
I just tested as well and this works. Did not review code, @daniel-lxs will soon |
|
Thank you for the detailed work on this PR! After investigation, we found that the root cause is simpler than expected: the We've implemented a minimal fix that makes the glob flag conditional - only adding it when a specific file pattern is provided. This leverages ripgrep's native .gitignore support. Closing this PR in favor of the simpler solution. Appreciate the thorough testing and documentation you provided! |
|
Closing in favor of simpler solution - see comment above. |
Note: I added a debounce to the filewatcher to improve performance in the shared class. This meant adding a 150ms wait to the "should reset when .rooignore is deleted" test in RooIgnoreController.spec.ts I can revert these changes if desired.
Related GitHub Issue
Closes: #7921
Roo Code Task Context (Optional)
Description
Implement GitIgnoreController for Nested .gitignore Support
Problem
The
search_filestool was returning results from directories/files that nested.gitignorefiles mark as ignored. This occurred when runningsearch_filesfrom the repository root in projects with.gitignorefiles inside subdirectories, causing noise and increased output size.Expected behavior: Matches from ignored paths should be excluded by default.
Actual behavior: Results included matches from ignored subdirectories.
Solution Overview
Implemented a complete
GitIgnoreControllersystem that respects nested.gitignorefiles throughout the directory tree, unlike ripgrep which only honors top-level.gitignorefiles.Architecture
1. BaseIgnoreController (Abstract Base Class)
File:
src/core/ignore/BaseIgnoreController.tsProvides common functionality for all ignore controllers:
fs.realpathSync()Key Methods:
validateAccess(filePath): Check if file should be accessible (not ignored)filterPaths(paths): Filter array of paths, removing ignored onessetupFileWatcher(): Set up VSCode file watchers with debouncingdispose(): Clean up all resources2. GitIgnoreController (Concrete Implementation)
File:
src/core/ignore/GitIgnoreController.tsHandles multiple
.gitignorefiles throughout the directory tree:.gitignorefiles from common locations[".gitignore", "src/.gitignore", "lib/.gitignore", "test/.gitignore", "tests/.gitignore"].gitignorecontent in memory for performance.gitignorefiles changeKey Features:
BaseIgnoreControllerfor consistent behaviorignorelibrary for standard.gitignoresyntax supportgetGitignoreFiles()andgetGitignoreContent()for introspection3. Enhanced RooIgnoreController
File:
src/core/ignore/RooIgnoreController.ts(existing, enhanced)Now extends
BaseIgnoreControllerfor consistency:.rooignorefilesFiles Created/Modified
New Files:
src/core/ignore/BaseIgnoreController.ts: Abstract base class with common functionalitysrc/core/ignore/GitIgnoreController.ts: Multi-file .gitignore controllersrc/core/ignore/__tests__/GitIgnoreController.spec.ts: Comprehensive unit tests (10 tests)src/core/ignore/__tests__/GitIgnoreController.security.spec.ts: Security-focused tests (9 tests)Modified Files:
src/core/ignore/RooIgnoreController.ts: Refactored to extend BaseIgnoreControllersrc/core/ignore/__tests__/RooIgnoreController.spec.ts: Updated for new architectureTest Procedure
Comprehensive test suite with 52 total tests across 4 files:
GitIgnoreController Tests (19 tests):
RooIgnoreController Tests (33 tests):
Integration Points
Search Files Tool
The
GitIgnoreControllerintegrates with the search system to:File System Operations
All file manipulation services can use the controller to:
Security Considerations
Path Traversal Protection
Fail-Closed Behavior
Performance Optimizations
Caching Strategy
Efficient Discovery
Backward Compatibility
Pre-Submission Checklist
Screenshots / Videos
2025-10-01.13-48-08_1.mov
Documentation Updates
Additional Notes
n/a
Get in Touch
jevons
Important
Introduces
GitIgnoreControllerto handle nested.gitignorefiles, ensuring ignored files are excluded from search results, and updates related components and tests.GitIgnoreControllerto handle nested.gitignorefiles, ensuring ignored files are excluded from search results.searchFilesToolto useGitIgnoreControlleralongsideRooIgnoreController.GitIgnoreControllerinGitIgnoreController.tshandles multiple.gitignorefiles, supports pattern merging, and caches content.BaseIgnoreControllerprovides common functionality for ignore controllers, including file validation and path filtering.RooIgnoreControllernow extendsBaseIgnoreControllerfor consistency.GitIgnoreController.spec.tsandGitIgnoreController.security.spec.tswith 19 tests for initialization, file validation, path filtering, and security.RooIgnoreController.spec.tsto reflect architectural changes.Task.tsto initialize and dispose ofGitIgnoreController.regexSearchFilesinripgrep/index.tsto filter results using bothGitIgnoreControllerandRooIgnoreController.This description was created by
for 8dcddb4. You can customize this summary. It will automatically update as commits are pushed.