@roomote roomote bot commented Aug 9, 2025

Fixes #6881

Problem

The codebase indexing feature was failing with an "Invalid regular expression" error when encountering malformed patterns in .gitignore files. Specifically, patterns like pqh[A-/] where the character range [A-/] is invalid (since / comes before A in ASCII) would cause the entire indexing process to fail.
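The failure can be reproduced outside the indexer. The sketch below assumes, as the error message suggests, that the pattern ends up inside a JavaScript `RegExp`. How the ignore library turns a gitignore pattern into regex source is not shown here, and `isValidRegexSource` is a hypothetical helper name used only for illustration.

```typescript
// "/" is code point 47 and "A" is code point 65, so the range A-/ runs backwards.
const codeOfA = "A".charCodeAt(0)
const codeOfSlash = "/".charCodeAt(0)

// Hypothetical helper: reports whether a string compiles as a RegExp.
function isValidRegexSource(source: string): boolean {
	try {
		new RegExp(source)
		return true
	} catch {
		// A reversed character range raises a SyntaxError at construction time.
		return false
	}
}

const reversedRangeIsValid = isValidRegexSource("pqh[A-/]") // range out of order
const orderedRangeIsValid = isValidRegexSource("pqh[/-A]") // same endpoints, ascending
```

Swapping the endpoints so the range ascends is enough to make the expression compile, which is why only this one pattern needs to be skipped rather than the whole file.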

Solution

This PR adds robust error handling for invalid gitignore patterns:

  1. Graceful degradation: When the ignore library fails to parse the entire .gitignore file, we now parse it line by line
  2. Individual pattern validation: Each pattern is tested individually, and invalid patterns are skipped with a warning
  3. Continued operation: The indexing process continues even when some patterns are invalid
  4. Clear logging: Invalid patterns are logged as warnings for debugging
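
The four steps above can be sketched roughly as follows. This is not the code from manager.ts: `PatternMatcher`, `loadPatternsWithFallback` and `ToyRegexMatcher` are hypothetical names, and the toy matcher simply compiles each line as a `RegExp` so the example runs without the ignore library. The real ignore instance is assumed to expose a compatible `add` method.

```typescript
// Hypothetical stand-in for the small part of a matcher API this sketch needs.
interface PatternMatcher {
	add(pattern: string): void
}

interface LoadResult {
	matcher: PatternMatcher
	skipped: string[] // patterns that could not be parsed
}

function loadPatternsWithFallback(
	content: string,
	createMatcher: () => PatternMatcher,
	warn: (message: string) => void = (message) => console.warn(message),
): LoadResult {
	// Fast path: hand the whole file to the matcher at once.
	const bulk = createMatcher()
	try {
		bulk.add(content)
		return { matcher: bulk, skipped: [] }
	} catch (error) {
		warn(`.gitignore could not be parsed as a whole, retrying line by line: ${String(error)}`)
	}

	// Fallback: start from a clean matcher and add one pattern at a time.
	const matcher = createMatcher()
	const skipped: string[] = []
	for (const rawLine of content.split(/\r?\n/)) {
		const trimmed = rawLine.trim()
		if (trimmed === "" || trimmed.startsWith("#")) continue // blank line or comment
		try {
			createMatcher().add(rawLine) // validate on a throwaway instance first
			matcher.add(rawLine)
		} catch {
			skipped.push(rawLine)
			warn(`Skipping invalid .gitignore pattern: "${rawLine}"`)
		}
	}
	return { matcher, skipped }
}

// Toy matcher used only to exercise the sketch. Compiling each line as a RegExp
// is NOT how gitignore patterns are really translated.
class ToyRegexMatcher implements PatternMatcher {
	readonly patterns: string[] = []
	add(pattern: string): void {
		for (const line of pattern.split(/\r?\n/)) {
			if (line.trim() === "") continue
			new RegExp(line) // throws SyntaxError on "pqh[A-/]"
			this.patterns.push(line)
		}
	}
}

const collectedWarnings: string[] = []
const loadResult = loadPatternsWithFallback(
	"node_modules\npqh[A-/]\ndist\n",
	() => new ToyRegexMatcher(),
	(message) => collectedWarnings.push(message),
)
```

Validating on a throwaway instance keeps a failed `add` from leaving partial state in the matcher that is actually used, at the cost of one extra allocation per pattern.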

Changes

  • Added try-catch blocks around ignore pattern parsing in src/services/code-index/manager.ts
  • Implemented line-by-line parsing fallback when bulk parsing fails
  • Added comprehensive test coverage for various gitignore scenarios

Testing

Added 4 new test cases covering:

  • Invalid gitignore patterns
  • Valid gitignore patterns
  • Missing .gitignore files
  • Mixed valid and invalid patterns

All tests pass successfully.


Important

Adds error handling for invalid .gitignore patterns in manager.ts, allowing indexing to continue by skipping invalid patterns and logging warnings.

  • Behavior:
    • Adds error handling for invalid .gitignore patterns in manager.ts.
    • Skips invalid patterns and logs warnings, allowing indexing to continue.
    • Handles missing .gitignore files gracefully.
  • Testing:
    • Adds tests in manager.spec.ts for invalid, valid, and mixed gitignore patterns.
    • Tests for missing .gitignore files.
  • Implementation:
    • Adds try-catch around pattern parsing in _recreateServices() in manager.ts.
    • Implements line-by-line parsing fallback for .gitignore patterns.

This description was created by Ellipsis for 45b0b36.

…exing

- Add try-catch blocks to handle invalid regex patterns in .gitignore
- Parse .gitignore line by line when bulk parsing fails
- Skip individual invalid patterns while processing valid ones
- Add comprehensive tests for various gitignore scenarios
- Fixes #6881
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 9, 2025 18:24
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Aug 9, 2025

@roomote roomote bot left a comment

Reviewing my own code is like debugging in a mirror - everything looks backwards but somehow still broken.


try {
// Create a new ignore instance to test each pattern
const testIgnore = ignore()

Creating a new ignore instance for each pattern validation could be inefficient for large .gitignore files. Could we consider reusing a single test instance or implementing a more efficient validation approach?

} catch (ignoreError) {
// Log warning about invalid patterns but continue with indexing
console.warn(
`Warning: .gitignore contains invalid patterns that could not be parsed. Some files may not be properly ignored during indexing. Error: ${

The warning message could be more actionable. Is it intentional that we don't parse out which specific pattern caused the initial bulk parsing to fail? It might help users fix their .gitignore file more easily if we could identify the problematic pattern in the first error message.
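
One way to make the warning point at the offending pattern is to collect line numbers while validating. The sketch below is an assumption about how that could look, not the PR's code: `findInvalidPatterns`, `formatInvalidPatternWarning` and the `validate` callback are hypothetical, and the demo validator treats each pattern as `RegExp` source purely so the example runs on its own.

```typescript
interface InvalidPattern {
	lineNumber: number // 1-based line in .gitignore
	pattern: string
	reason: string
}

// `validate` is a hypothetical callback that throws when a pattern cannot be parsed.
function findInvalidPatterns(content: string, validate: (pattern: string) => void): InvalidPattern[] {
	const invalid: InvalidPattern[] = []
	content.split(/\r?\n/).forEach((line, index) => {
		const trimmed = line.trim()
		if (trimmed === "" || trimmed.startsWith("#")) return // blank line or comment
		try {
			validate(line)
		} catch (error) {
			const reason = error instanceof Error ? error.message : String(error)
			invalid.push({ lineNumber: index + 1, pattern: line, reason })
		}
	})
	return invalid
}

function formatInvalidPatternWarning(invalid: InvalidPattern[]): string {
	const header = `.gitignore contains ${invalid.length} pattern(s) that could not be parsed and will be skipped:`
	const details = invalid.map((entry) => `  line ${entry.lineNumber}: "${entry.pattern}" (${entry.reason})`)
	return [header, ...details].join("\n")
}

// Toy validator for demonstration only: treats each pattern as RegExp source.
const invalidPatterns = findInvalidPatterns("# build output\ndist\npqh[A-/]\n", (pattern) => {
	new RegExp(pattern)
})
const warningText = formatInvalidPatternWarning(invalidPatterns)
```

With the line number and the pattern text in the message, a user can fix the .gitignore entry directly instead of bisecting the file.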

)

// Try to add individual lines to identify and skip problematic patterns
const lines = content.split("\n")

Would it be cleaner to extract this line-by-line parsing logic into a separate helper method? This would improve readability and make the method easier to test in isolation.

location: "_recreateServices",
})
// File reading error - .gitignore might not exist or be inaccessible
console.info(".gitignore file not found or could not be read, proceeding without gitignore patterns")

Minor inconsistency: we're using console.info here but console.warn for warnings above. Should we consider using a consistent logging approach or perhaps the VSCode output channel for better user visibility?

})
})

describe("gitignore pattern handling", () => {

The test coverage is comprehensive! Could we also add an edge case test for .gitignore files with extremely long lines or unusual encoding? This would ensure our solution is robust against all possible user inputs.
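
As a rough illustration of what such edge cases might exercise, the sketch below normalizes file content before it is split into patterns. It assumes the file is read as a UTF-8 string with any byte order mark preserved. `normalizeGitignoreContent` is a hypothetical helper and is not part of this PR.

```typescript
// Hypothetical normalization step applied before splitting the file into patterns.
function normalizeGitignoreContent(content: string): string {
	// Drop a leading byte order mark (U+FEFF) left by some editors.
	const withoutBom = content.charCodeAt(0) === 0xfeff ? content.slice(1) : content
	// Use "\n" line endings so that a trailing "\r" never becomes part of a pattern.
	return withoutBom.replace(/\r\n?/g, "\n")
}

const normalizedContent = normalizeGitignoreContent("\uFEFFnode_modules\r\ndist\r\n")

// An extremely long line should pass through unchanged.
const longLine = "a".repeat(100000)
const longLineUnchanged = normalizeGitignoreContent(longLine) === longLine
```

A test along these lines would feed the normalized content into the existing parsing path and assert that the first pattern still matches as expected.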

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 9, 2025
@daniel-lxs daniel-lxs closed this Aug 12, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 12, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 12, 2025

Labels

bug Something isn't working Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Codebase Indexing error: Invalid regular expression

4 participants