-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: improve gitignore pattern handling with proper glob support #6884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add dedicated gitignore-parser utility that understands gitignore glob syntax - Document differences between gitignore patterns and regex patterns - Handle invalid character ranges like [A-/] by transforming them to match git behavior - Add comprehensive tests for edge cases including the specific pqh[A-/] pattern - Update CodeIndexManager to use the new parser for better compatibility This addresses the insights from issue #6881 about how gitignore patterns differ from regex patterns and provides a more robust solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this code 5 minutes ago and already found 6 things I should have done differently.
|
|
||
| // Handle reverse ranges like [Z-A] | ||
| // We need to escape the problematic ranges in our own regex! | ||
| const reverseRangeRegex = /\[([^[\]]*[ZYXWVUTSRQzyxwvutsrq]-[ABCDEFGHIJKLMNOPabcdefghijklmnop][^[\]]*)\]/g |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this regex complexity intentional? The hardcoded character sets in these patterns make my eyes hurt even though I just wrote them. Could we consider extracting these into named constants or a configuration object for better maintainability?
| const lines = content.split("\n") | ||
| let hasTransformations = false | ||
|
|
||
| for (const line of lines) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance question: We're checking all lines for transformations even after successful bulk parsing. Wouldn't it be more efficient to only do this transformation check if bulk parsing actually fails? We could save some iterations here.
| const result = sanitizeGitignorePattern("test[A-/]and[Z-B]") | ||
| expect(result).not.toBeNull() | ||
| // First transformation should handle [A-/] | ||
| expect(result?.transformed).toContain("testA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test only verifies the first transformation but the pattern has two invalid ranges. Should we add an assertion to verify both [A-/] and [Z-B] are transformed correctly?
| * | ||
| * Key differences from regex: | ||
| * 1. Character classes like [A-/] are valid in gitignore but invalid in regex | ||
| * - In gitignore, invalid ranges are treated as literal characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to add some concrete examples here? Future maintainers might appreciate seeing actual patterns like:
| // Log telemetry if there were any issues parsing patterns | ||
| if (parseResult && (parseResult.invalidPatterns.length > 0 || parseResult.transformedPatterns.length > 0)) { | ||
| // Use CODE_INDEX_ERROR with a warning level indicator since CODE_INDEX_WARNING doesn't exist | ||
| TelemetryService.instance.captureEvent(TelemetryEventName.CODE_INDEX_ERROR, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor inconsistency: We're using CODE_INDEX_ERROR event type but setting level: "warning". Would it be clearer to either introduce a CODE_INDEX_WARNING event type or rename the level field to severity?
| if (!parseResult.validPatterns.includes(".gitignore")) { | ||
| parseResult.validPatterns.push(".gitignore") | ||
| } | ||
| } catch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silent failure here - should we at least log this for debugging? Even though it's unlikely to fail, having a trace could help diagnose issues in production.
Related to #6881
Context
Following the insightful comment by @dsent in #6881, this PR implements a more robust approach to handling .gitignore patterns that acknowledges the fundamental differences between gitignore glob syntax and regular expressions.
Problem
The
ignorelibrary we use internally converts gitignore patterns to regex, but as @dsent correctly pointed out:pqh[A-/]are valid in git but invalid in regex[A-/]becomes justA)Solution
This PR introduces a dedicated
gitignore-parserutility that:[A-/]are transformed to their git interpretation (A)[Z-A]are handled similarlyChanges
src/services/code-index/utils/gitignore-parser.tswith comprehensive documentationsanitizeGitignorePattern()function to transform problematic patternsparseGitignoreContent()function for robust pattern parsingCodeIndexManagerto use the new parserTesting
pqh[A-/]pattern from the issueNotes
While this doesn't implement a full 500+ line gitignore-to-regex translator as @dsent mentioned, it provides a pragmatic solution that:
This builds upon PR #6882 but takes a more comprehensive approach by understanding and documenting the actual differences between gitignore and regex patterns.
Important
Introduces
gitignore-parser.tsto handle .gitignore patterns with glob syntax, transforming invalid patterns and integrating intoCodeIndexManager.gitignore-parser.tsto handle .gitignore patterns with glob syntax, not regex.[A-/]toA) and reverse ranges (e.g.,[Z-A]toZ).CodeIndexManagerto usecreateIgnoreInstanceFromFile()for parsing .gitignore.manager.ts.gitignore-parser.spec.tswith 16 tests covering edge cases and transformations.gitignore-parser.tson differences between gitignore and regex.This description was created by
for c6f166c. You can customize this summary. It will automatically update as commits are pushed.