-
Notifications
You must be signed in to change notification settings - Fork 2.5k
fix: improve diff parsing for Grok models with malformed content #7758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add detection for Grok-specific malformed diff patterns - Provide better error messages when AI models generate incorrect syntax - Add debugging information to help users understand what went wrong - Include actionable suggestions for fixing diff issues - Add comprehensive test coverage for Grok scenarios This addresses issue #7750 where Grok models were generating malformed diffs that caused confusing error messages about markers that were not actually present in the files being edited.
| } | ||
|
|
||
| private validateMarkerSequencing(diffContent: string): { success: boolean; error?: string } { | ||
| private validateMarkerSequencing(diffContent: string): { success: boolean; error?: string; debugInfo?: string } { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both strategies (MultiFileSearchReplaceDiffStrategy and MultiSearchReplaceDiffStrategy) now include nearly identical diff validation methods. Consider extracting the common logic (detectGrokMalformedDiff, analyzeDiffStructure, and parts of validateMarkerSequencing) into a shared utility module to reduce duplication.
This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
| /** | ||
| * Detects if the diff content appears to be malformed in a way typical of Grok models | ||
| */ | ||
| private detectGrokMalformedDiff(diffContent: string): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add edge case handling here? The method might benefit from checking if diffContent is null/undefined or contains only whitespace before processing.
| * Analyzes the diff structure to provide debugging information | ||
| */ | ||
| private analyzeDiffStructure(diffContent: string): string { | ||
| const lines = diffContent.split("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance consideration: These iterations through the lines array (412-415) could be combined with the earlier loop for better performance on large diffs. Would a single-pass approach work here?
| ) | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation opportunity: These new helper methods would benefit from JSDoc comments explaining their purpose, parameters, and return values. Future maintainers (including future me) would appreciate it!
| `4. If the file contains special characters or markers, they may need escaping\n\n` + | ||
| `CORRECT FORMAT:\n` + | ||
| `<<<<<<< SEARCH\n` + | ||
| `:start_line: (optional) The line number where search starts\n` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional? The error message shows ':start_line:' as '(optional)' here but '(required)' at line 109. Should we standardize this for consistency?
| consecutiveSeparators > 0 || | ||
| separatorBeforeSearch || | ||
| lines.filter((l) => l.trim() === "=======").length > | ||
| lines.filter((l) => l.trim().startsWith("<<<<<<< SEARCH")).length * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number alert: Consider extracting '* 2' to a named constant like MAX_SEPARATORS_PER_SEARCH_RATIO for better code clarity.
| import { MultiSearchReplaceDiffStrategy } from "../multi-search-replace" | ||
| import { MultiFileSearchReplaceDiffStrategy } from "../multi-file-search-replace" | ||
|
|
||
| describe("Grok Malformed Diff Detection", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test coverage! Consider adding edge case tests for:
- Empty or whitespace-only diff content
- Very large diffs (performance testing)
- Mixed valid and invalid blocks in the same diff
|
Closing, see #7750 (comment) |
Summary
This PR addresses Issue #7750 where Grok Coder models were experiencing parsing errors with the apply_diff tool, specifically when the error messages incorrectly indicated that files contained diff markers (like
=======) when they actually did not.Problem
As reported by @mrbm, the system was showing error messages about escaping special markers like
=======even when these markers were not present in the actual file content. This suggests the issue is with how the diff parser handles malformed content generated by AI models, not with the file content itself.Solution
This PR improves the diff parsing logic to:
Detect Grok-specific malformed patterns - Identifies common issues like:
=======appearing multiple times)Provide better error messages - When malformed diffs are detected:
Add comprehensive test coverage - 27 new test cases covering various Grok-specific scenarios
Changes
multi-search-replace.tsandmulti-file-search-replace.tsdetectGrokMalformedDiff()andanalyzeDiffStructure()grok-malformed-diff.spec.tsTesting
Note
While this implementation should help with many Grok-related diff issues, I have asked @mrbm for additional details about their specific case to ensure this fully resolves the issue. The fix focuses on better detection and messaging for malformed diffs, which should prevent the confusing error messages about markers that do not exist in files.
Fixes #7750
Important
Enhances diff parsing to detect Grok-specific malformed patterns, improves error messaging, and adds comprehensive test coverage.
multi-search-replace.tsandmulti-file-search-replace.tsto detect Grok-specific malformed patterns like consecutive separators, separators before SEARCH markers, unbalanced markers, and excessive separators.detectGrokMalformedDiff()andanalyzeDiffStructure()to identify and analyze malformed diffs.grok-malformed-diff.spec.tswith 27 new test cases for Grok-specific scenarios.multi-search-replace.spec.tsto match new error message format.validateMarkerSequencing()to incorporate Grok detection.This description was created by
for d509c20. You can customize this summary. It will automatically update as commits are pushed.