-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix: enhance null edit detection for Gemini models #7362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| import { MultiSearchReplaceDiffStrategy } from "../multi-search-replace" | ||
|
|
||
| describe("null edit detection", () => { | ||
| let strategy: MultiSearchReplaceDiffStrategy | ||
|
|
||
| beforeEach(() => { | ||
| strategy = new MultiSearchReplaceDiffStrategy() | ||
| }) | ||
|
|
||
| it("should detect and reject null edits (identical search and replace)", async () => { | ||
| const originalContent = 'function hello() {\n console.log("hello")\n}\n' | ||
| const diffContent = [ | ||
| "<<<<<<< SEARCH", | ||
| "function hello() {", | ||
| ' console.log("hello")', | ||
| "}", | ||
| "=======", | ||
| "function hello() {", | ||
| ' console.log("hello")', | ||
| "}", | ||
| ">>>>>>> REPLACE", | ||
| ].join("\n") | ||
|
|
||
| const result = await strategy.applyDiff(originalContent, diffContent) | ||
| expect(result.success).toBe(false) | ||
| if (!result.success && result.failParts && result.failParts.length > 0) { | ||
| const firstError = result.failParts[0] | ||
| if (!firstError.success && firstError.error) { | ||
| expect(firstError.error).toContain("NULL EDIT DETECTED") | ||
| expect(firstError.error).toContain("Search and replace content are identical") | ||
| expect(firstError.error).toContain("This is a common issue with AI models (especially Gemini)") | ||
| expect(firstError.error).toContain("Model hallucination") | ||
| } | ||
| } | ||
| }) | ||
|
|
||
| it("should detect null edits in multi-block diffs", async () => { | ||
| const originalContent = | ||
| 'function hello() {\n console.log("hello")\n}\nfunction world() {\n return "world"\n}' | ||
| const diffContent = [ | ||
| "<<<<<<< SEARCH", | ||
| "function hello() {", | ||
| ' console.log("hello")', | ||
| "}", | ||
| "=======", | ||
| "function hello() {", | ||
| ' console.log("hello world")', | ||
| "}", | ||
| ">>>>>>> REPLACE", | ||
| "", | ||
| "<<<<<<< SEARCH", | ||
| "function world() {", | ||
| ' return "world"', | ||
| "}", | ||
| "=======", | ||
| "function world() {", | ||
| ' return "world"', | ||
| "}", | ||
| ">>>>>>> REPLACE", | ||
| ].join("\n") | ||
|
|
||
| const result = await strategy.applyDiff(originalContent, diffContent) | ||
| // Should partially succeed but report the null edit | ||
| expect(result.success).toBe(true) | ||
| if (result.success) { | ||
| expect(result.failParts).toBeDefined() | ||
| if (result.failParts && result.failParts[0] && !result.failParts[0].success) { | ||
| expect(result.failParts[0].error).toContain("NULL EDIT DETECTED") | ||
| } | ||
| } | ||
| }) | ||
|
|
||
| it("should detect null edits with line numbers", async () => { | ||
| const originalContent = "function test() {\n return true;\n}\n" | ||
| const diffContent = [ | ||
| "<<<<<<< SEARCH", | ||
| ":start_line:1", | ||
| "-------", | ||
| "function test() {", | ||
| " return true;", | ||
| "}", | ||
| "=======", | ||
| "function test() {", | ||
| " return true;", | ||
| "}", | ||
| ">>>>>>> REPLACE", | ||
| ].join("\n") | ||
|
|
||
| const result = await strategy.applyDiff(originalContent, diffContent) | ||
| expect(result.success).toBe(false) | ||
| if (!result.success && result.failParts && result.failParts.length > 0) { | ||
| const firstError = result.failParts[0] | ||
| if (!firstError.success && firstError.error) { | ||
| expect(firstError.error).toContain("NULL EDIT DETECTED") | ||
| expect(firstError.error).toContain("phantom") | ||
| expect(firstError.error).toContain("Ensure the REPLACE block contains the actual modified content") | ||
| } | ||
| } | ||
| }) | ||
|
|
||
| it("should not trigger null edit detection for legitimate empty replacements (deletions)", async () => { | ||
| const originalContent = "function test() {\n // Remove this comment\n return true;\n}\n" | ||
| const diffContent = ["<<<<<<< SEARCH", " // Remove this comment", "=======", ">>>>>>> REPLACE"].join("\n") | ||
|
|
||
| const result = await strategy.applyDiff(originalContent, diffContent) | ||
| expect(result.success).toBe(true) | ||
| // Should not contain null edit error | ||
| if (!result.success && result.error) { | ||
| expect(result.error).not.toContain("NULL EDIT DETECTED") | ||
| } | ||
| }) | ||
|
|
||
| it("should not trigger for actual changes even if similar", async () => { | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we add an edge case test for when search and replace blocks have different whitespace but are otherwise identical? This would help document the expected behavior when whitespace types differ (tabs vs spaces). |
||
| const originalContent = "function test() {\n return true;\n}\n" | ||
| const diffContent = [ | ||
| "<<<<<<< SEARCH", | ||
| "function test() {", | ||
| " return true;", | ||
| "}", | ||
| "=======", | ||
| "function test() {", | ||
| " return false;", | ||
| "}", | ||
| ">>>>>>> REPLACE", | ||
| ].join("\n") | ||
|
|
||
| const result = await strategy.applyDiff(originalContent, diffContent) | ||
| expect(result.success).toBe(true) | ||
| // Should not contain null edit error | ||
| if (!result.success && result.error) { | ||
| expect(result.error).not.toContain("NULL EDIT DETECTED") | ||
| } | ||
| }) | ||
| }) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -432,10 +432,17 @@ Only use a single line of '=======' between search and replacement content, beca | |
| diffResults.push({ | ||
| success: false, | ||
| error: | ||
| `Search and replace content are identical - no changes would be made\n\n` + | ||
| `NULL EDIT DETECTED: Search and replace content are identical - no changes would be made\n\n` + | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider extracting this error message to a shared constant since it's duplicated in multi-file-search-replace.ts (lines 518-528). This would ensure consistency and make future updates easier. Also, while the error message is comprehensive, we might want to add a test case for when search and replace blocks differ only in whitespace type (tabs vs spaces) to ensure we're not triggering false positives. |
||
| `This is a common issue with AI models (especially Gemini) producing "phantom" edits.\n\n` + | ||
| `Debug Info:\n` + | ||
| `- Search and replace must be different to make changes\n` + | ||
| `- Use read_file to verify the content you want to change`, | ||
| `- The SEARCH block and REPLACE block contain exactly the same content\n` + | ||
| `- This means the edit would have zero effect on the file\n` + | ||
| `- To fix: Ensure the REPLACE block contains the actual modified content\n` + | ||
| `- Use read_file to verify the current file content before making changes\n\n` + | ||
| `Common causes:\n` + | ||
| `1. Model hallucination - thinking it made changes when it didn't\n` + | ||
| `2. Incorrect copy/paste of content between blocks\n` + | ||
| `3. Missing the actual modification in the REPLACE block`, | ||
| }) | ||
| continue | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test coverage! Consider adding a comment here explaining why legitimate deletions (empty replace blocks) should not trigger null edit detection. This distinction is important for future maintainers to understand the design decision.