-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Fix: Improve HTML entity unescaping for Gemini and other models #7891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add HTML entity unescaping to applyDiffTool for non-Claude models - Enhance unescapeHtmlEntities function to handle more entity types - Add support for alternative encodings like ', /, \, ` - Add comprehensive tests for new entity types - Improve comments to clarify the purpose of unescaping Fixes #7890
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.
| @@ -0,0 +1,57 @@ | |||
| import { describe, it, expect } from "vitest" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we consider merging this test file with the existing text-normalization.spec.ts? Both test the same function, and having them in one file would improve maintainability and discoverability.
| it("handles complex mixed content with all entity types", () => { | ||
| const input = | ||
| "<div class="test">It's a test/path\file with `code` & more</div>" | ||
| const expected = '<div class="test">It\'s a test/path\\file with `code` & more</div>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this double space intentional? The input has test (one non-breaking space) but the expected output shows two spaces between "It's a" and "test". Should we verify this is the intended behavior?
| .replace(/\/g, "\\") // Backslash | ||
| .replace(/`/g, "`") // Backtick | ||
| .replace(/ /g, " ") // Non-breaking space | ||
| .replace(/&/g, "&") // Must be last to avoid double-unescaping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For high-frequency usage, would it be worth considering a single regex with a replacement function instead of chaining multiple .replace() calls? This could improve performance for frequently called functions.
| const relPath: string | undefined = block.params.path | ||
| let diffContent: string | undefined = block.params.diff | ||
|
|
||
| // Unescape HTML entities for non-Claude models (e.g., Gemini, DeepSeek, Llama) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: Could we make this comment more generic? Instead of listing specific models, perhaps: "Unescape HTML entities for non-Claude models that may return content with escaped characters"
| } | ||
|
|
||
| // Unescape HTML entities for non-Claude models (e.g., Gemini, DeepSeek, Llama) | ||
| // These models may return content with escaped characters that need to be unescaped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment consistency suggestion here - could be more generic rather than listing specific model names.
Summary
This PR fixes issue #7890 where Gemini Flash and other non-Claude models were failing to write files or apply diffs due to HTML-escaped content in their responses.
Problem
Gemini and other models return content with HTML-escaped characters (e.g.,
<,>,', etc.) which were not being properly unescaped before file operations, causing:Solution
Extended HTML entity unescaping: Added support for additional HTML entities commonly used by Gemini:
'(alternative apostrophe encoding)/(forward slash)\(backslash)`(backtick) (non-breaking space)Applied fix to both tools:
applyDiffTool.ts: Now unescapes diff content for non-Claude modelswriteToFileTool.ts: Already had unescaping, improved comments for clarityComprehensive testing: Added 8 new test cases to verify all entity types are properly handled
Changes
src/utils/text-normalization.tsto handle more HTML entitiessrc/core/tools/applyDiffTool.tsto unescape content for non-Claude modelssrc/core/tools/writeToFileTool.tssrc/utils/__tests__/text-normalization-extended.spec.tsTesting
Impact
This fix restores full functionality for Gemini Flash and other affected models, allowing them to:
Fixes #7890
Important
Enhanced HTML entity unescaping for non-Claude models in
applyDiffTool.tsandwriteToFileTool.ts, with comprehensive tests added.unescapeHtmlEntities()intext-normalization.tsto support additional entities like',/,\,`, and .applyDiffTool.tsnow unescapes diff content for non-Claude models.writeToFileTool.tsalready had unescaping; improved comments for clarity.text-normalization-extended.spec.tsto verify handling of all entity types.writeToFileTool.tsfor better understanding of pre-processing logic.This description was created by
for 6477cad. You can customize this summary. It will automatically update as commits are pushed.