-
Notifications
You must be signed in to change notification settings - Fork 2.8k
feat: UI-safe request formatting to avoid embedding file contents #8697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…messages; add formatContentBlockForUi() and use for api_req_started
Review SummaryI've identified 3 issues that need to be addressed:
|
| if (!text) return "" | ||
|
|
||
| // If this looks like a files XML, summarize paths/errors/notices and drop content bodies | ||
| if (text.includes("<files") || text.includes("<files")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate condition found in sanitizeText: checking text.includes('<files') twice. Remove the redundant check.
| if (text.includes("<files") || text.includes("<files")) { | |
| if (text.includes("<files")) { |
|
|
||
| function looksLikeDiff(s: string): boolean { | ||
| return ( | ||
| s.includes("<<<<<<< SEARCH") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant diff marker checks in looksLikeDiff: the same strings are checked twice. Consider removing the duplicate includes for '<<<<<<< SEARCH' and '>>>>>>> REPLACE'.
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decode function does not actually decode HTML entities. It replaces '<' with '<', etc. Consider using proper replacement (e.g. replacing '<' with '<', '>' with '>', '&' with '&').
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | |
| const decode = (s: string) => s.replace(/</g, '<').replace(/>/g, '>').replace(/&/g, '&') |
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
Check warning
Code scanning / CodeQL
Replacement of a substring with itself Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To resolve this, we should properly decode XML/HTML entities in the string: replace < with <, > with >, and & with & (and possibly more, but these cover the standard cases given the context of parsing XML-like content). This can be done by chaining .replace() calls, or (for broader correctness) by using a well-known library such as he for HTML entity decoding. Since only standard entities are mentioned and only the shown code can be changed, the best fix is to replace the identity replacements with proper entity replacements in the decode helper function, on line 126.
-
Copy modified lines R126-R130
| @@ -123,7 +123,11 @@ | ||
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => | ||
| s | ||
| .replace(/</g, "<") | ||
| .replace(/>/g, ">") | ||
| .replace(/&/g, "&"); | ||
|
|
||
| const raw = decode(xmlLike) | ||
| const fileRegex = /<file>([\s\S]*?)<\/file>/g |
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
Check warning
Code scanning / CodeQL
Replacement of a substring with itself Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix this problem, update the decode function (on line 126) to properly decode common XML entities. Specifically, replace all occurrences of < with <, > with >, and & with &. This ensures that both escaped and unescaped XML tags are handled as described in the comment, and the decoder correctly converts XML entity-encoded tags to their character equivalents for further processing. The changes are localized to the decode function, so edit line 126 in the file src/shared/formatContentBlockForUi.ts to swap the current replacement calls for the correct ones.
No additional imports or dependencies are required, and the fix is self-contained.
-
Copy modified lines R126-R129
| @@ -123,7 +123,10 @@ | ||
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => | ||
| s.replace(/</g, "<") | ||
| .replace(/>/g, ">") | ||
| .replace(/&/g, "&") | ||
|
|
||
| const raw = decode(xmlLike) | ||
| const fileRegex = /<file>([\s\S]*?)<\/file>/g |
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
Check warning
Code scanning / CodeQL
Replacement of a substring with itself Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The best fix is to replace the decode function with one that properly decodes XML/HTML entities, converting sequences like < to <, > to >, and & to &, as well as any other encountered entities. Since TypeScript/JavaScript does not provide a standard, reliable method for entity decoding, but the well-known library he (html-entities is also popular) can accomplish this robustly. In the context of this file, we only need to decode a string, so importing and using a decoder from a widely used library is the best way to meet the requirement. Therefore, replace the ad-hoc decode function with an implementation using he.decode. This change is limited to src/shared/formatContentBlockForUi.ts. If importing is not possible, you may use a minimum viable manual decoding for the three most common entities (<, >, &).
-
Copy modified lines R126-R127
| @@ -123,7 +123,8 @@ | ||
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| import { decode as heDecode } from "he"; | ||
| const decode = (s: string) => heDecode(s); | ||
|
|
||
| const raw = decode(xmlLike) | ||
| const fileRegex = /<file>([\s\S]*?)<\/file>/g |
-
Copy modified lines R524-R525
| @@ -521,7 +521,8 @@ | ||
| "web-tree-sitter": "^0.25.6", | ||
| "workerpool": "^9.2.0", | ||
| "yaml": "^2.8.0", | ||
| "zod": "^3.25.61" | ||
| "zod": "^3.25.61", | ||
| "he": "^1.2.0" | ||
| }, | ||
| "devDependencies": { | ||
| "@roo-code/build": "workspace:^", |
| Package | Version | Security advisories |
| he (npm) | 1.2.0 | None |
| } | ||
|
|
||
| function extractPathsFromXml(xml: string): string[] { | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
Check warning
Code scanning / CodeQL
Replacement of a substring with itself Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The main issue is that the decode function is defined as follows:
const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&")This does nothing, because it replaces "<" with "<", and so on. The function should decode XML/HTML entity escapes such as <, >, and & to their respective literal characters, so that subsequent regex matches against tags like <file> work whether the tags are escaped or not.
General repair:
Replace the decode function definition with one that actually decodes <, >, and & to <, >, and &, respectively.
Best way (detailed):
- Update the
decodefunction insummarizeFilesXml(line 126) and inextractPathsFromXml(line 161) so they replace escaped XML/HTML entities (<,>,&) with the corresponding literal characters (<,>,&). - Do not change other code or functionality, only fix the decoding.
- No new import is needed; simple string replace suffices for these cases.
-
Copy modified lines R126-R127 -
Copy modified lines R162-R163
| @@ -123,7 +123,8 @@ | ||
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => | ||
| s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
|
|
||
| const raw = decode(xmlLike) | ||
| const fileRegex = /<file>([\s\S]*?)<\/file>/g | ||
| @@ -158,7 +159,8 @@ | ||
| } | ||
|
|
||
| function extractPathsFromXml(xml: string): string[] { | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => | ||
| s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const raw = decode(xml) | ||
| const pathRegex = /<path>([\s\S]*?)<\/path>/g | ||
| const paths: string[] = [] |
| } | ||
|
|
||
| function extractPathsFromXml(xml: string): string[] { | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
Check warning
Code scanning / CodeQL
Replacement of a substring with itself Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The bug exists in the decode functions (lines 126 and 161) where .replace(/</g, "<").replace(/>/g, ">") replaces characters with themselves — a useless operation. The intention appears to be decoding the XML-escaped forms (<, >, and &), converting XML character entities back to their original characters. The correct replacements should therefore be:
.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&")
To fix this:
- On lines 126 and 161, update both decode functions to replace
<,>, and&with<,>, and&respectively, instead of attempting to replace literal<,>, and&with themselves. - No imports are required.
- Only change the decode lambda body in both places, not other code.
-
Copy modified lines R126-R129 -
Copy modified lines R164-R167
| @@ -123,7 +123,10 @@ | ||
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => s | ||
| .replace(/</g, "<") | ||
| .replace(/>/g, ">") | ||
| .replace(/&/g, "&") | ||
|
|
||
| const raw = decode(xmlLike) | ||
| const fileRegex = /<file>([\s\S]*?)<\/file>/g | ||
| @@ -158,7 +161,10 @@ | ||
| } | ||
|
|
||
| function extractPathsFromXml(xml: string): string[] { | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => s | ||
| .replace(/</g, "<") | ||
| .replace(/>/g, ">") | ||
| .replace(/&/g, "&") | ||
| const raw = decode(xml) | ||
| const pathRegex = /<path>([\s\S]*?)<\/path>/g | ||
| const paths: string[] = [] |
| } | ||
|
|
||
| function extractPathsFromXml(xml: string): string[] { | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
Check warning
Code scanning / CodeQL
Replacement of a substring with itself Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix the problem, the decode function should replace entity-encoded representations (such as &, <, >) with their corresponding normal characters (&, <, >). Therefore, within the decode function, update the replacement logic to convert both < and > entities, as well as & to &. Specifically, change the replacements so that .replace(/&/g, "&") becomes .replace(/&/g, "&"), and (optionally) update < and > handling for XML entities if necessary. Make changes only in the code region defining the decode function within summarizeFilesXml and extractPathsFromXml (lines 126 and 161).
-
Copy modified line R126 -
Copy modified line R161
| @@ -123,7 +123,7 @@ | ||
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => s.replace(/&/g, "&").replace(/</g, "<").replace(/>/g, ">") | ||
|
|
||
| const raw = decode(xmlLike) | ||
| const fileRegex = /<file>([\s\S]*?)<\/file>/g | ||
| @@ -158,7 +158,7 @@ | ||
| } | ||
|
|
||
| function extractPathsFromXml(xml: string): string[] { | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") | ||
| const decode = (s: string) => s.replace(/&/g, "&").replace(/</g, "<").replace(/>/g, ">") | ||
| const raw = decode(xml) | ||
| const pathRegex = /<path>([\s\S]*?)<\/path>/g | ||
| const paths: string[] = [] |
| if (!text) return "" | ||
|
|
||
| // If this looks like a files XML, summarize paths/errors/notices and drop content bodies | ||
| if (text.includes("<files") || text.includes("<files")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition checks for the unescaped <files tag twice. The second check should be for the escaped version <files to properly detect escaped XML in text content. Without this, escaped XML tags won't be summarized and could bloat the UI messages.
| s.includes("<<<<<<< SEARCH") || | ||
| s.includes(">>>>>>> REPLACE") || | ||
| s.includes("<<<<<<< SEARCH") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines check for the same unescaped diff markers twice. To properly handle both escaped and unescaped content, the second set should check for escaped versions (e.g., <<<<<<< SEARCH and >>>>>>> REPLACE) to detect diff content that may have been HTML-escaped in the text.
| */ | ||
| function summarizeFilesXml(xmlLike: string): string { | ||
| // Support both escaped and unescaped tags | ||
| const decode = (s: string) => s.replace(/</g, "<").replace(/>/g, ">").replace(/&/g, "&") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entity decoding order is incorrect. When decoding HTML entities, & must be decoded LAST, otherwise double-encoded entities won't decode properly. For example, with the current order, &lt; becomes < (still escaped) instead of < (correct). The correct order should be: .replace(/</g, '<').replace(/>/g, '>').replace(/&/g, '&'). This same issue exists in the decode function at line 161.
Implements a cleaner approach for UI messages to prevent bloated ui_messages.json:
Functional scope:
Follow-ups (optional):
Important
Introduces
formatContentBlockForUi()to safely format content blocks for UI messages, applied inTask.tsfor cleanerapi_req_startedmessages.formatContentBlockForUi()informatContentBlockForUi.tsto summarize content blocks for UI messages.formatContentBlockToMarkdown()withformatContentBlockForUi()inTask.tsforapi_req_startedmessages.apiConversationHistoryand exports.This description was created by
for 598d7a3. You can customize this summary. It will automatically update as commits are pushed.