fix: add UTF-8 encoding support for Windows terminal output #8531
- Add LANG and LC_ALL environment variables set to en_US.UTF-8
- Add Windows-specific CHCP=65001 for UTF-8 code page support
- Update tests to verify UTF-8 encoding environment variables

This fixes non-ASCII characters (Cyrillic, Chinese, Hindi, etc.) being displayed as '?' or diamond symbols.

Fixes #8530
- Add LANG and LC_ALL environment variables for UTF-8 encoding
- Prepend 'chcp 65001' command on Windows to set code page to UTF-8
- Handle both PowerShell and CMD syntax for code page switching
- Remove incorrect CHCP environment variable approach
- Update tests to reflect the corrected implementation

This properly fixes non-ASCII characters (Cyrillic, Chinese, Hindi, etc.) being displayed as '?' or diamond symbols in terminal output on Windows.

Fixes #8530
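One way to read the commit message above is as a hypothetical `prependUtf8CodePage` helper that leaves non-Windows commands alone and picks shell-specific syntax on Windows. The PowerShell form copies the `chcp 65001 > $null ; ${command}` line quoted in the review; the CMD form (`> nul &`) is an assumption, since the PR's CMD branch is not shown on this page.

```typescript
// Hypothetical helper mirroring the commit message: prefix a command so the
// Windows console switches to code page 65001 (UTF-8) before it runs.
function prependUtf8CodePage(
	command: string,
	platform: string,
	isPowerShell: boolean,
): string {
	// chcp is a Windows-only command; leave other platforms untouched.
	if (platform !== "win32") {
		return command
	}
	if (isPowerShell) {
		// PowerShell: discard chcp's "Active code page" message and chain with ";".
		return `chcp 65001 > $null ; ${command}`
	}
	// CMD (assumed syntax): "> nul" discards the message, "&" runs the command regardless.
	return `chcp 65001 > nul & ${command}`
}
```

Using `&` rather than `&&` in the CMD form means the user's command still runs even if `chcp` itself fails.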
Self-review activated; arguing with myself about Unicode like a robot debating mirror neurons.
```ts
// Ensure UTF-8 encoding for proper Unicode character display
// This fixes issues with non-ASCII characters (Cyrillic, Chinese, Hindi, etc.)
// being displayed as "?" or diamond symbols in terminal output
LANG: "en_US.UTF-8",
```
[P1] Forcing LANG/LC_ALL to en_US.UTF-8 overrides user locale and can emit locale warnings on systems without that locale. Consider not overriding when already set and prefer a safer default like C.UTF-8; also avoid setting these on Windows where they don't influence the console code page.
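A minimal sketch of the P1 suggestion, assuming a hypothetical `withUtf8Locale` helper that receives the inherited environment and `process.platform`. It fills in `LANG=C.UTF-8` only when no locale variable is already set, and returns the environment unchanged on Windows.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper: add a UTF-8 locale only if the user has not chosen one.
function withUtf8Locale(base: Env, platform: string): Env {
	// Copy so the caller's environment object is never mutated.
	const env: Env = { ...base }
	// LANG/LC_ALL do not change the Windows console code page, so skip them there.
	if (platform === "win32") {
		return env
	}
	// Respect any existing locale choice instead of forcing en_US.UTF-8.
	if (!env.LC_ALL && !env.LC_CTYPE && !env.LANG) {
		env.LANG = "C.UTF-8"
	}
	return env
}
```

Setting only `LANG` (not `LC_ALL`) when nothing is configured keeps the override as weak as possible, so any more specific `LC_*` variable the user sets later still wins.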
```ts
// On Windows, prepend chcp 65001 to set UTF-8 code page for proper Unicode support
// This fixes issues with non-ASCII characters being displayed as "?" or diamond symbols
if (process.platform === "win32") {
```
[P2] PowerShell detection relies on terminal.integrated.defaultProfile and may not match the active terminal shell at runtime. If mismatched, the PowerShell-style redirection ($null) will fail in CMD. Derive from terminal.state.shell.id/executable (when available) and only fall back to the setting.
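A sketch of the P2 suggestion. Because the exact shape of the shell information exposed on a VS Code terminal may differ between API versions, the hypothetical `isPowerShellShell` below takes plain strings: the active shell's executable path when the caller has it, and the `terminal.integrated.defaultProfile` value only as a fallback.

```typescript
// Hypothetical detector: prefer the running shell, fall back to the configured profile.
function isPowerShellShell(
	activeShellPath: string | undefined,
	defaultProfile: string | undefined,
): boolean {
	// An empty or missing executable path falls through to the profile setting.
	const source = activeShellPath || defaultProfile || ""
	// Take the last path segment, handling both "\" and "/" separators.
	const name = source.split(/[\\/]/).pop()?.toLowerCase() ?? ""
	// Windows PowerShell runs as powershell.exe, PowerShell 7+ as pwsh.exe;
	// profile names such as "Windows PowerShell" also match.
	return name.includes("powershell") || name.includes("pwsh")
}
```

With this ordering, a terminal actually running `cmd.exe` is treated as CMD even when the default profile says PowerShell, which is the mismatch the review describes.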
```ts
if (process.platform === "win32") {
	if (isPowerShell) {
		// PowerShell syntax: use semicolon to chain commands and redirect output to null
		commandToExecute = `chcp 65001 > $null ; ${command}`
```
[P2] chcp 65001 sets the console code page, but PowerShell may still output non‑UTF‑8 without $OutputEncoding being set. Setting both $OutputEncoding and [Console]::OutputEncoding improves reliability for Unicode output.
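A sketch of this suggestion, assuming a hypothetical `buildPowerShellUtf8Command` helper. It sets `[Console]::OutputEncoding` and `$OutputEncoding` to a BOM-less UTF-8 encoder before running `chcp 65001`; the `$utf8` session variable name is also an assumption.

```typescript
// Hypothetical helper: prefix a PowerShell command with UTF-8 encoding setup.
function buildPowerShellUtf8Command(command: string): string {
	const setup = [
		// BOM-less UTF-8 encoder object ($false = no byte-order mark).
		"$utf8 = New-Object System.Text.UTF8Encoding $false",
		// Console output encoding; PowerShell also uses it to decode native programs' output.
		"[Console]::OutputEncoding = $utf8",
		// Encoding PowerShell uses when piping text into native commands.
		"$OutputEncoding = $utf8",
		// Keep the original code page switch, discarding its status message.
		"chcp 65001 > $null",
	]
	return `${setup.join("; ")}; ${command}`
}
```

`New-Object` is used instead of `[System.Text.UTF8Encoding]::new($false)` so the prefix also works in Windows PowerShell versions older than 5.0.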
```ts
let commandToExecute = command

// On Windows, prepend chcp 65001 to set UTF-8 code page for proper Unicode support
```
[P3] chcp 65001 is executed before every command; the code page persists for the session, so this is redundant overhead. Consider setting it once per terminal session and caching a flag to skip subsequent calls.
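A sketch of the P3 suggestion, assuming terminals can be identified by a numeric ID. The hypothetical `CodePageTracker` class and its `terminalId` parameter are not part of the PR; the `chcp` prefix is emitted only for the first command in each session.

```typescript
// Hypothetical tracker: remember which terminal sessions already switched to
// code page 65001 so the prefix is sent only once per session.
class CodePageTracker {
	private readonly initialized = new Set<number>()

	prepare(terminalId: number, command: string, isPowerShell: boolean): string {
		if (this.initialized.has(terminalId)) {
			// The code page persists for the session; no need to switch again.
			return command
		}
		this.initialized.add(terminalId)
		// Shell-specific null device and command separator.
		const sink = isPowerShell ? "$null" : "nul"
		const sep = isPowerShell ? ";" : "&"
		return `chcp 65001 > ${sink} ${sep} ${command}`
	}

	// Call when a terminal closes or its shell restarts so the next session re-initializes.
	forget(terminalId: number): void {
		this.initialized.delete(terminalId)
	}
}
```

The cached flag can go stale if the user runs `chcp` with another code page by hand, so this trades a little robustness for skipping the repeated call.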
Did not work.
This PR attempts to address Issue #8530. Feedback and guidance are welcome.
Problem
Non-ASCII characters (Cyrillic, Chinese, Hindi, etc.) are displayed as "?" or diamond symbols in the Roo Code terminal output on Windows.
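For context on the two symptoms: the diamond is U+FFFD, the replacement character a UTF-8 decoder emits for byte sequences that are not valid UTF-8, while "?" typically appears when text is converted to a code page that has no mapping for a character. The illustration here is not taken from the PR; it decodes the bytes that the legacy Cyrillic code page CP866 uses for "Пр" as if they were UTF-8.

```typescript
// Illustration only: in CP866, "П" is 0x8F and "р" is 0xE0. Neither forms a
// valid UTF-8 sequence here, so the (non-fatal) decoder substitutes U+FFFD for each.
const cp866Bytes = new Uint8Array([0x8f, 0xe0])
const decoded = new TextDecoder("utf-8").decode(cp866Bytes)
console.log(decoded) // two U+FFFD replacement characters
```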
Solution
Changes
Testing
All existing tests pass. The implementation properly handles both PowerShell and CMD terminals.
Fixes #8530
Important
Adds UTF-8 encoding support for Windows terminal output by setting environment variables and modifying command execution.
- Sets `LANG` and `LC_ALL` to `en_US.UTF-8` in `Terminal.getEnv()` to ensure UTF-8 encoding.
- Prepends the `chcp 65001` command in `TerminalProcess.run()` on Windows to set the terminal code page to UTF-8.
- Updates `TerminalRegistry.spec.ts` to verify that the UTF-8 environment variables are set correctly.

This description was created by
for 1781e3f.