@roomote roomote bot commented Oct 6, 2025

This PR attempts to address Issue #8530. Feedback and guidance are welcome.

Problem

Non-ASCII characters (Cyrillic, Chinese, Hindi, etc.) are displayed as "?" or diamond symbols in the Roo Code terminal output on Windows.

Solution

  • Set the LANG and LC_ALL environment variables to en_US.UTF-8
  • Prepend a chcp 65001 command on Windows to switch the terminal code page to UTF-8
  • Handle both PowerShell and CMD syntax when switching the code page

Changes

  • Modified Terminal.getEnv() to include UTF-8 locale environment variables
  • Updated TerminalProcess.run() to prepend the chcp 65001 command on Windows
  • Updated tests to verify the UTF-8 environment variables are set correctly
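
A minimal sketch of the command-prefixing step described above. The helper name `prependUtf8CodePage` is hypothetical, and the CMD branch is an assumption: only the PowerShell form appears in the diff excerpts quoted in this review.

```typescript
// Hypothetical helper; not the PR's actual TerminalProcess.run() code.
function prependUtf8CodePage(command: string, platform: string, isPowerShell: boolean): string {
  // Only Windows consoles use code pages; other platforms run the command unchanged.
  if (platform !== "win32") {
    return command
  }
  if (isPowerShell) {
    // PowerShell: chain with a semicolon and discard chcp's status line.
    return `chcp 65001 > $null ; ${command}`
  }
  // CMD (assumed syntax): discard chcp's output via nul and chain with "&"
  // so the user's command runs regardless of chcp's exit status.
  return `chcp 65001 >nul & ${command}`
}
```

Keeping the prefix logic in a pure function like this would make both shell variants easy to cover in unit tests without spawning a terminal.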

Testing

All existing tests pass. The implementation properly handles both PowerShell and CMD terminals.

Fixes #8530


Important

Adds UTF-8 encoding support for Windows terminal output by setting environment variables and modifying command execution.

This description was created by Ellipsis for 1781e3f.

- Add LANG and LC_ALL environment variables set to en_US.UTF-8
- Add Windows-specific CHCP=65001 for UTF-8 code page support
- Update tests to verify UTF-8 encoding environment variables
- This fixes non-ASCII characters (Cyrillic, Chinese, Hindi, etc.) being displayed as '?' or diamond symbols

Fixes #8530
- Add LANG and LC_ALL environment variables for UTF-8 encoding
- Prepend 'chcp 65001' command on Windows to set code page to UTF-8
- Handle both PowerShell and CMD syntax for code page switching
- Remove incorrect CHCP environment variable approach
- Update tests to reflect the corrected implementation

This properly fixes non-ASCII characters (Cyrillic, Chinese, Hindi, etc.)
being displayed as '?' or diamond symbols in terminal output on Windows.

Fixes #8530
@roomote roomote bot requested review from cte, jr and mrubens as code owners October 6, 2025 13:10
@dosubot dosubot bot added the size:M label Oct 6, 2025
@dosubot dosubot bot added the bug label Oct 6, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage label Oct 6, 2025

@roomote roomote bot left a comment

Self-review activated; arguing with myself about Unicode like a robot debating mirror neurons.

// Ensure UTF-8 encoding for proper Unicode character display
// This fixes issues with non-ASCII characters (Cyrillic, Chinese, Hindi, etc.)
// being displayed as "?" or diamond symbols in terminal output
LANG: "en_US.UTF-8",

[P1] Forcing LANG/LC_ALL to en_US.UTF-8 overrides user locale and can emit locale warnings on systems without that locale. Consider not overriding when already set and prefer a safer default like C.UTF-8; also avoid setting these on Windows where they don't influence the console code page.
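
One way the suggestion above could look, as a sketch only. The helper name `utf8LocaleEnv` is hypothetical, and the code assumes `C.UTF-8` exists on the target system, which is not guaranteed everywhere.

```typescript
// Hypothetical helper: returns the locale variables to add on top of the inherited environment.
function utf8LocaleEnv(
  baseEnv: Record<string, string | undefined>,
  platform: string,
): Record<string, string> {
  // On Windows these variables do not influence the console code page, so add nothing.
  if (platform === "win32") {
    return {}
  }
  // Respect a locale the user has already chosen instead of overriding it.
  if (baseEnv["LC_ALL"] || baseEnv["LANG"]) {
    return {}
  }
  // Assumption: C.UTF-8 is installed; some systems do not ship it.
  return { LANG: "C.UTF-8" }
}
```

The result would be spread over the existing environment, for example `{ ...process.env, ...utf8LocaleEnv(process.env, process.platform) }`. Note that this sketch leaves a user-selected non-UTF-8 locale untouched, which is a deliberate trade-off rather than a complete fix.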


// On Windows, prepend chcp 65001 to set UTF-8 code page for proper Unicode support
// This fixes issues with non-ASCII characters being displayed as "?" or diamond symbols
if (process.platform === "win32") {

[P2] PowerShell detection relies on terminal.integrated.defaultProfile and may not match the active terminal shell at runtime. If mismatched, the PowerShell-style redirection ($null) will fail in CMD. Derive from terminal.state.shell.id/executable (when available) and only fall back to the setting.
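
A rough sketch of the fallback order suggested above. The function `isPowerShellShell` and its parameters are hypothetical; how the active shell name or path is obtained from the terminal at runtime is left out and would need checking against the VS Code API.

```typescript
// Hypothetical helper: decides whether PowerShell syntax applies.
// activeShell: shell name or executable path reported by the running terminal, if any.
// defaultProfile: value of the default-profile setting, used only as a fallback.
function isPowerShellShell(activeShell: string | undefined, defaultProfile: string | undefined): boolean {
  // Prefer what the running terminal reports; fall back to the configured profile name.
  const source = activeShell || defaultProfile || ""
  // Take the last path segment, handling both "\" and "/" separators, and drop ".exe".
  const name = (source.split(/[\\/]/).pop() ?? "").toLowerCase().replace(/\.exe$/, "")
  return name === "pwsh" || name === "powershell" || name === "windows powershell"
}
```

With this ordering, a CMD terminal opened while the default profile is PowerShell would be treated as CMD, which avoids the `$null` redirection failure described in the comment.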

if (process.platform === "win32") {
if (isPowerShell) {
// PowerShell syntax: use semicolon to chain commands and redirect output to null
commandToExecute = `chcp 65001 > $null ; ${command}`

[P2] chcp 65001 sets the console code page, but PowerShell may still output non‑UTF‑8 without $OutputEncoding being set. Setting both $OutputEncoding and [Console]::OutputEncoding improves reliability for Unicode output.
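
A sketch of what setting both encodings might look like when building the PowerShell command string. The helper name `withPowerShellUtf8` is hypothetical, and this has not been verified against the terminal integration in this PR.

```typescript
// Hypothetical helper: prefixes a PowerShell command with UTF-8 encoding setup.
function withPowerShellUtf8(command: string): string {
  const prefix = [
    // Encoding the console host uses when writing output.
    "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8",
    // Encoding PowerShell uses when piping text to native programs.
    "$OutputEncoding = [System.Text.Encoding]::UTF8",
  ].join("; ")
  return `${prefix}; ${command}`
}
```

Unlike `chcp`, these assignments produce no output, so no redirection to `$null` is needed for them.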

let commandToExecute = command

// On Windows, prepend chcp 65001 to set UTF-8 code page for proper Unicode support

[P3] chcp 65001 is executed before every command; the code page persists for the session, so this is redundant overhead. Consider setting it once per terminal session and caching a flag to skip subsequent calls.
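
The once-per-session idea could be sketched as follows. `CodePageTracker` is a hypothetical class, and the session object stands in for whatever terminal instance the extension tracks.

```typescript
// Hypothetical tracker: remembers which terminal sessions already switched code page.
class CodePageTracker<T extends object> {
  // WeakSet so that closed terminals can be garbage-collected.
  private readonly initialized = new WeakSet<T>()

  // Returns the command to run, adding the prefix only the first time a session is seen.
  prepare(session: T, command: string, prefix: string): string {
    if (this.initialized.has(session)) {
      return command
    }
    this.initialized.add(session)
    return `${prefix}${command}`
  }
}
```

A caveat of this design: the flag only records that the prefix was sent, not that it took effect, so it would go stale if the user changes the code page manually later in the same session.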

@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Oct 28, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 28, 2025
@hannesrudolph

Did not work

Labels

bug, Issue/PR - Triage, size:M

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] Non-ASCII characters (Cyrillic, Chinese, Hindi, etc.) displayed as "?" or diamond symbols in Roo Code output

3 participants