
fix(install): strip non-ASCII bytes from install.ps1 (unblocks self-update on cp1252 consoles) #1523

Merged
danielmeppiel merged 1 commit into main from danielmeppiel/fix-windows-shim-ascii-encoding on May 28, 2026

@danielmeppiel
Collaborator

TL;DR

Follow-up to #1522. The Windows CI apm self-update test (run 26543072369) fails with:

[x] Update failed: 'charmap' codec can't encode characters in position 33941-33942: character maps to <undefined>

install.ps1 contained two non-ASCII byte sequences -- an em-dash (pre-existing) and replacement chars I introduced in a comment. The v0.13.0 self-updater downloads install.ps1 via Python urllib and prints it through a cp1252 console; non-ASCII bytes raise UnicodeEncodeError.
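
A minimal sketch of the failure mode, assuming the self-updater's print ends up encoding the script text with Python's `cp1252` codec. The helper name and the garbled sample string are illustrative, not taken from install.ps1. In Python's `cp1252` codec the em-dash has a mapping (byte 0x97), while U+FFFD does not, so the two-character range in the error above is consistent with the pair of replacement characters. The em-dash is still outside the printable-ASCII range the repo requires.

```python
def cp1252_encode_error(text: str):
    """Return (start, end, reason) for the first span cp1252 cannot encode, else None."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError as exc:
        # exc.start/exc.end are character indices (end is exclusive).
        return exc.start, exc.end, exc.reason
    return None


# Escapes keep this example itself pure ASCII.
em_dash_comment = "# need a retry \u2014 acceptable"
# Hypothetical comment text standing in for the garbled illustration.
garbled_comment = "# cmd.exe shows >\ufffd\ufffd@ instead of @echo"

print(cp1252_encode_error(em_dash_comment))
print(cp1252_encode_error(garbled_comment))
```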

Problem (WHY)

Per the repo's encoding contract (.github/instructions/encoding.instructions.md):

All source code files and CLI output strings must stay within printable ASCII (U+0020-U+007E). Windows cp1252 terminals raise UnicodeEncodeError: 'charmap' codec can't encode character for any character outside cp1252.

Two violations in install.ps1:

  1. Em-dash (U+2014) at byte 29170 -- pre-existing comment: # need a retry — acceptable for an install/self-update operation.
  2. Replacement characters (U+FFFD U+FFFD) at byte 33139 -- introduced by #1522 (fix(install): write apm.cmd shim as ASCII (cmd.exe cannot parse UTF-16LE)) in the comment I added describing what a UTF-16 cmd.exe parse failure looks like. I literally pasted the garbled-output illustration verbatim instead of describing it in ASCII.

Why this only surfaced now: #1522 fixed the apm.cmd shim, which previously crashed before reaching the self-update path. With the shim working, Test 5 can now exercise self-update end-to-end -- and immediately hits the cp1252 wall on the downloaded install.ps1.

Approach (WHAT)

Replace both non-ASCII sequences with their ASCII equivalents:

  • U+2014 (em-dash) -> -- (ASCII digraph)
  • ">��@" (replacement chars) -> plain ASCII prose describing the failure mode

install.ps1 now contains zero bytes above 0x7E (verified by hand-scan).

Validation evidence

  • python3 -c "data = open('install.ps1','rb').read(); ..." reports no bytes > 127.
  • 8/8 regression tests in tests/unit/install/test_windows_shim_template.py still pass.
  • Full lint chain silent: ruff check, ruff format, pylint R0801.
  • The actual Windows CI apm self-update test will run on this PR -- the only ground-truth verification.
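
The byte scan in the first bullet is elided above. As a rough sketch of what such a check could look like (not the command actually run), the function below lists every byte outside 0x20-0x7E together with its offset. The function name is hypothetical, and allowing tab, CR and LF is an assumption, since a script needs line endings even under a printable-ASCII rule.

```python
def find_non_printable_ascii(data: bytes, allowed_controls: bytes = b"\t\r\n"):
    """Return (offset, byte value) for each byte outside 0x20-0x7E.

    Bytes listed in allowed_controls (an assumption: tab, CR, LF) are ignored.
    """
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E) and value not in allowed_controls
    ]


# An em-dash is three bytes in UTF-8 (E2 80 94), so one character yields three hits.
sample = "# need a retry \u2014 acceptable\r\n".encode("utf-8")
for offset, value in find_non_printable_ascii(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

Run against `open('install.ps1', 'rb').read()`, an empty result would mean the file satisfies the rule under these assumptions.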

Trade-offs

None. ASCII-equivalent prose is strictly required by the repo's encoding rule; this PR brings install.ps1 into compliance.

Refs #1522.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Follow-up to this PR's first commit. The Windows CI 'apm self-update'
test (Test 5) downloads install.ps1 via Python urllib in the v0.13.0
installed CLI and prints it through a cp1252 console; the script
must therefore be pure ASCII or the print raises
'charmap codec can't encode characters in position 33941-33942'.

Two violations of the repo's ASCII-only rule
(.github/instructions/encoding.instructions.md):

1. Em-dash (U+2014) in a pre-existing comment at byte 29170
   ('need a retry -- acceptable for an install/self-update').
   Replaced with the ASCII '--' digraph.

2. Replacement characters (U+FFFD U+FFFD) at byte 33139 inside the
   comment this PR just added describing what a UTF-16 cmd.exe parse
   failure looks like. I had literally pasted the garbled-output
   illustration verbatim. Reworded to describe the failure mode in
   plain ASCII prose.

install.ps1 now contains zero bytes above 0x7E. The previously-failing
self-update path can write the script to a cp1252 console without
raising UnicodeEncodeError.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
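
To illustrate the print path the commit message describes, here is a small sketch that simulates a cp1252 console with an in-memory text stream. This is an assumption about how the console behaves, not the self-updater's actual code, and the function name is hypothetical.

```python
import io


def print_to_cp1252_console(text: str) -> bytes:
    """Print text through a strict cp1252 text stream and return the bytes written."""
    raw = io.BytesIO()
    # newline="" disables newline translation so the output bytes are predictable.
    console = io.TextIOWrapper(raw, encoding="cp1252", errors="strict", newline="")
    print(text, file=console)
    console.flush()
    return raw.getvalue()


# Pure-ASCII text passes through unchanged.
print(print_to_cp1252_console("# need a retry -- acceptable"))

# Text containing U+FFFD cannot be encoded and raises.
try:
    print_to_cp1252_console("shows >\ufffd\ufffd@")
except UnicodeEncodeError as exc:
    print("failed:", exc.reason)
```
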
Copilot AI review requested due to automatic review settings May 27, 2026 22:58
Contributor

Copilot AI left a comment


Pull request overview

Removes two non-ASCII byte sequences from install.ps1 (an em-dash and replacement characters in comments) so that the v0.13.0 self-updater, which prints the downloaded script via a cp1252 Windows console, no longer raises UnicodeEncodeError. Brings the file into compliance with the repo's printable-ASCII encoding contract.

Changes:

  • Replace em-dash with -- in the rollback-window comment at line 732.
  • Replace ">��@" replacement chars with plain ASCII prose describing the UTF-16 cmd.exe failure mode at lines 805-806.

Show a summary per file

| File | Description |
| --- | --- |
| install.ps1 | Strips two non-ASCII byte sequences from comments to satisfy the printable-ASCII encoding rule. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 0

@danielmeppiel danielmeppiel merged commit e5a272a into main May 28, 2026
21 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel/fix-windows-shim-ascii-encoding branch May 28, 2026 06:22