Fix invalid UTF-8 encoding #2094

m417z · 2025-09-23T20:28:45Z

--------------------------------------------------
Hex    Count    Percentage   Character
--------------------------------------------------
a0     1648       98.6%      ' '
96     18          1.1%      '–'
ae     4           0.2%      '®'
97     2           0.1%      '—'
--------------------------------------------------

Note that 85 files also contain the Replacement Character (�). These files weren't fixed; they have to be fixed manually, since the original character can't be recovered automatically.
Edit: See #2095.
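
For reference, a tally like the one above could be produced along these lines. This is a minimal sketch, not the script used for this PR: the function names are hypothetical, and it relies on Python's `surrogateescape` error handler, which maps each undecodable byte 0xNN to the lone surrogate U+DCNN.

```python
from collections import Counter

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, shown as a question-mark diamond


def count_invalid_bytes(data: bytes) -> Counter:
    """Count the bytes in `data` that are not part of valid UTF-8.

    With errors="surrogateescape", every undecodable byte 0xNN becomes
    the lone surrogate U+DCNN, so the original byte value is recovered
    by subtracting 0xDC00.
    """
    text = data.decode("utf-8", errors="surrogateescape")
    return Counter(
        ord(ch) - 0xDC00 for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF
    )


def has_replacement_char(data: bytes) -> bool:
    """Report whether the decoded text already contains U+FFFD.

    Such files lost the original character earlier, so they need a
    manual fix rather than a byte replacement.
    """
    return REPLACEMENT_CHAR in data.decode("utf-8", errors="surrogateescape")
```

Summing the counters returned for each file's bytes (for example from `pathlib.Path.read_bytes()`) would give per-byte totals for the whole repository.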

GrantMeStrength · 2025-09-24T23:23:55Z

All these changes need to be confirmed manually, and there are so many that the tool runs too slowly. It's unlikely I will have time to approve these.

m417z · 2025-09-24T23:26:15Z

Note that this was an automated replacement. If you prefer to verify it, you can create a script which simply replaces the specified bytes and verify the result.

GrantMeStrength · 2025-09-25T18:00:50Z

Every correction to the docs needs to be manually verified by policy. We can't just accept bulk updates.

m417z · 2025-09-25T18:02:31Z

I understand. Let me know if there's anything else I can do to help with this, like providing a Python script which does the replacements.
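
As an illustration of what such a replacement script might look like, here is a minimal sketch. It is not the script used for this PR. The `REPLACEMENTS` mapping and the function name are assumptions: the mapping follows the four bytes listed in the table at the top of this PR and reads them as Windows-1252, so `a0` is taken to be a no-break space (U+00A0) rather than a plain space.

```python
# Assumed mapping, based on the table in this PR and Windows-1252.
REPLACEMENTS = {
    0xA0: "\u00a0",  # no-break space (assumption; could also be a plain space)
    0x96: "\u2013",  # en dash
    0xAE: "\u00ae",  # registered sign
    0x97: "\u2014",  # em dash
}


def fix_invalid_utf8(data: bytes) -> bytes:
    """Replace the listed stray bytes with their UTF-8 encoded characters.

    Valid UTF-8 sequences are left untouched. Invalid bytes that are not
    in REPLACEMENTS are written back unchanged, so they stay visible to a
    later check instead of being silently guessed.
    """
    # Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        byte = ord(ch) - 0xDC00
        if 0xDC80 <= ord(ch) <= 0xDCFF and byte in REPLACEMENTS:
            out.append(REPLACEMENTS[byte])
        else:
            out.append(ch)
    # Remaining surrogates are turned back into their original bytes.
    return "".join(out).encode("utf-8", errors="surrogateescape")
```

To verify a result, one could run the function over a file's bytes and confirm that the output decodes with a strict `bytes.decode("utf-8")`; a `UnicodeDecodeError` would indicate a byte outside the mapping.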

GrantMeStrength · 2025-09-26T15:50:43Z

Is there a way to focus on the most egregious errors, and then break them up into smaller PRs?
I appreciate your time and effort BTW!

m417z · 2025-09-26T23:42:03Z

I don't think there are more or less egregious errors here. There are just many files which aren't valid UTF-8, which is the format everyone expects nowadays. I don't think going over them manually is the best way to handle it, but if you have to and if it will help, I can split it into chunks and open several PRs.

Fix invalid UTF-8 encoding

3d3475d

prmerger-automator bot added the do-not-merge label Sep 23, 2025

Karl-Bridge-Microsoft assigned GrantMeStrength Sep 24, 2025
