Contributor

@m417z m417z commented Sep 23, 2025

--------------------------------------------------
Hex    Count    Percentage   Character
--------------------------------------------------
a0     1648       98.6%      ' ' (no-break space)
96     18          1.1%      '–'
ae     4           0.2%      '®'
97     2           0.1%      '—'
--------------------------------------------------

Note that 85 files also contain the replacement character (�, U+FFFD). These files weren't fixed; they have to be fixed manually, since the original character can't be recovered automatically.
Edit: See #2095.
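
For anyone doing the manual pass, a minimal sketch of how the affected lines could be located. It assumes the files are UTF-8 apart from the bytes listed above; the helper name `lines_with_replacement_char` is hypothetical and not part of this PR.

```python
# U+FFFD encoded as UTF-8 is the three bytes EF BF BD.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")


def lines_with_replacement_char(data: bytes) -> list[int]:
    """Return 1-based line numbers whose raw bytes contain U+FFFD."""
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if REPLACEMENT_UTF8 in line
    ]
```

Searching the raw bytes avoids decoding, so the check also works on files that still contain invalid UTF-8 elsewhere.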

@GrantMeStrength
Collaborator

All these changes need to be confirmed manually, and there are so many that the tool runs too slowly. It's unlikely I will have time to approve these.

@m417z
Contributor Author

m417z commented Sep 24, 2025

Note that this was an automated replacement. If you prefer to verify it, you can write a script that simply replaces the specified bytes and check that its result matches this PR.
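
A minimal sketch of what such a script could look like. This is not the script used for this PR. It assumes the stray bytes are Windows-1252 (which is consistent with 96 and 97 showing up as '–' and '—') and that the rest of each file is already valid UTF-8. The names `CP1252_FIXES`, `cp1252_fixes`, and `fix_bytes` are made up for illustration.

```python
import codecs

# The four byte values from the table, mapped to the characters they
# represent in Windows-1252 (assumed source encoding).
CP1252_FIXES = {
    0xA0: "\u00a0",  # no-break space
    0x96: "\u2013",  # en dash
    0xAE: "\u00ae",  # registered sign
    0x97: "\u2014",  # em dash
}


def _fallback(exc):
    """Decode error handler: map known stray bytes, fail on anything else."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    try:
        return "".join(CP1252_FIXES[b] for b in bad), exc.end
    except KeyError:
        # Unknown invalid byte: surface the original error instead of guessing.
        raise exc from None


codecs.register_error("cp1252_fixes", _fallback)


def fix_bytes(data: bytes) -> bytes:
    """Return data re-encoded as UTF-8, with the listed stray bytes repaired."""
    return data.decode("utf-8", errors="cp1252_fixes").encode("utf-8")
```

Because the handler only runs on bytes that fail UTF-8 decoding, text that is already valid UTF-8 passes through unchanged, and any invalid byte outside the table raises `UnicodeDecodeError` rather than being silently altered.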

@GrantMeStrength
Collaborator

Every correction to the docs needs to be manually verified by policy. We can't just accept bulk updates.

@m417z
Contributor Author

m417z commented Sep 25, 2025

I understand. Let me know if there's anything else I can do to help with this, like providing a Python script which does the replacements.

@GrantMeStrength
Collaborator

Is there a way to focus on the most egregious errors, and then break them up into smaller PRs?
I appreciate your time and effort BTW!

@m417z
Contributor Author

m417z commented Sep 26, 2025

I don't think there are more or less egregious errors here. There are just many files that aren't valid UTF-8, which is the format everyone expects nowadays. I don't think going over them manually is the best way to handle it, but if you have to and if it will help, I can split the change into chunks and open several PRs.
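
As a rough sketch of how the split could be prepared: list the files that fail UTF-8 decoding, then group them into fixed-size batches, one per PR. The `*.md` pattern, the batch size, and all function names here are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterator


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def invalid_utf8_files(root: str, pattern: str = "*.md") -> list[Path]:
    """Sorted list of files under root whose bytes are not valid UTF-8."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and not is_valid_utf8(path.read_bytes())
    )


def chunks(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, `list(chunks(invalid_utf8_files("."), 50))` would give batches of up to 50 files each, which could then be committed as separate PRs.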
