Skip to content

Conversation

tex3d
Copy link
Contributor

@tex3d tex3d commented Jun 3, 2025

There were multiple issues with Unicode conversion on *nix platforms. This PR fixes issues I found with the conversion functions that were causing failures when running locally, due to issues with setting the locale. It also had incorrect behavior for emulating the MultiByteToWideChar API.

This change makes the local setting thread safe and more robust to different available locales in runtime environments.

I fixed some off-by-one issues related to null termination, and eliminated some extra copies caused by detecting a string length, then passing the size without the null-terminator to a function which then had to copy the input string again to guarantee null-termination.

The CompilerTest::CompileWithEncodeFlagTestSource test has minor updates for clarity and an added scenario.

The changed code passes the Unicode tests now without asserting across all platforms.
This change should have no functional impacts, except eliminating potential double-null-termination in some cases, and catching more error conditions.

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a more robust solution to the problem #7458 is trying to address.

@tex3d can you confirm?

@tex3d
Copy link
Contributor Author

tex3d commented Jul 11, 2025

I think this is a more robust solution to the problem #7458 is trying to address.

@tex3d can you confirm?

Yeah, this one fixes a number of other (potential) issues too.

Comment on lines +32 to +33
if (cbMultiByte == 0 || cbMultiByte < -1 || cbMultiByte > (INT32_MAX - 1) ||
cchWideChar < 0 || cchWideChar > (INT32_MAX - 1)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't cbMultiByte > (INT32_MAX - 1) equivalent to cbMultiByte == INT32_MAX? Unless we were to change cbMultiByte and cchWideChar to larger types I think the equality check is clearer, no?

@igaryhe
Copy link

igaryhe commented Oct 11, 2025

Hello, I've tested this PR and it indeed fixes several test failure for me. However, I cannot get any output by running dxc -help or dxc --version. What could be wrong?

bool WideToEncodedString(const wchar_t *text, size_t cWide, DWORD cp,
DWORD flags, std::string *pValue, bool *lossy) {
DXASSERT_NOMSG(cWide == ~(size_t)0 || cWide < INT32_MAX);
if (text == nullptr || pValue == nullptr || cWide == 0 || cWide >= INT32_MAX)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since cWide is set to ~(size_t)0 elsewhere, and this is guaranteed to be larger than INT32_MAX, so the last case seems a bit problematic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: New

Development

Successfully merging this pull request may close these issues.

4 participants