@@ -45,7 +45,11 @@ struct __libcpp_locale_guard {
     // each category. In the second case, we know at least one category won't
     // be what we want, so we only have to check the first case.
     if (std::strcmp(__l.__get_locale(), __lc) != 0) {
-      __locale_all = _strdup(__lc);
+      // Use wsetlocale to query the current locale string. This avoids a lossy
+      // conversion of the locale string from UTF-16 to the current LC_CTYPE
+      // charset. The Windows CRT allows language / country strings outside of
+      // ASCII, e.g. "Norwegian Bokm\u00E5l_Norway.utf8".
+      __locale_all = _wcsdup(__wsetlocale(nullptr));
Member

It's a bit of a pity that this requires a second query, __wsetlocale(nullptr), on top of the __setlocale(nullptr) call above; I wonder whether this risks a performance regression.

See e.g. #56202 for a case where this has been measured to be a bottleneck - CC @alvinhochun.

The alternative, I guess, is to use __wsetlocale() unconditionally above when fetching the name of the current locale, but that would require more of the potentially messy charset conversions, and we would have to decide which charset to use for them. The form this patch proposes does feel a bit asymmetrical, since narrow and wide APIs are both used for the same thing, but switching over entirely would be messier.

So all in all, this is probably fine as is; I agree that it is a reasonable compromise.

Member

Just to follow up on my performance concern here (we can't block a correctness fix on performance grounds, but it's good to be aware of the impact): I repeated the benchmarks from #56202 with this patch, and I can't discern any significant difference.

So just as the existing __setlocale(nullptr) is a cheap guard against needless locale changes, the extra __wsetlocale(nullptr) is also cheap compared with the actual locale changes, which are the heavy operations we'd like to avoid repeating.
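The guard pattern discussed in this thread can be sketched portably as follows. This is only an illustration, not the libc++ code: `ScopedCLocale` is a hypothetical name, and it uses the narrow `std::setlocale` throughout, whereas the patch saves the wide `_wsetlocale` result on Windows so that non-ASCII locale names survive. The point it shows is the cheap `nullptr` query that lets the guard skip the expensive switch.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical RAII helper (not part of libc++): switches the C locale for
// the lifetime of the object and restores the previous one afterwards.
class ScopedCLocale {
public:
  explicit ScopedCLocale(const char* target) {
    assert(target != nullptr);
    // Cheap query: passing nullptr only reads the current locale name.
    const char* current = std::setlocale(LC_ALL, nullptr);
    if (current == nullptr)
      throw std::runtime_error("could not query the current locale");
    // Skip the expensive switch when the target is already active.
    if (std::strcmp(current, target) == 0)
      return;
    // Copy the name first: the buffer returned by setlocale may be
    // overwritten by the next setlocale call.
    std::string saved = current;
    if (std::setlocale(LC_ALL, target) == nullptr)
      throw std::runtime_error("could not switch to the requested locale");
    saved_ = std::move(saved);
    switched_ = true;
  }

  ~ScopedCLocale() {
    // Restore only if the constructor actually changed the locale.
    if (switched_)
      std::setlocale(LC_ALL, saved_.c_str());
  }

  ScopedCLocale(const ScopedCLocale&) = delete;
  ScopedCLocale& operator=(const ScopedCLocale&) = delete;

  // True if the constructor had to change the locale.
  bool switched() const { return switched_; }

private:
  std::string saved_;
  bool switched_ = false;
};
```

Constructing `ScopedCLocale guard("C");` while the "C" locale is already active performs only the query and leaves `guard.switched()` false; the two `setlocale` calls that change state are the ones the benchmarks care about.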

       if (__locale_all == nullptr)
         __throw_bad_alloc();
       __setlocale(__l.__get_locale());
@@ -57,7 +61,7 @@ struct __libcpp_locale_guard {
     // for the different categories in the same format as returned by
     // setlocale(LC_ALL, nullptr).
     if (__locale_all != nullptr) {
-      __setlocale(__locale_all);
+      __wsetlocale(__locale_all);
       free(__locale_all);
     }
     _configthreadlocale(__status);
@@ -68,8 +72,14 @@ struct __libcpp_locale_guard {
       __throw_bad_alloc();
     return __new_locale;
   }
+  static const wchar_t* __wsetlocale(const wchar_t* __locale) {
+    const wchar_t* __new_locale = _wsetlocale(LC_ALL, __locale);
+    if (__new_locale == nullptr)
+      __throw_bad_alloc();
+    return __new_locale;
+  }
   int __status;
-  char* __locale_all = nullptr;
+  wchar_t* __locale_all = nullptr;
 };
 #endif

16 changes: 13 additions & 3 deletions libcxx/include/__locale_dir/support/windows.h
@@ -162,6 +162,12 @@ inline _LIBCPP_HIDE_FROM_ABI char* __setlocale(int __category, const char* __loc
     std::__throw_bad_alloc();
   return __new_locale;
 }
+inline _LIBCPP_HIDE_FROM_ABI wchar_t* __wsetlocale(int __category, const wchar_t* __locale) {
+  wchar_t* __new_locale = ::_wsetlocale(__category, __locale);
+  if (__new_locale == nullptr)
+    std::__throw_bad_alloc();
+  return __new_locale;
+}
 _LIBCPP_EXPORTED_FROM_ABI __lconv_t* __localeconv(__locale_t& __loc);
 #endif // _LIBCPP_BUILDING_LIBRARY

@@ -309,7 +315,11 @@ struct __locale_guard {
     // each category. In the second case, we know at least one category won't
     // be what we want, so we only have to check the first case.
     if (std::strcmp(__l.__get_locale(), __lc) != 0) {
-      __locale_all = _strdup(__lc);
+      // Use wsetlocale to query the current locale string. This avoids a lossy
+      // conversion of the locale string from UTF-16 to the current LC_CTYPE
+      // charset. The Windows CRT allows language / country strings outside of
+      // ASCII, e.g. "Norwegian Bokm\u00E5l_Norway.utf8".
+      __locale_all = _wcsdup(__locale::__wsetlocale(LC_ALL, nullptr));
       if (__locale_all == nullptr)
         std::__throw_bad_alloc();
       __locale::__setlocale(LC_ALL, __l.__get_locale());
@@ -321,13 +331,13 @@ struct __locale_guard {
     // for the different categories in the same format as returned by
     // setlocale(LC_ALL, nullptr).
     if (__locale_all != nullptr) {
-      __locale::__setlocale(LC_ALL, __locale_all);
+      __locale::__wsetlocale(LC_ALL, __locale_all);
       free(__locale_all);
     }
     _configthreadlocale(__status);
   }
   int __status;
-  char* __locale_all = nullptr;
+  wchar_t* __locale_all = nullptr;
 };
 #endif // _LIBCPP_BUILDING_LIBRARY

@@ -0,0 +1,52 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// <locale>

// REQUIRES: windows

// The C RunTime library on Windows supports locale strings with
// characters outside the ASCII range. This poses challenges for
// code that temporarily sets a custom thread locale.
//
// https://github.com/llvm/llvm-project/issues/160478

#include <locale>
#include <iostream>
#include <iomanip>
#include <algorithm>

#include <cstdlib>
#include <cassert>
#include <clocale>

#include "test_macros.h"

void locale_name_replace_codepage(std::string& locale_name, const std::string& codepage) {
  auto dot_position = locale_name.rfind('.');
  LIBCPP_ASSERT(dot_position != std::string::npos);

  locale_name = locale_name.substr(0, dot_position) + codepage;
}

int main(int, char**) {
  _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);

  std::string locale_name = std::setlocale(LC_ALL, "norwegian-bokmal");

  const auto& not_ascii = [](char c) { return (c & 0x80) != 0; };
  LIBCPP_ASSERT(std::any_of(locale_name.begin(), locale_name.end(), not_ascii));

  locale_name_replace_codepage(locale_name, ".437");
  LIBCPP_ASSERT(std::setlocale(LC_ALL, locale_name.c_str()));

  std::cerr.imbue(std::locale::classic());
  std::cerr << std::setprecision(2) << 0.1 << std::endl;

  return EXIT_SUCCESS;
}
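
The two string checks the test relies on can be tried without the Windows CRT. The sketch below is a portable illustration only: `has_non_ascii` and `with_codepage` are hypothetical names, and the sample locale name with the byte 0xE5 for "å" is an assumption about how the CRT might spell it in a single-byte code page.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helpers mirroring the checks made in the test.

// Returns true if any byte has its high bit set, i.e. lies outside ASCII.
bool has_non_ascii(const std::string& s) {
  return std::any_of(s.begin(), s.end(), [](char c) { return (c & 0x80) != 0; });
}

// Returns a copy of `locale_name` in which everything from the last '.'
// onwards is replaced by `codepage` (which must include its leading dot).
std::string with_codepage(const std::string& locale_name, const std::string& codepage) {
  const auto dot = locale_name.rfind('.');
  assert(dot != std::string::npos && "locale name has no code page suffix");
  return locale_name.substr(0, dot) + codepage;
}
```

For an assumed input of `"Norwegian Bokm\xE5l_Norway.utf8"`, `has_non_ascii` reports true and `with_codepage(name, ".437")` keeps the non-ASCII language part while swapping only the suffix, which is the combination the test uses to force a locale name that is not representable in ASCII.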