
Commit 05f5f59

perf: faster utf8<->utf16 conversion on Windows (#4549)
OIIO 2.3.13, with PR #3307, changed the MultiByteToWideChar/WideCharToMultiByte usage to C++11 <codecvt> functionality, but that has two issues:

1) It is *way* slower, primarily due to locale object access (at least in the Visual C++ STL implementation in VS2022). Since the primary use case of these conversions is on Windows, it is better to use a fast code path there.
2) The whole <codecvt> machinery is deprecated across the board in C++17, and will be removed in C++26.

I've kept the existing functions, since removing them would have been an API break, but really they should perhaps have been un-exposed with OIIO 3.0. Too late now though :(

## Tests

Performance numbers: running ImageInput::create() on 1138 files that are not images at all (so OIIO in turn tries all the input plugins on them). Ryzen 5950X, VS2022, Windows:

- utf8_to_utf16: 3851ms -> 21ms
- utf16_to_utf8: 1055ms -> 4ms

Signed-off-by: Aras Pranckevicius <[email protected]>
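The Tests section reports total wall-clock times for the two conversion paths. A minimal sketch of how such a comparison could be timed, assuming a hypothetical `time_ms` helper built on `std::chrono::steady_clock`; it is not part of OIIO and not necessarily how the reported numbers were obtained:

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical benchmarking helper (not part of OIIO): calls `fn` `iterations`
// times and returns the total elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn, std::size_t iterations)
{
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        fn();
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For instance, `time_ms([&] { Strutil::utf8_to_utf16wstring(path); }, 1138)` would time 1138 conversions of some hypothetical `path`; building once with the `<codecvt>` path and once with the Windows API path and running the same call gives a before/after pair. In a real benchmark the result should be consumed so the optimizer cannot discard the call.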
1 parent da475b0 commit 05f5f59


1 file changed (+36 -7 lines)

src/libutil/strutil.cpp

Lines changed: 36 additions & 7 deletions
@@ -29,6 +29,9 @@ OIIO_PRAGMA_WARNING_POP
 #if defined(__APPLE__) || defined(__FreeBSD__)
 # include <xlocale.h>
 #endif
+#ifdef _WIN32
+# include <windows.h>
+#endif
 
 #include <OpenImageIO/dassert.h>
 #include <OpenImageIO/string_view.h>
@@ -961,6 +964,17 @@ Strutil::replace(string_view str, string_view pattern, string_view replacement,
 std::wstring
 Strutil::utf8_to_utf16wstring(string_view str) noexcept
 {
+#ifdef _WIN32
+    // UTF8<->UTF16 conversions are primarily needed on Windows, so use the
+    // fastest option (C++11 <codecvt> is many times slower due to locale
+    // access overhead, and is deprecated starting with C++17).
+    std::wstring result;
+    result.resize(
+        MultiByteToWideChar(CP_UTF8, 0, str.data(), str.length(), NULL, 0));
+    MultiByteToWideChar(CP_UTF8, 0, str.data(), str.length(), result.data(),
+                        (int)result.size());
+    return result;
+#else
     try {
         OIIO_PRAGMA_WARNING_PUSH
         OIIO_CLANG_PRAGMA(GCC diagnostic ignored "-Wdeprecated-declarations")
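The Windows branch of `utf8_to_utf16wstring` calls `MultiByteToWideChar` twice: first with a null output buffer and a zero output size, which returns the number of UTF-16 code units required, and then again to fill the resized string. The same size-query-then-fill shape can be reproduced portably without `<codecvt>`. The following is a hand-written decoder sketch, not OIIO code: `utf8_to_utf16_units` and `utf8_to_u16string` are hypothetical names, and replacing malformed bytes with U+FFFD is this sketch's own choice, not a claim about what the Windows API does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, portable sketch (not OIIO code): decodes UTF-8 into UTF-16
// code units. Like the Win32 size query, a null `out` writes nothing and only
// returns the number of code units needed. Invalid, truncated or overlong
// sequences and encoded surrogates each become U+FFFD (a sketch assumption).
std::size_t utf8_to_utf16_units(const char* src, std::size_t len, char16_t* out)
{
    static const std::uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    std::size_t n = 0;  // UTF-16 code units produced so far
    auto emit = [&](std::uint32_t unit) {
        if (out)
            out[n] = static_cast<char16_t>(unit);
        ++n;
    };
    std::size_t i = 0;
    while (i < len) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        std::uint32_t cp;
        std::size_t extra;  // continuation bytes expected after the lead byte
        if (c < 0x80) {
            cp = c;  extra = 0;
        } else if ((c & 0xE0) == 0xC0) {
            cp = c & 0x1F;  extra = 1;
        } else if ((c & 0xF0) == 0xE0) {
            cp = c & 0x0F;  extra = 2;
        } else if ((c & 0xF8) == 0xF0) {
            cp = c & 0x07;  extra = 3;
        } else {
            emit(0xFFFD);  // invalid lead byte
            ++i;
            continue;
        }
        bool ok = (len - i) > extra;  // enough bytes left in the input?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(src[i + k]);
            if ((cc & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, surrogate code points and values > U+10FFFF.
        if (!ok || cp < min_cp[extra] || (cp >= 0xD800 && cp <= 0xDFFF)
            || cp > 0x10FFFF) {
            emit(0xFFFD);
            ++i;
            continue;
        }
        i += extra + 1;
        if (cp < 0x10000) {
            emit(cp);
        } else {  // outside the BMP: encode as a surrogate pair
            cp -= 0x10000;
            emit(0xD800 + (cp >> 10));
            emit(0xDC00 + (cp & 0x3FF));
        }
    }
    return n;
}

std::u16string utf8_to_u16string(const std::string& s)
{
    std::u16string result;
    // Pass 1 queries the size, pass 2 decodes into the allocated buffer.
    result.resize(utf8_to_utf16_units(s.data(), s.size(), nullptr));
    if (!result.empty())
        utf8_to_utf16_units(s.data(), s.size(), &result[0]);
    return result;
}
```

Each pass is a single linear scan, so the whole conversion costs two scans and one allocation, with no locale objects involved.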
@@ -970,13 +984,25 @@ Strutil::utf8_to_utf16wstring(string_view str) noexcept
     } catch (const std::exception&) {
         return std::wstring();
     }
+#endif
 }
 
 
 
 std::string
 Strutil::utf16_to_utf8(const std::wstring& str) noexcept
 {
+#ifdef _WIN32
+    // UTF8<->UTF16 conversions are primarily needed on Windows, so use the
+    // fastest option (C++11 <codecvt> is many times slower due to locale
+    // access overhead, and is deprecated starting with C++17).
+    std::string result;
+    result.resize(WideCharToMultiByte(CP_UTF8, 0, str.data(), str.length(),
+                                      NULL, 0, NULL, NULL));
+    WideCharToMultiByte(CP_UTF8, 0, str.data(), str.length(), &result[0],
+                        (int)result.size(), NULL, NULL);
+    return result;
+#else
     try {
         OIIO_PRAGMA_WARNING_PUSH
         OIIO_CLANG_PRAGMA(GCC diagnostic ignored "-Wdeprecated-declarations")
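`utf16_to_utf8` follows the same two-call pattern with `WideCharToMultiByte`. A portable counterpart for the encoding direction, again a hypothetical sketch rather than OIIO's implementation, has to join a high and a low surrogate into one code point before emitting 1 to 4 UTF-8 bytes. It takes `char16_t` rather than `wchar_t` because on non-Windows platforms `wchar_t` is typically 32 bits wide, so a `std::wstring` there does not hold UTF-16. Encoding unpaired surrogates as U+FFFD is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, portable sketch (not OIIO code): encodes UTF-16 code units as
// UTF-8. A null `out` means "only count the bytes", mirroring the Win32 size
// query. Unpaired surrogates are encoded as U+FFFD (a sketch assumption).
std::size_t utf16_to_utf8_bytes(const char16_t* src, std::size_t len, char* out)
{
    std::size_t n = 0;  // UTF-8 bytes produced so far
    auto emit = [&](std::uint32_t byte) {
        if (out)
            out[n] = static_cast<char>(byte);
        ++n;
    };
    for (std::size_t i = 0; i < len; ++i) {
        std::uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // High surrogate followed by a low surrogate: combine the pair.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            emit(cp);
        } else if (cp < 0x800) {
            emit(0xC0 | (cp >> 6));
            emit(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            emit(0xE0 | (cp >> 12));
            emit(0x80 | ((cp >> 6) & 0x3F));
            emit(0x80 | (cp & 0x3F));
        } else {
            emit(0xF0 | (cp >> 18));
            emit(0x80 | ((cp >> 12) & 0x3F));
            emit(0x80 | ((cp >> 6) & 0x3F));
            emit(0x80 | (cp & 0x3F));
        }
    }
    return n;
}

std::string u16string_to_utf8(const std::u16string& s)
{
    std::string result;
    // Pass 1 sizes the buffer, pass 2 fills it, like the Windows code path.
    result.resize(utf16_to_utf8_bytes(s.data(), s.size(), nullptr));
    if (!result.empty())
        utf16_to_utf8_bytes(s.data(), s.size(), &result[0]);
    return result;
}
```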
@@ -986,29 +1012,32 @@ Strutil::utf16_to_utf8(const std::wstring& str) noexcept
     } catch (const std::exception&) {
         return std::string();
     }
+#endif
 }
 
 
 
 std::string
 Strutil::utf16_to_utf8(const std::u16string& str) noexcept
 {
+#ifdef _WIN32
+    std::string result;
+    result.resize(WideCharToMultiByte(CP_UTF8, 0, (const WCHAR*)str.data(),
+                                      str.length(), NULL, 0, NULL, NULL));
+    WideCharToMultiByte(CP_UTF8, 0, (const WCHAR*)str.data(), str.length(),
+                        &result[0], (int)result.size(), NULL, NULL);
+    return result;
+#else
     try {
         OIIO_PRAGMA_WARNING_PUSH
         OIIO_CLANG_PRAGMA(GCC diagnostic ignored "-Wdeprecated-declarations")
-        // There is a bug in MSVS 2017 causing an unresolved symbol if char16_t is used (see https://stackoverflow.com/a/35103224)
-#if defined _MSC_VER && _MSC_VER >= 1900 && _MSC_VER < 1930
-        std::wstring_convert<std::codecvt_utf8_utf16<int16_t>, int16_t> convert;
-        auto p = reinterpret_cast<const int16_t*>(str.data());
-        return convert.to_bytes(p, p + str.size());
-#else
         std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
         return conv.to_bytes(str);
-#endif
         OIIO_PRAGMA_WARNING_POP
     } catch (const std::exception&) {
         return std::string();
     }
+#endif
 }
 
 
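Each Windows branch in this diff passes `str.length()`, a `size_t`, directly into the `int` length parameters of `MultiByteToWideChar` / `WideCharToMultiByte`, which narrows implicitly. That is harmless for ordinary path strings, but a length above `INT_MAX` would not fit and the converted count would be wrong. A minimal sketch of a checked narrowing step, using a hypothetical `checked_int_length` helper that is not part of OIIO:

```cpp
#include <climits>
#include <cstddef>
#include <optional>

// Hypothetical helper (not in OIIO): narrows a size_t length to the `int`
// that MultiByteToWideChar / WideCharToMultiByte expect, or returns
// std::nullopt when the value does not fit in an int.
inline std::optional<int> checked_int_length(std::size_t n)
{
    if (n > static_cast<std::size_t>(INT_MAX))
        return std::nullopt;
    return static_cast<int>(n);
}
```

On Windows a caller might write `if (auto n = checked_int_length(str.length())) { /* pass *n to the Win32 call */ } else { return {}; }`, returning an empty string in the same way the `<codecvt>` branch does when conversion fails.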