
@Dankirk Dankirk commented Sep 13, 2025

Description

  • Sets C runtime locale to system locale with UTF-8 codepage on Windows.
    This has always been default behavior on unix, but Windows defaults to minimal 'C' locale.

    • LC_NUMERIC is still set to "C", now on all platforms instead of just unix.
      This keeps the decimal point a dot (not a comma) for string <-> float conversions.
  • The configured CRT locale is copied to be the default std::locale for C++.
    All platforms have been using minimal "C" until now. This change affects new facet and ios_base instances without a specified locale or imbue() call.

  • Sets UTF-8 as the active code page for Windows in the obs.manifest file. This makes the Win32 API use UTF-8 for the A functions instead of the language-dependent ANSI code page. The W functions are untouched and remain the default because we build with the UNICODE flag. The manifest also makes command-line arguments UTF-8, which allows, for example, --profile <name> to load profiles with special characters.

  • The OBS Studio language setting no longer changes QLocale's default locale; the system locale is always used instead.
    This is consistent with non-Qt functions and, more importantly, is likely what the user wants as well: sorting and formatting functions should follow OS locale rules rather than the OBS Studio translation language. (Reverts c4840dd)

  • obs_get_locale() still returns the OBS language locale, which is used for the Python and Lua APIs, GDI+ text widget transformations, and the HTTP Accept-Language header.
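
A minimal sketch of the setup described in the list above, not the PR's actual code. The helper names are hypothetical, and the ".UTF-8" locale name on Windows is an assumption about how the system locale with a UTF-8 code page would be requested from the CRT.

```cpp
#include <clocale>
#include <cstdio>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: system locale for everything except LC_NUMERIC.
void setup_locale()
{
#ifdef _WIN32
    // Assumption: ".UTF-8" selects the system locale with the UTF-8 code page.
    const char *system_name = ".UTF-8";
#else
    // "" means "use the locale configured in the environment".
    const char *system_name = "";
#endif

    // C runtime: follow the system locale (left unchanged if the name is rejected).
    std::setlocale(LC_ALL, system_name);

    // C++: make new streams and facets default to the same locale,
    // but take the numeric facets from the classic "C" locale.
    try {
        std::locale system_locale(system_name);
        std::locale::global(std::locale(system_locale, std::locale::classic(),
                                        std::locale::numeric));
    } catch (const std::runtime_error &) {
        // The system locale is not available to C++; keep the current default.
    }

    // Keep "." as the decimal point for string <-> float conversions.
    // Done last because std::locale::global() may itself call setlocale().
    std::setlocale(LC_NUMERIC, "C");
}

// Formats through the C runtime (affected by setlocale()).
std::string format_with_printf(double value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%.1f", value);
    return buffer;
}

// Formats through a C++ stream (affected by std::locale::global()).
std::string format_with_stream(double value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}
```

After setup_locale() both helpers should print 1.5 with a dot, regardless of the regional decimal separator, because the numeric category stays on "C" in both layers.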

Motivation and Context

Locale-aware operations like sorting and time formatting in C are not available on Windows, but are on unix, as pointed out in PR #12577.
Fixes #11133, fixes #12953

The C++ locale and QLocale changes make the locale-aware functions of all layers work in a similar fashion.

For example, on unix the locales currently in use are: the OS locale for the CRT, minimal "C" for C++, and the OBS language for QLocale.
A weekday name can come out in three different languages depending on whether you used strftime(), the std::time_get facet or QLocale.
This makes string transformations between C, C++ and Qt very tricky.
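
To make the layering concrete, here is a small sketch that formats the same weekday through the C runtime and through a C++ stream. The function names are hypothetical, std::put_time is used on the C++ side for brevity, and the Qt layer is left out to keep the example free of dependencies.

```cpp
#include <clocale>
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// C runtime layer: the language is decided by setlocale(LC_TIME, ...).
std::string weekday_crt(const std::tm &time)
{
    char buffer[64];
    const std::size_t length = std::strftime(buffer, sizeof(buffer), "%A", &time);
    return std::string(buffer, length);
}

// C++ layer: the language is decided by the locale imbued in the stream,
// which defaults to the global std::locale, independent of setlocale().
std::string weekday_cpp(const std::tm &time, const std::locale &locale = std::locale())
{
    std::ostringstream out;
    out.imbue(locale);
    out << std::put_time(&time, "%A");
    return out.str();
}
```

With a localized CRT locale and the default C++ locale, the two functions can return the weekday in different languages; they only agree when both layers are configured the same way.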

How Has This Been Tested?

An important point is that the CRT locale settings introduced here have always been this way for unix, which suggests there aren't any insurmountable problems with the new locales. Windows specific functions should be tested for CRT locale. Changes for C++ and QLocale defaults affect all platforms.

Searched the codebase for affected areas and addressed as necessary:

  • CRT: ctype.h character classification function parameters and expected return values
  • CRT: strftime() formatting with % placeholders
  • CRT: scanf() and printf() formatting with % placeholders
  • CRT: FILE operations
  • C++: fstream operations
  • C++: facet locale usage
  • QLocale: Expected return values of formatting functions
  • QLocale: QString locale-aware methods

Some general testing with Japanese characters

  • Edited recording path with %A (weekday) variable and some Japanese characters. Recorded a video. Weekday name was localized and recording worked fine.
  • Remuxed said file. Worked fine.
  • Renamed some sources with Japanese characters and exported the scene collection, removed it from OBS and re-imported it. No problems.
  • Wrote names of those sources to logFile with blog()

I'm on Windows 11 English US version, but with Finnish locale settings (fi_FI). OBS language is English.

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Tweak (non-breaking change to improve existing functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
    • 3rd party Python, Lua and rtmp code that has been using the _mbs_ conversion functions, directly or via file I/O operations, has to input text in UTF-8 and expect UTF-8 output (except for wchar/_wcs_, which is OS defined).
    • 3rd party plugins that use the A functions of the Win32 API instead of the W variants need to expect UTF-8 instead of data in the OS default ANSI code page.

Checklist:

  • My code has been run through clang-format.
  • I have read the contributing document.
  • My code is not on the master branch.
  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@WizardCM WizardCM added the Enhancement Improvement to existing functionality label Sep 13, 2025
@Dankirk
Author

Dankirk commented Sep 15, 2025

Scouted the web and the codebase for potential issues. Here are some observations...
EDIT: These have been accounted for in the PR description.

General stuff about setlocale() on Windows
For a list of things the C runtime locale affects, see https://cppreference.com/w/c/locale/setlocale.html

  • An important distinction is that some functions only care about the codepage/encoding of the locale, not the language_region rules specifically.
  • We can ignore all number-related things, because we use the minimal 'C' locale for LC_NUMERIC.
  • The affected string.h and time.h functions are all specifically locale-aware functions. Weekday names will now be localized by strftime(), and strcoll() will do a locale-aware comparison.
  • ctype.h character classification ranges are extended, i.e. isalnum() may return true for more characters, so it is worth checking that every caller of these functions is okay with that. It couldn't hurt to cast the parameters to unsigned char either, since these functions expect a value in the range 0-255 (or EOF), which a UTF-8 char converted to int might not be (the range of a signed char is -128 to 127). Then again, the functions in use have worked fine on unix until now...
  • stdio.h: formatting of the % placeholders in scanf(), printf() and the like is affected. Decimal points will still be dots (controlled by LC_NUMERIC), but %s will match more. More about file operations below.
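
The unsigned char cast mentioned in the ctype.h point can be sketched as follows. The helper names are hypothetical and only illustrate the pattern; they are not functions from the OBS codebase.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Classifies one byte of a (possibly UTF-8) string.
// The cast keeps the argument in the 0-255 range that <cctype> requires;
// passing a negative char value directly is undefined behavior.
bool is_alnum_byte(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Counts the bytes the current C locale classifies as alphanumeric.
std::size_t count_alnum(const std::string &text)
{
    std::size_t count = 0;
    for (char c : text) {
        if (is_alnum_byte(c))
            ++count;
    }
    return count;
}
```

Whether bytes above 127 count as alphanumeric depends on the active locale, which is exactly why callers that assume ASCII-only results are worth reviewing.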

multibyte <-> utf8 <-> wchar

In platform.h there are various string conversion functions. Of these, only the multibyte functions with _mbs_ are affected by this change. The rest use the Windows API, which doesn't follow the C runtime locale set by setlocale(). None of these care about the language_region, only about the codepage/encoding, which should be UTF-8 rather than, for example, the Windows default ANSI code page 1252.

The _mbs_ functions are currently not used in OBS Studio itself, but are offered for external use by Python, Lua and rtmp. This means there's no change for OBS Studio itself, but external code might see different results from these conversion functions on Windows, which will now be more similar to the return values on unix. On Windows the mb* functions are affected by _setmbcp(), while setlocale() suffices on unix.

The utf8 <-> wchar functions (e.g. os_utf8_to_wcs()) use the MultiByteToWideChar() and WideCharToMultiByte() functions with the UTF-8 code page, which will keep working after this update. Unlike mbstowcs() and similar functions in the _mbs_ implementations, these functions are independent of the CRT locale, but do follow the manifest declaration (though only for CP_ACP, which we don't use). On unix these functions use a custom conversion implementation, which naturally assumes the text is, or is to be, UTF-8 encoded.

Streams and file operations

C++ streams, like fstream, are controlled by std::locale::global() or facets, which is a separate setting from CRT setlocale(). Thus, C++ stream operations have been using the minimal "C" locale by default (on both unix and Windows). This change copies the CRT locale as the default for C++ too. Any fstream initialized before the std::locale::global() call should call imbue() to match the new locale. Any stream initialized afterwards inherits the global locale.
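
The inheritance rule for streams can be demonstrated without depending on installed system locales. In this sketch a custom facet with a comma as decimal point stands in for a system locale; the facet and the helper names are hypothetical and only make the effect visible (the PR itself keeps numeric formatting on "C").

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a system locale: "," as the decimal point.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

struct stream_demo {
    std::string before_global; // stream created before std::locale::global()
    std::string after_global;  // stream created after std::locale::global()
    std::string after_imbue;   // the early stream once imbue() has been called
};

stream_demo run_stream_demo()
{
    std::ostringstream early; // captures the global locale in effect right now

    // Replace the global C++ locale; the locale takes ownership of the facet.
    const std::locale previous =
        std::locale::global(std::locale(std::locale::classic(), new comma_decimal));

    std::ostringstream late; // inherits the new global locale

    stream_demo result;
    early << 1.5;
    result.before_global = early.str();
    late << 1.5;
    result.after_global = late.str();

    // Existing streams only pick up the new locale through imbue().
    early.str("");
    early.imbue(std::locale());
    early << 1.5;
    result.after_imbue = early.str();

    std::locale::global(previous); // restore the previous global locale
    return result;
}
```

Starting from the classic locale, the early stream keeps writing 1.5 until imbue() is called, while the late stream writes 1,5 from the start.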

C-style FILE wide char streams pick up the locale in effect when the first I/O operation is performed and continue using it. So it is important that setlocale() is called before these streams are used, or that they are re-opened with freopen().

printf() and scanf() -type functions use locale for % placeholders, as explained above.

When to setlocale() ?

In principle the locale should be one of the first things to set, since many things inherit it and it's cumbersome to retroactively reset its state in existing things. However, since Qt overwrites the locale during construction of QApplication (OBSApp) on unix, we could use the OBSApp constructor, as we have been doing to reset LC_NUMERIC back to "C". If it is decided that the OBS translations locale should be followed instead of the OS's, initLocale() also seems acceptable.

@Dankirk Dankirk force-pushed the locale branch 8 times, most recently from 9d20b0c to c0dbe23 Compare September 19, 2025 20:30
@Dankirk Dankirk force-pushed the locale branch 2 times, most recently from 3849250 to 92f93f4 Compare September 28, 2025 18:19
@Dankirk Dankirk changed the title frontend: Use system locale on Windows instead of 'C' frontend: Use system locale instead of 'C' Sep 28, 2025
@Dankirk Dankirk marked this pull request as ready for review September 29, 2025 21:04
@PatTheMav
Member

> • The OBS Studio language setting no longer changes QLocale's default locale; the system locale is always used instead.
>   This is consistent with non-Qt functions and, more importantly, is likely what the user wants as well: sorting and formatting functions should follow OS locale rules rather than the OBS Studio translation language. (Reverts c4840dd)

Highlighting this because this is a severe change, even though I think it's correct in principle. Changing an application's display language should not change the regional settings (which encompass sorting rules as well as decimal point character, etc.), and at least that's how it works on macOS.

@Warchamp7 @Fenrirthviti would be good to hear if you'd be fine with this change conceptually as well.

@Fenrirthviti
Member

My main concern here, as someone who only uses the English/USA locale/region, is that I'm unsure what the expectation for a Windows application is. The current motivation seems to be "Unix does it this way" and that to me, is not sufficient. Do we have examples and recommendations from Microsoft, or other prominent Windows applications on how they handle this kind of setting for apps that use translations?

@Dankirk
Author

Dankirk commented Jan 15, 2026

Microsoft's general guidelines for globalization suggest:

> Don't use language to assume a user's region; and don't use region to assume a user's language.

My own take is that OS regional settings + app translations is the desired outcome: no additional settings in the UI, a good likelihood of being fine by default, and still configurable when needed. The obvious drawback is that changing the regional settings means changing OS settings, which could be an issue on a shared device, but the OS should provide options for that.

Many Microsoft apps understandably just follow the OS region settings. That includes the file explorer. Apps like Office do offer a separate in-app setting for regional settings as well, but that's because they specialize in that sort of thing. Among apps in general, many have translations plus maybe time formatting options and don't follow a specific region per se. The sorting order is a gamble. Steam seems to sort games and friends using the app display language. Spotify sorts playlists by the Windows display language (neither the region settings nor the Spotify display language; I don't recommend this). WhatsApp uses the OS regional settings.

In any case, any locale is still better than the current situation with non-changeable "C" locale.

@Fenrirthviti
Member

Thanks for the additional context here. My admittedly mostly uninformed opinion, based on the discussion here, is that this seems fine. Without a clear "best practice" on Windows, moving things in line with our cross-platform implementation seems like the best option.

@PatTheMav or @jcm93 Does macOS follow a similar approach to what is being proposed here?

@PatTheMav
Member

> Thanks for the additional context here. My admittedly mostly uninformed opinion, based on the discussion here, is that this seems fine. Without a clear "best practice" on Windows, moving things in line with our cross-platform implementation seems like the best option.
>
> @PatTheMav or @jcm93 Does macOS follow a similar approach to what is being proposed here?

On macOS, language and regional settings are also separate things, and it's expected that your app's language is changed from the OS's language settings (rather than within the application itself), which in gendered languages also includes the choice of preferred pronoun.

The language setting indeed only changes the display language; decimal format, sorting, et al. still follow the regional setting (at least in "native" apps).

@Dankirk
Author

Dankirk commented Jan 29, 2026

In the current state the PR works as designed, but I'm thinking of adding the following lines to obs.manifest to set the active code page to UTF-8 for Win32 APIs (the A versions of functions). While OBS uses those relatively little, and mostly with hard-coded strings, it could ease the adaptation of 3rd party code, plugins etc.

It also allows UTF-8 encoding of command-line arguments for loading a specific collection, scene or profile by name.
For example, ./obs64.exe --profile キアラ only works with the utf-8 manifest on Windows. Without it, arguments are in one of the ANSI code pages instead, and we have no existing way to deal with that in the OBS codebase (everything assumes utf-8 or utf-16 wchar).

Lines to add to obs.manifest

<application>
    <windowsSettings>
        <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
</application>

Doing this would mean the locale setup routine would need to be done much earlier in the program than the OBSApp constructor, preferably at the start of main(). However, since Qt overwrites locales in the base constructor on unix, unix platforms would need to re-do the locale setup routine after, which is a bit of an ugly design.

@Dankirk Dankirk force-pushed the locale branch 2 times, most recently from dd143ca to 29aa8fc Compare January 30, 2026 01:10
@Dankirk
Author

Dankirk commented Jan 30, 2026

The obs.manifest changes have been added and the locale setup routine moved to the beginning of the app, with unix re-running it after OBSApp has been created.

Here's some info about the manifest if you wish to read more: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

If there are any issues with the locale setup routine being the first thing to run, please let me know.
The reasoning for it is that many things inherit the locale when they are initialized, and retroactively updating that is cumbersome. This includes things like outputting anything to stdout or using fstream. From what I can see, load_debug_privilege() is the first function to use blog(), so the locale setup routine should happen before that.
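
The ordering described above could look roughly like this. It is a sketch under stated assumptions: setup_locale(), framework_init() and startup() are hypothetical names, and framework_init() merely simulates a framework constructor that resets the C locale on unix, which is how Qt's behavior is described in this thread.

```cpp
#include <clocale>
#include <cstring>

// Hypothetical locale setup routine: environment locale, "C" numerics.
void setup_locale()
{
    std::setlocale(LC_ALL, "");      // left unchanged if the environment locale is unavailable
    std::setlocale(LC_NUMERIC, "C"); // keep "." as the decimal point
}

// Hypothetical stand-in for a framework constructor that overwrites the
// C locale on unix, including LC_NUMERIC.
void framework_init()
{
#ifndef _WIN32
    std::setlocale(LC_ALL, "");
#endif
}

// True when the C runtime currently formats decimals with a dot.
bool decimal_point_is_dot()
{
    return std::strcmp(std::localeconv()->decimal_point, ".") == 0;
}

void startup()
{
    setup_locale();   // first thing, before any logging or stream is touched
    framework_init(); // may clobber the locale on unix
#ifndef _WIN32
    setup_locale();   // re-run so LC_NUMERIC is "C" again
#endif
}
```

The duplicated call is the "ugly" part of the design, but it keeps everything initialized before the framework on a consistent locale.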

@Dankirk Dankirk changed the title frontend: Use system locale instead of 'C' frontend: Use system locale instead of 'C' with UTF-8 Jan 30, 2026
Sets runtime locale to system locale with UTF-8 codepage. This is already default behavior on unix, but Windows defaults to minimal 'C' locale.

Use CRT locale for C++ std::locale default

OBS Studio language settings no longer change the QLocale default locale; instead the system locale is used for conformity. It is likely this is what the user wants as well, i.e. sorting and formatting functions should follow the OS locale instead of the OBS Studio language (which also lacks country information).
Cast ctype function char parameters to unsigned char to ensure they are in correct range (0 to 255 vs -128 to 127) when used with utf-8 encoding (or extended ascii).

Fixes dstr astrcmp* functions when used with utf-8 (or extended ascii) characters, so now they are treated greater than the base ascii and thus sorted after them, not before.
Switch locale-aware timestamping for logging / crash handling to %H:%M:%S
@Dankirk Dankirk changed the title frontend: Use system locale instead of 'C' with UTF-8 frontend: Use system locale with UTF-8 instead of 'C' Jan 31, 2026
Utf-8 manifest allows Win32 API to use utf-8 instead of ANSI codepages. This changes the "A" versions of functions to work with utf-8.
Manifest also treats command line arguments as utf8. This allows for example --profile <name> to load profiles with special characters.

The locale setup routine is moved to the beginning of the program to better cover io streams, without the need to reconfigure them later. The caveat is that unix will need to re-do the routine after initializing OBSApp, because Qt overwrites the locales for unix during its construction.

Added logging for locale info and app language to better diagnose potential errors.
@Dankirk
Author

Dankirk commented Feb 1, 2026

Examples of new logging about locale and utf-8 (un)availability when using verbose logging.
The "Locale" line is based on collation (sorting locale), which may be different from for example time formatting, but I think is a good basis.

[Screenshots: log output with utf8 available; log output with utf8 not available]

@PatTheMav
Member

The main "problem" this PR has to address is that OBS on the whole is conceptually not built to be aware (much less so capable of handling) of locales (and locale differences), similarly to how it pretends that any set of bytes in memory is "UTF-8" but then happily treats it as "just ASCII".

For better or worse all that OBS can handle without issue is US-American text and formats, anything beyond that is a "happy accident".

Introducing these capabilities has to go far beyond just slapping on some setlocale() calls or adding facets where necessary, because of the following aspects:

  • C only supports global locales that indeed have to be set almost immediately at an app's run time to ensure all system APIs follow suit
  • The global locale is set per process on POSIX, but per thread on Windows, which makes locale-awareness trickier to ensure for all the threads libobs uses in the latter case (and that's not even getting into the weeds of whether certain threads need to be locale aware and if they suddenly become so - because of the global switch on POSIX - we introduce hard to debug/notice side-effects).
    • And many (most?) Windows C APIs do not even respect setlocale?
  • C++ supports setting up custom locale objects and allows to apply them where needed, thus allowing the app to be much more "locale-aware" (e.g. one can use different locales for "reading" and "writing" of data, at least for locale-aware library functions).
  • As if two (or three in the case of Windows) different layers of locale handling are not enough, we also use Qt on top of it all, which seems to follow an approach similar to C++'s (with QLocale instances)
  • Note that C++'s and Qt's approach is considered "correct" by requiring the app to explicitly use locale awareness where needed instead of flipping a "magic switch" that changes everything behind the scenes. Because it makes sense for an app to use an internal "canonical" format, but use locale-awareness when interfacing with the user and the user's data.

That's already a big pile of work to get through and figure out where/how/if to change OBS (and libobs, and the 1st party modules) to become locale-aware, but on top of that we have the (largely non-existent) Unicode-handling:

  • Qt's QStrings are all more or less expected to be encoded in UTF-16 (I haven't checked if it's actually UTF-16 or just UCS-2 with UTF-16 surrogate pairs)
  • std::string instances and char * data is ostensibly thought of as "UTF-8" but indeed is treated as plain ASCII or simply a sequence of bytes in memory.
  • As long as such a "string" is not truncated, manipulated, or collated (and the unchanged array of bytes copied/transferred as-is) the underlying strings will be retained
    • Because those strings are passed as-is to Qt and Qt is indeed UTF-8 aware, the UTF-8 code points will be correctly decoded and re-encoded as UTF-16 when stored in a QString
  • On Windows any wchar_t-based string is considered to be UTF-16 as well (in the case of file system paths that might actually still be UCS-2), which is then passed to Windows APIs to convert to/from UTF-8 (similar to Qt)
  • Almost all internal APIs (or functions) that handle char * data or std::string do not support UTF-8: All bytes are considered "ASCII characters", there is no capability to detect the variable-length encoding of UTF-8 code points, much less so any capability to detect grapheme clusters, or any awareness of language-specific differences in meaning of what "upper case" means.
    • Again, as long as no manipulation or interpretation of the strings take place this is fine, because UTF-8 is then just a "transport" format.
  • As mentioned in the description, on Windows there is the additional complexity of many APIs being available as an "ANSI" variant (which uses the current code page for encoding/decoding) and the "UNICODE" variants (which might either be actual variable-length UTF-16 or fixed-length UCS-2).
    • And as if all of that weren't bad enough, Microsoft added the mentioned support for a "UTF-8 code page" which allows one to use UTF-8 encoded text with the ANSI variants of APIs, which in theory requires developers to un-adopt the use of their UNICODE APIs.

Mind you, I'm not saying that we shouldn't adopt changes like these, but that these changes require a great deal of thought and need to address many architectural design flaws in OBS as it is right now. Simply adding it in bits and pieces runs the risk of fixing symptoms but not fixing the core issue(s), particularly as the entire app has not been designed to properly handle locales or anything beyond ASCII text and making it handle text "right" in one area might collapse a whole house of cards of assumptions about character data in another.

For that (and a few other reasons) it might take some time (and might even require splitting the whole endeavour up into separate "units") to get it over the finish line.

@Dankirk
Author

Dankirk commented Feb 2, 2026

The input is much appreciated. I acknowledge the uncertainty and I'm fine with whatever approach is taken, but I have some counterpoints too.

> For better or worse all that OBS can handle without issue is US-American text and formats, anything beyond that is a "happy accident".

I don't believe this. Locales, both the region and the UTF-8 enforcement, have been in use on non-Windows OSs for the whole time OBS has existed and used Qt, which speaks against it. That would not have worked without some degree of good design choices.

> The global locale is set per process on POSIX, but per thread on Windows.

This is not true. Threads on Windows use the same locale, as long as it is set before the threads are spawned. Reference below.
https://learn.microsoft.com/en-us/cpp/parallel/multithreading-and-locales?view=msvc-170

> And many (most?) Windows C APIs do not even respect setlocale?

Some only care about the encoding, others about the regional rules. Others follow other ways of setting the locale, like the manifest. Some really don't care, but considering their limited usability, it might not matter to us either. If such a function does its limited job fine, it's OK; if not, another approach is in order.

Regarding OBS, probably the biggest culprit we have is the dstr library, which is built on the idea that char-by-char comparison works, but this is not too prevalent in the codebase. dstr and the low-level string manipulation functions using the ctype.h, stdio.h et al. libraries receive input from a limited set of sources, and many of them expect only ASCII from those, which still works as intended with these changes because of the library limitations. The only fix I have applied here is the unsigned char cast, which ensures they don't crash on non-ASCII characters (which become negative when the libraries convert them from signed char to int). We could probably do better, but I didn't really see cases where it would have made a difference.
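
For the char-by-char comparison itself, the effect of treating bytes as unsigned can be sketched like this. compare_bytes() is a hypothetical illustration of the idea behind the astrcmp* fix listed in the commits, not the actual dstr code.

```cpp
// Compares two NUL-terminated strings byte by byte as unsigned values,
// so UTF-8 (or extended ASCII) bytes >= 0x80 order after all ASCII bytes.
// With plain signed char those bytes would be negative and sort first.
int compare_bytes(const char *a, const char *b)
{
    for (;; ++a, ++b) {
        const unsigned char ca = static_cast<unsigned char>(*a);
        const unsigned char cb = static_cast<unsigned char>(*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0; // both strings ended at the same position
    }
}
```

This only gives a stable byte order in which non-ASCII text sorts after ASCII; it is not a locale-aware collation like strcoll().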

I also expected this to be a hurdle, so I searched the codebase for all prominent standard C and C++ functions that care about the locale and addressed them in this PR where needed. That includes tracking down where their inputs originate and where their outputs go, to see if UTF-8 and localization are acceptable. I have checked whether they care about the region or the encoding, and what sets it (setlocale, the active code page manifest, std::locale::global, or something else). The changes in this PR are not a random list of magic switches; it's all very much mappable, and each has a reason to be here.

My experience about utf-8 readiness is that it is almost there. Kind of like you said, many things pretend it already is utf-8 while we have been playing around with ASCII. Other things don't really need to become locale or encoding aware, just as long as their limitation is recognized and used in contexts where you don't expect anything special. I'm weirdly enough expecting this to fix more things by collateral than break, especially those cases where the encoding is coincidentally in ANSI because there was nothing to tell the source we'd like utf-8.

The Unicode handling in OBS is otherwise... OK. We have a few ways of doing it and can keep using them; nothing is wrong with that, they are built correctly for the purpose and keep working fine after this change too. This PR doesn't touch Qt, the W APIs (or the defaults selected by the UNICODE build flag), UTF-16 or UCS-2. It is not as if we need to adapt or handle more things, quite the opposite: that is one of the things coercing UTF-8 is about, and the ANSI code pages are exactly one of the things to prevent. The A APIs currently return ANSI-encoded strings, which simply do not function in OBS if they happen to contain anything non-ASCII.

std::string and char are byte containers with a programmer hint that they are used for text. They support UTF-8 as well as any other multibyte encoding. Things like strlen() report the size in bytes, which is usually what you want, for example, when allocating arrays, and when you iterate over them you are often looking for something that is in ASCII anyway (which is compatible with UTF-8 code points).

Some afterthoughts
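
A short sketch of the byte-container point, assuming valid UTF-8 input; both helper names are hypothetical. The size of a string is its byte count, counting code points needs one extra check per byte, and searching for an ASCII character is safe because ASCII values never appear inside a multibyte UTF-8 sequence.

```cpp
#include <cstddef>
#include <string>

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Finds an ASCII character by plain byte search; returns std::string::npos
// when it is absent. Multibyte sequences only use bytes >= 0x80, so an ASCII
// byte can never match in the middle of another character.
std::size_t find_ascii(const std::string &utf8, char ascii)
{
    return utf8.find(ascii);
}
```

For the profile name キアラ from the earlier example, size() reports 9 bytes while count_code_points() reports 3.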

The need for this started from trying to get locale-awareness into C, for sorting specifically. The locale availability in C++ or Qt could probably be leveraged with some externs to C, true. Or with a signaling mechanism, in case module encapsulation paradigms fight against direct usage, though that is a roundabout way for something relatively low level.

For command-line arguments I don't see a nice way to fix them without the manifest, which also changes the A APIs. I suppose we could use MultiByteToWideChar() with CP_ACP (the current code page, the OS default if not changed) instead of the CP_UTF8 flag and then follow with WideCharToMultiByte() with the UTF-8 flag. It should work, but I don't think that type of thing should become any sort of standard.
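
For reference, the two-step conversion described above could be sketched as follows. acp_to_utf8() is a hypothetical helper, not something in the OBS codebase, and the non-Windows branch simply assumes the input is already UTF-8.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: re-encodes text from the active ANSI code page to UTF-8.
std::string acp_to_utf8(const std::string &input)
{
#ifdef _WIN32
    if (input.empty())
        return std::string();

    // Step 1: active ANSI code page (CP_ACP) -> UTF-16.
    const int wide_length = MultiByteToWideChar(CP_ACP, 0, input.data(),
                                                static_cast<int>(input.size()), nullptr, 0);
    if (wide_length <= 0)
        return std::string();
    std::wstring wide(static_cast<std::size_t>(wide_length), L'\0');
    MultiByteToWideChar(CP_ACP, 0, input.data(), static_cast<int>(input.size()),
                        &wide[0], wide_length);

    // Step 2: UTF-16 -> UTF-8.
    const int utf8_length = WideCharToMultiByte(CP_UTF8, 0, wide.data(), wide_length,
                                                nullptr, 0, nullptr, nullptr);
    if (utf8_length <= 0)
        return std::string();
    std::string utf8(static_cast<std::size_t>(utf8_length), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), wide_length, &utf8[0], utf8_length,
                        nullptr, nullptr);
    return utf8;
#else
    // Assumption: on non-Windows platforms the text is already UTF-8.
    return input;
#endif
}
```

Characters that the ANSI code page cannot represent are already lost before this function sees them, which is one more reason the manifest approach is preferable.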

I might look into this a bit more from broader, design point of view too later. Any decision is fine though.

@Dankirk
Author

Dankirk commented Feb 3, 2026

Continuing a bit here.

> Mind you, I'm not saying that we shouldn't adopt changes like these, but that these changes require a great deal of thought and need to address many architectural design flaws in OBS as it is right now. Simply adding it in bits and pieces runs the risk of fixing symptoms but not fixing the core issue(s), particularly as the entire app has not been designed to properly handle locales or anything beyond ASCII text and making it handle text "right" in one area might collapse a whole house of cards of assumptions about character data in another.

I do not share the sentiment about the kind of architectural flaws present here. The reasoning is that this is already being done on non-Windows systems, together with the research done, as described, to introduce this PR. The parts of the code that supposedly have a problem with non-ASCII data have that problem now and, without this update, will still be fed non-ASCII data in whatever encoding if they are used that way. In short, the parts that don't work after this never have, and this will not make fixing them more difficult. We treat all non-UTF-16/32 text as if it were already UTF-8 encoded and use proper conversion functions to convert between that and the target. Any other encoding the text might be in is not supported by our conversion functions, so we should do everything we can to coerce external strings, and internal functions, to work with the assumption that the text is indeed what we believe it to be: UTF-8. While we can play and have played around with ASCII, the actual data has always been what it is; it has never actually been limited to code points representable in ASCII.


Development

Successfully merging this pull request may close these issues.

  • OBS crashes if filename contains german Umlaut
  • Months and weekdays are not localized in filename formatting

4 participants