
fix(uucore): Use embedded locales in release mode. #9879

Draft
lordeji wants to merge 2 commits into uutils:main from lordeji:actually-use-embedded-locale


@lordeji lordeji commented Dec 27, 2025

TL;DR

PR #8604 was incomplete. It did embed the locales, but they were never used: the get_locales_dir() implementation still looked for a path on disk, which raised an error in setup_localization() and forced a fallback to English even though the locales were embedded.


Implementation:

I mainly restructured setup_localization() and init_localization() so that each has a debug-specific and a release-specific implementation.
From what I gathered from the codebase, in debug builds we look for the .ftl files (falling back to embedded English if necessary), while in release builds there seemed to be a legacy code path that looked for a folder relative to the executable (which was the source of the error).
The debug implementation is essentially unchanged; the release implementation follows the same structure, except that the localized text comes from get_embedded_locale().
The release implementation is derived from the preexisting function create_english_bundle_from_embedded(), where I replaced the hard-coded "en-US" identifier with a parameter.
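
A minimal sketch of the debug/release split described above, not the PR's actual code. The stand-in get_embedded_locale() below, its key format ("<util>/<locale>.ftl") and the message strings are assumptions made for illustration; only the idea of selecting an implementation with cfg(debug_assertions) and parameterizing the locale instead of hard-coding "en-US" comes from the description.

```rust
/// Hypothetical stand-in for the table of locales generated at build time.
/// The key format and the messages are invented for this sketch.
fn get_embedded_locale(key: &str) -> Option<&'static str> {
    match key {
        "cp/en-US.ftl" => Some("cp-about = Copy SOURCE to DEST"),
        "cp/fr-FR.ftl" => Some("cp-about = Copier SOURCE vers DEST"),
        _ => None,
    }
}

/// Returns the embedded text for `util` in `locale`, falling back to the
/// embedded en-US text when the requested locale was not embedded.
fn embedded_text(util: &str, locale: &str) -> Option<&'static str> {
    get_embedded_locale(&format!("{util}/{locale}.ftl"))
        .or_else(|| get_embedded_locale(&format!("{util}/en-US.ftl")))
}

/// Debug builds: the real code would read .ftl files from the source tree.
#[cfg(debug_assertions)]
fn locale_source() -> &'static str {
    "ftl-files"
}

/// Release builds: only the embedded table is consulted.
#[cfg(not(debug_assertions))]
fn locale_source() -> &'static str {
    "embedded"
}
```

With this shape, embedded_text("cp", "fr-FR") yields the French text when it was embedded and the English text otherwise, and no filesystem path is involved in release builds.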


Testing :

For consistency, I tested the same 5 packages in each configuration (cp, split, truncate, mv and ls).
I used the provided devcontainer.
I compiled with:

cargo build -p <packages...>
cargo build --release -p <packages...>
cargo build
cargo build --release

These built with no errors or warnings. I then ran:

export LANG=fr_FR.UTF-8
./target/debug/<packages...> --help
./target/debug/coreutils <packages...> --help
./target/release/<packages...> --help
./target/release/coreutils <packages...> --help

This printed French help text for all the packages in every mode.
I then ran the tests :

cargo nextest run --all-features -p uucore --no-fail-fast

Result: 313 tests run: 312 passed, 1 failed, 2 skipped

The only failing test is uucore features::proc_info::tests::test_pid_entry, but it also fails on main, so the failure may be related to the devcontainer.


Edits:

  • While looking at the workflows, I saw that the script for the "Test Make installation" step (in l10n_installation_test) was not set up properly: the locale must be set before building, otherwise it is not embedded.
  • Since l10n_installation_test was still failing, I looked again and saw that I had forgotten to update the "Test Cargo installation" step, which did not set a locale at all.
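
To illustrate why the locale has to be set before building, here is a rough sketch of what a build script could do with the build-time LANG value. The function names are hypothetical and the real build script may detect the locale differently; the only facts taken from this thread are that English is always embedded, that LANG=fr_FR.UTF-8 is used in the tests, and that locale files are named like fr-FR.ftl.

```rust
/// Converts a POSIX-style locale value such as "fr_FR.UTF-8" into the
/// "fr-FR" form used in the .ftl file names seen in this thread.
fn locale_id_from_lang(lang: &str) -> Option<String> {
    // Drop the encoding (".UTF-8") and modifier ("@euro") suffixes.
    let base = lang.split(|c| c == '.' || c == '@').next()?;
    if base.is_empty() || base == "C" || base == "POSIX" {
        return None;
    }
    Some(base.replace('_', "-"))
}

/// Picks the locales to embed: always en-US, plus the locale that was
/// active when the build ran, if there was one.
fn locales_to_embed(build_time_lang: Option<&str>) -> Vec<String> {
    let mut locales = vec!["en-US".to_string()];
    if let Some(id) = build_time_lang.and_then(locale_id_from_lang) {
        if !locales.contains(&id) {
            locales.push(id);
        }
    }
    locales
}
```

Under this sketch, a build run without LANG set embeds only en-US, so exporting LANG afterwards (at install or test time) cannot add French to the binary.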

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cp/cp-mv-enotsup-xattr. tests/cp/cp-mv-enotsup-xattr is passing on 'main'. Maybe you have to rebase?
Note: The gnu test tests/csplit/csplit-io-err was skipped on 'main' but is now failing.

@codspeed-hq

codspeed-hq bot commented Dec 27, 2025

Merging this PR will improve performance by 13.21%

⚡ 17 improved benchmarks
✅ 125 untouched benchmarks
⏩ 180 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | wc_bytes_synthetic[500] | 191.5 µs | 175 µs | +9.41% |
| Simulation | b64_decode_synthetic | 171.2 µs | 155.9 µs | +9.83% |
| Simulation | b64_decode_ignore_garbage_synthetic | 168.2 µs | 155.1 µs | +8.49% |
| Simulation | b64_encode_synthetic | 166.9 µs | 152.4 µs | +9.54% |
| Simulation | cksum_blake3 | 216 µs | 201.3 µs | +7.3% |
| Simulation | cp_large_file[16] | 379.1 µs | 365.8 µs | +3.64% |
| Simulation | complex_relative_date | 246.2 µs | 230.6 µs | +6.75% |
| Simulation | single_date_now | 190.4 µs | 175.4 µs | +8.55% |
| Simulation | dd_copy_partial | 564.4 µs | 543.5 µs | +3.85% |
| Simulation | dd_copy_64k_blocks | 606.5 µs | 587.6 µs | +3.21% |
| Simulation | mv_single_file | 138.6 ms | 130.5 ms | +6.28% |
| Simulation | mv_force_overwrite | 147.9 ms | 130.6 ms | +13.21% |
| Simulation | rm_single_file | 123.8 ms | 114.9 ms | +7.67% |
| Simulation | split_number_chunks | 286.6 µs | 276.6 µs | +3.6% |
| Simulation | df_deep_directory | 381 µs | 366.8 µs | +3.87% |
| Simulation | df_with_path | 405.1 µs | 388.8 µs | +4.19% |
| Simulation | factor_multiple_u64s[2] | 223.6 ms | 199.8 ms | +11.92% |

Comparing lordeji:actually-use-embedded-locale (0baf12c) with main (6f955da)


Footnotes

  1. 180 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@oech3
Contributor

oech3 commented Dec 28, 2025

Update makefile to not install locales in release because they're useless.

People installing from source do not need all locales, but distributors still need all of them.

@github-actions

GNU testsuite comparison:

GNU test failed: tests/cp/cp-mv-enotsup-xattr. tests/cp/cp-mv-enotsup-xattr is passing on 'main'. Maybe you have to rebase?
Note: The gnu test tests/csplit/csplit-io-err was skipped on 'main' but is now failing.

@lordeji lordeji force-pushed the actually-use-embedded-locale branch from 7b12e54 to 8b1d9f6 on December 28, 2025 12:52
@lordeji
Author

lordeji commented Dec 28, 2025

@oech3 Sorry, didn't know this. Rebased without the GNUmakefile commit.

@oech3
Contributor

oech3 commented Dec 28, 2025

Are non-embedded .ftl files still supported here?

@github-actions

GNU testsuite comparison:

Congrats! The gnu test tests/tail/assert is no longer failing!

@lordeji
Author

lordeji commented Dec 28, 2025

@oech3 Not in release mode.
I may have misread the code's intent as being to use only embedded locales in release mode.
If needed, I can fix the original get_locales_dir() function, which was throwing an error in release mode, and then add a fallback to .ftl files.

@oech3
Contributor

oech3 commented Dec 28, 2025 via email

@lordeji
Author

lordeji commented Dec 28, 2025

@oech3 I'm truly sorry for the confusion, but it seems that there are conflicting ideas between you and PR #8604.
It says:

> It embeds the corresponding locale file if it exists, in addition to the mandatory English (en-US) fallback.

But the embedded non-English locales were never actually used, which made me think there was a bug.
I'm surely missing something, but why embed the system locale if only English is meant to be used?

@oech3
Contributor

oech3 commented Dec 28, 2025

> why embed the system locale if only English is meant to be used?

For performance? cc: @sylvestre @WaterWhisperer

@oech3
Contributor

oech3 commented Dec 28, 2025

If fallback is not supported, I think it is a bug (at least for packagers).

@oech3
Contributor

oech3 commented Dec 28, 2025

$ strace -e trace=openat -- /bin/true
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libonig.so.5", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/proc/self/maps", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/bin/true/en-US.ftl", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
openat(AT_FDCWD, "/usr/bin/true/ja-JP.ftl", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
openat(AT_FDCWD, "/usr/bin/true/en-US.ftl", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
openat(AT_FDCWD, "/usr/bin/true/ja-JP.ftl", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
+++ exited with 0 +++

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/tty/tty-eof is no longer failing!

@lordeji lordeji force-pushed the actually-use-embedded-locale branch from 396f6b3 to b4b55d0 on December 29, 2025 01:35
@github-actions

GNU testsuite comparison:

Note: The gnu test tests/id/smack was skipped on 'main' but is now failing.
Note: The gnu test tests/mkdir/smack-no-root was skipped on 'main' but is now failing.
Note: The gnu test tests/mkdir/smack-root was skipped on 'main' but is now failing.

@lordeji lordeji force-pushed the actually-use-embedded-locale branch from 497bd5b to 2e2305e on December 29, 2025 19:09
@sylvestre
Contributor

Would it be possible to add a GitHub Actions check to verify that it works correctly? Thanks

@sylvestre
Contributor

Also, I'm worried about the binary size with all locales. Did you look at this?

@oech3
Contributor

oech3 commented Dec 31, 2025

In my understanding, we fall back to non-embedded .ftl files instead of embedding all languages in the binary. No?

@sylvestre
Contributor

looking at comment #0 it isn't clear to me :)

@lordeji
Author

lordeji commented Jan 1, 2026

@sylvestre @oech3 As for the binary size, this PR shouldn't change anything, because PR #8604 was already embedding the locales. It was the first thing I checked when debugging: OUT_DIR/embedded_locales.rs contained both the English and French locales, but only the English one was used.
I did not explicitly compare sizes; I will check the size difference when I can.
I will also add a GitHub Action. I just need to learn how it works first, so it may take a bit more time.

I did not fall back to .ftl files because I thought that, for performance, the goal in release mode was to embed all the locales. It's my fault: I misinterpreted the PR. The documentation only talked about English, which contradicted the embedding of French locales, so I assumed it simply hadn't been updated.
I will update my PR to fall back to .ftl files, but this time trying the embedded locale first, so that it doesn't error in get_locales_dir() and can always fall back to embedded English.
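
The lookup order proposed above could be sketched roughly as follows. This is an assumption-laden illustration rather than the eventual implementation: embedded(), Source and load_locale() are hypothetical names, the messages are invented, and the real code builds Fluent bundles rather than returning raw strings.

```rust
use std::fs;
use std::path::Path;

/// Where the localized text ended up coming from.
#[derive(Debug, PartialEq)]
enum Source {
    Embedded,
    File,
    EnglishFallback,
}

/// Hypothetical stand-in for the embedded locale table.
fn embedded(locale: &str) -> Option<&'static str> {
    match locale {
        "en-US" => Some("greeting = Hello"),
        "fr-FR" => Some("greeting = Bonjour"),
        _ => None,
    }
}

/// Tries the embedded locale first, then an installed .ftl file, and
/// finally the embedded English text, so the lookup itself never fails.
fn load_locale(locale: &str, locales_dir: Option<&Path>) -> (Source, String) {
    if let Some(text) = embedded(locale) {
        return (Source::Embedded, text.to_string());
    }
    if let Some(dir) = locales_dir {
        if let Ok(text) = fs::read_to_string(dir.join(format!("{locale}.ftl"))) {
            return (Source::File, text);
        }
    }
    let text = embedded("en-US").expect("en-US is always embedded");
    (Source::EnglishFallback, text.to_string())
}
```

A missing or unreadable locales directory is treated the same as a missing file, which is the property that avoids the error path in get_locales_dir().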

Before making the changes, if that's not too much to ask, could someone spell out the EXACT intent of how you want it to work? There is no real documentation, and when I tried cargo install --path . --root ./build no locales were built, so I assume it's only a GNUmakefile option (with install-locales). I need the following information:

  1. Looking at the GNUmakefile, locales are installed in $(DATAROOTDIR)/locales/<pkg>, with DATAROOTDIR defaulting to $(PREFIX)/share. I see that resolve_locales_dir_from_exe_dir() tries to handle 3 cases: <bindir>/locales/<prog>, <prefix>/share/locales/<prog> and <bindir>/<prog>. Are these 3 cases enough, so that I should just revert the deletion, or should I check more edge cases? I was thinking that we could call cargo with DATAROOTDIR as a local environment variable and then read it at compile time with const DATAROOTDIR: &str = env!("DATAROOTDIR");
  2. In the install-locales target, I see that we do not install the English locale (if [ "$$(basename "$$locale_file")" != "en-US.ftl" ];). Should this behavior be extended to the detected system locale?
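
A sketch of the three candidate directories from point 1, plus the optional compile-time override. The function names are hypothetical and the candidate order is an assumption. The sketch uses option_env! instead of env! because env! makes the build fail when the variable is not set, whereas option_env! evaluates to None.

```rust
use std::path::{Path, PathBuf};

/// Compile-time override; None when DATAROOTDIR was not set during the build.
const DATAROOTDIR: Option<&str> = option_env!("DATAROOTDIR");

/// Lists the directories to probe for `prog`'s .ftl files, in order:
/// the optional data root, then the three layouts named in point 1.
fn candidate_locale_dirs_with(exe_dir: &Path, prog: &str, dataroot: Option<&str>) -> Vec<PathBuf> {
    let mut dirs = Vec::new();
    if let Some(root) = dataroot {
        // $(DATAROOTDIR)/locales/<prog>
        dirs.push(Path::new(root).join("locales").join(prog));
    }
    // <bindir>/locales/<prog>
    dirs.push(exe_dir.join("locales").join(prog));
    // <prefix>/share/locales/<prog>, where <prefix> is the parent of <bindir>
    if let Some(prefix) = exe_dir.parent() {
        dirs.push(prefix.join("share").join("locales").join(prog));
    }
    // <bindir>/<prog>
    dirs.push(exe_dir.join(prog));
    dirs
}

/// Same as above, using the value captured at compile time.
fn candidate_locale_dirs(exe_dir: &Path, prog: &str) -> Vec<PathBuf> {
    candidate_locale_dirs_with(exe_dir, prog, DATAROOTDIR)
}
```

A caller would probe these in order and use the first directory that exists. Note that baking DATAROOTDIR into the binary makes the result depend on the build environment, which matters for relocatable installs.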

Sorry for the long message, but I'm on holiday and won't have much time to code or answer, so the quicker we can resolve misunderstandings, the more efficient I will be.

Happy New Year, by the way!

@sylvestre
Contributor

> Before making the changes, if that's not too much to ask, could someone spell out the EXACT intent of how you want it to work?

Basically:

  • embedded English (i.e. the English ftl files should not even be installed)
  • if the locale isn't English, we should load the appropriate ftl files (which should be installed)

Is that clear?

@lordeji
Author

lordeji commented Jan 7, 2026

> if the locale isn't English, we should load the appropriate ftl files (which should be installed)

So why was PR #8604 merged? It specifically adds support for embedding non-English locales.
If that's not the intended use, the PR should be reverted, as it embeds locales that are not used and increases the program size.
Also, if that PR is indeed reverted, then this one will be closed.

@lordeji
Author

lordeji commented Jan 13, 2026

@sylvestre really sorry for pinging but I think that we really need to resolve the misunderstanding.

@sylvestre
Contributor

Sorry, I need to spend time on this and I haven't found the time yet.

However, I'm happy to get help and more investigation on this :)
Thanks

@lordeji
Author

lordeji commented Jan 14, 2026

@sylvestre Looking back at issue #8594, your instructions were:

> We probably want to update this code:
> main/src/uucore/build.rs#L216
> to detect the locale of the system and embed both English (for fallback) and the user's translation locale.

I'm assuming that you changed your mind because the main problem now is issue #9103. If so, I'll close this PR.
If that's not the case and you're still okay with your initial idea, I'll rework the PR to fall back to Fluent files when no embedded locale is found.

@oech3 oech3 mentioned this pull request Jan 29, 2026
- Update setup_localization() to have implementation specific to debug or release.
- Update create_english_bundle_from_embedded() to not be specific to english.
- Update init_localization() to have implementations specific to debug or release.
- Delete get_locales_dir() release mode implementation because it was useless.
In "Test Make installation" step, set french locale BEFORE building with make.
In "Test Cargo installation" step, add french locale env var before installation.
If not, the binary is built with no embedded french locale.
@sylvestre sylvestre force-pushed the actually-use-embedded-locale branch from 2e2305e to 0baf12c on January 29, 2026 07:22
@oech3
Contributor

oech3 commented Jan 29, 2026

Should we embed Eng+1 system lang? It is a bit complex to maintain.

@sylvestre
Contributor

Sorry, what does eng+1 mean? :)

@oech3
Contributor

oech3 commented Jan 29, 2026

eng + 1 where 1 is set by LANG=.

@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/basenc/bounded-memory is now passing!

@lordeji
Author

lordeji commented Jan 29, 2026

@oech3 I know this is a weird question, but I'm new to contributing to open source and I'm far from having all the required knowledge.

I often see you on l10n and i18n related issues, so I wondered whether you're a maintainer specialized in this?

If so, do you have more insight into this misunderstanding?
Would you recommend that I keep working on this PR even though we're not sure of the goal, or is it better to wait for confirmation?

Again, really sorry for the message. It's just that I don't know what to do, and you seem to be the one who knows the l10n system best.

@oech3
Contributor

oech3 commented Jan 29, 2026

Sorry, I'm not a maintainer or collaborator. I can't decide what we should do.

@lordeji
Author

lordeji commented Jan 29, 2026

@oech3 OK, really sorry for bothering you, I'm just a bit lost...

@oech3
Contributor

oech3 commented Jan 29, 2026

I am confused too.

@sylvestre
Contributor

Me too :)
I am not sure what to do here yet

@lordeji
Author

lordeji commented Jan 29, 2026

@sylvestre There are two possibilities:

  1. You're still okay with your earlier statement (in issue #8594, "cargo install" only build in English), so we embed English and the system LANG:
    If so, I'll finish this PR by adding back the .ftl files as a fallback

  2. You disagree with your earlier statement, so we only embed English:
    If so,


In my opinion, we should first work out how packagers/distributions could make use of this, because embedding the system locale can only be done at build time and would lead to multiple packages (like uutils-fr).
This could be solved by embedding all locales in the binary, but that would bloat its size.
Embedding is best performance-wise but gives a much bigger binary, while keeping the .ftl files would lead to slower initialization, thus aggravating issue #9103.
Also, embedding the locales forces a complete update of uutils when we only want to update the locale files.

For now, in this configuration, I think embedding the system locale is not useful until we find a way for packagers/distributions to make use of it.

The choice is yours Mr LEDRU.

@oech3
Contributor

oech3 commented Jan 30, 2026

@sylvestre If you want to support reproducible builds that do not depend on the system configuration, please go with option 2.
If not, please go with option 1.

@lordeji
Author

lordeji commented Feb 7, 2026

I'm converting all my PRs to drafts.
Reason : I'll work on it after solving my health problems.

@lordeji lordeji marked this pull request as draft February 7, 2026 13:22
@cakebaker
Contributor

> I'm converting all my PRs to drafts.
> Reason : I'll work on it after solving my health problems.

@lordeji get well soon.


4 participants