Skip to content

ICU-23321 Eliminate lock contention and fix concurrency bugs across ICU4J#3902

Merged
markusicu merged 16 commits intounicode-org:mainfrom
jabagawee:ICU-23321-batch2
Mar 21, 2026
Merged

ICU-23321 Eliminate lock contention and fix concurrency bugs across ICU4J#3902
markusicu merged 16 commits intounicode-org:mainfrom
jabagawee:ICU-23321-batch2

Conversation

@jabagawee
Copy link
Copy Markdown
Contributor

@jabagawee jabagawee commented Mar 14, 2026

Continuation of the ICU-23321 concurrency modernization effort (batch 1: #3863, ResourceCache: #3900). This PR addresses thread-safety issues across ~30 files in core, collate, translit, and charset modules.

Scope

16 commits organized by pattern:

Correctness fixes (bugs in existing code):

  • DateTimePatternGenerator.cloneAsThawed() mutated the frozen original (shared singleton) instead of the clone. Also fixes the test that was asserting the buggy behavior.
  • RelativeDateTimeFormatter.formatImpl() called numberFormat.format() without synchronization on a non-thread-safe NumberFormat instance.
  • StandardPluralRanges.getLanguageToSet() had a broken double-checked locking pattern (non-volatile field). Replaced with holder pattern.
  • CollationTailoring.maxExpansions was written without volatile, so concurrent readers could see a stale null.
  • ResourceBundleWrapper.initKeysVector() could be called concurrently on shared parent bundles, causing ArrayIndexOutOfBoundsException from concurrent ArrayList.add(). Reported via BIRT 4.22 Export PDF fails in multi threaded environment when running many reports first time eclipse-birt/birt#2394

Lock contention reduction:

  • Currency: replaced synchronized class-level monitor + SoftReference with holder pattern and List.copyOf/Set.copyOf.
  • MeasureUnit: replaced global synchronized(MeasureUnit.class) with ConcurrentHashMap + per-type locking.
  • ULocale.getDefault(Category): eliminated synchronized(ULocale.class) on every call via volatile snapshot (DefaultCategoryState).
  • ULocale.loadAliasData(): replaced synchronized with holder pattern (JLS 12.4.2).
  • TimeZone: replaced double-lock (java.util.TimeZone.class + TimeZone.class) with a private lock object. Made TZ_IMPL volatile.

Synchronized caches to ConcurrentHashMap (7 files):

  • RbnfScannerProviderImpl, PluralRulesLoader, LocaleObjectCache, MeasureFormat, NormalizationTransliterator, ZoneMeta (partial)
  • Each cache converted from synchronized block to lock-free reads via CHM's get/putIfAbsent or computeIfAbsent.

Volatile DCL and holder pattern for one-time init (7 files):

  • Region, UConverterAlias, SimpleDateFormat, TimeZoneFormat, ZoneMeta, LocaleDisplayNamesImpl: volatile flag + DCL
  • CharsetProviderICU: holder pattern (JAR-bundled resources, permanent failure acceptable)
  • Replaced static synchronized methods, eliminating contention after first init.

AtomicReferenceArray for indexed lazy init (2 files):

  • CharacterProperties, StringPrep: replaced synchronized on arrays with AtomicReferenceArray for lock-free reads.

Missing-synchronization audit fixes:

  • UScriptRun.parenStack: static array shared across all instances; made it an instance field.
  • CharsetISCII PNJ sets: unsynchronized lazy init; converted to holder pattern with frozen sets.
  • UCharacter.UnicodeBlock.mref, BasicPeriodFormatterService.instance, LocaleData.gCLDRVersion, UnicodeSet.XSYMBOL_TABLE: added volatile or converted to holder pattern for JMM correctness.

Other:

  • Transliterators (5 files): made shared SourceTargetUtility fields volatile for safe publication.
  • SimpleCache: made type and capacity fields final; added concurrency tests.
  • PluralRulesLoader: fixed sentinel value collision (was using PluralRules.DEFAULT, which newInternal() can return by reference).

Testing

  • All existing tests pass across core, collate, translit, and charset modules (0 failures, 0 errors).
  • New concurrency regression tests for each fix category:
    • CorrectnessFixConcurrencyTest: DTPG cloneAsThawed, RDTF shared-instance formatting, StandardPluralRanges, UScriptRun
    • CollationConcurrencyTest: exercises initMaxExpansions() via getCollationElementIterator
    • TransliteratorConcurrencyTest: concurrent addSourceTargetSet calls
    • LazyInitConcurrencyTest: PluralRules, MeasureFormat, Region, SimpleDateFormat, LocaleDisplayNames, CharacterProperties, StringPrep, ResourceBundleWrapper
    • CacheConcurrencyTest: concurrent SimpleCache operations
    • ULocaleDefaultConcurrencyTest: concurrent getDefault(Category) with setDefault
    • TimeZoneConcurrencyTest: concurrent getDefault/setDefault

Checklist

  • Required: Issue filed: ICU-23321
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-NNNNN Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable
  • Approver: Feel free to merge on my behalf

ALLOW_MANY_COMMITS=true

Copy link
Copy Markdown
Contributor

@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

richgillam
richgillam previously approved these changes Mar 20, 2026
Copy link
Copy Markdown
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should check in on my say-so alone; my Java is really rusty at this point, and I only skimmed parts of the review. But this all looks like good stuff, carefully thought out and very well documented.

Copy link
Copy Markdown
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • this is great work, thank you!
  • also great documentation, including in the commit messages
  • thus, i wonder whether we should keep (most of) these commits separately (rebase & merge multiple commits)? (if so, then we can add a secret incantation to satisfy the CI check)
  • optional: except maybe squashing some fix-ups of earlier commits into the respective ones?
  • i like the separate, late spotless commit
  • i did not try to read & understand absolutely every line...
  • i do have a couple of questions/suggestions

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as what you added for PR #3900, right? Meaning, it is now in conflict? Meaning, once these changes are approved, you will need to rebase & resolve the conflict...?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup i'll rebase and resolve after you give final approval

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your follow-up changes. I think this is all good. Please go ahead and rebase/absorb/...

mihnita
mihnita previously approved these changes Mar 20, 2026
@jabagawee
Copy link
Copy Markdown
Contributor Author

thus, i wonder whether we should keep (most of) these commits separately (rebase & merge multiple commits)? (if so, then we can add a secret incantation to satisfy the CI check)

optional: except maybe squashing some fix-ups of earlier commits into the respective ones?

yeah what i can do is to update the commits that are actually doing things in order to "absorb" the spotless changes and extra fixes at the end (idk if that's the standard name for it, but i'm pulling from the idea of hg absorb in mercurial). then we can commit the changes as individual commits without the noisy commits included

i like the separate, late spotless commit

if that's the case, should i not absorb the spotless commit?

@jabagawee jabagawee requested a review from markusicu March 20, 2026 20:10
@markusicu
Copy link
Copy Markdown
Member

thus, i wonder whether we should keep (most of) these commits separately (rebase & merge multiple commits)? (if so, then we can add a secret incantation to satisfy the CI check)
optional: except maybe squashing some fix-ups of earlier commits into the respective ones?

yeah what i can do is to update the commits that are actually doing things in order to "absorb" the spotless changes and extra fixes at the end (idk if that's the standard name for it, but i'm pulling from the idea of hg absorb in mercurial). then we can commit the changes as individual commits without the noisy commits included

Absorbing fixes on earlier-commit changes would be very nice.

It's probably cleanest to also have each commit include its own spotless formatting (GitHub reviews allow ignoring whitespace) but I don't feel strongly either way.

As for "absorbing" / hg, the tool I have used for this sort of stuff is git rebase -i for editing / shuffling / selectively squashing commits. I am sure you have something better :-)

@jabagawee
Copy link
Copy Markdown
Contributor Author

just to confirm, are you cool with me rebasing on the latest main and also editing commits and force pushing now? or should i wait for other people to review this first?

@markusicu
Copy link
Copy Markdown
Member

just to confirm, are you cool with me rebasing on the latest main and also editing commits and force pushing now? or should i wait for other people to review this first?

I propose that you do so now. We have had three people look this over, with few comments. Let's move it along :-)

Both the conflict-resolving rebase and the other commit twiddles will confuse GitHub comment tracking. Seems ok to rip off the band-aid once.

… the frozen original

cloneAsThawed() set `frozen = false` on `this` instead of on the
clone. When a frozen instance was shared across threads (the common
case: getFrozenInstance() returns a cached singleton), any call to
cloneAsThawed() would silently unfreeze the shared instance, allowing
other threads to mutate it concurrently.

This dates back to the very first checkin of DateTimePatternGenerator
(ICU-4374, 2006-08-17, svn2github/icu4j@8ffe84bc5). Compare with
UnicodeSet.cloneAsThawed(), written two days later (ICU-4217,
2006-08-19), which correctly writes `result.frozen = false`. The
DTPG version likely carried over the pattern without the `result.`
qualifier.

Fix: change `frozen = false` to `result.frozen = false` so the
intent is explicit. clone() already sets result.frozen = false
(line 1780), so this is technically redundant but makes the contract
("cloneAsThawed returns a thawed clone without modifying the
original") self-documenting at the call site.

Update TestFreezeAndCloneAsThawed to assert the correct behavior:
the original must remain frozen after cloneAsThawed(). The existing
test was asserting the buggy behavior.

Reference: Effective Java, 3rd Edition, Item 17 "Minimize mutability"
…ormatImpl()

formatImpl() has two branches: the DecimalFormat branch (line 774)
is thread-safe because toNumberFormatter() returns an immutable
LocalizedNumberFormatter. But the else-branch calls
numberFormat.format() directly without synchronization. NumberFormat
is not thread-safe: concurrent format() calls can corrupt internal
state (DigitList, FieldPosition).

Fix: wrap the else-branch's numberFormat.format() call in
synchronized(numberFormat).

Reference: Java Concurrency in Practice, Section 4.3.5
"Documenting synchronization policies"
@jira-pull-request-webhook
Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • icu4j/main/charset/src/main/java/com/ibm/icu/charset/CharsetProviderICU.java is different
  • icu4j/main/core/src/main/java/com/ibm/icu/impl/duration/BasicPeriodFormatterService.java is different
  • icu4j/main/core/src/main/java/com/ibm/icu/impl/number/range/StandardPluralRanges.java is different
  • icu4j/main/core/src/main/java/com/ibm/icu/text/SimpleDateFormat.java is different
  • icu4j/main/core/src/main/java/com/ibm/icu/util/MeasureUnit.java is different
  • icu4j/main/core/src/test/java/com/ibm/icu/dev/test/format/DateTimeGeneratorTest.java is different
  • icu4j/main/core/src/test/java/com/ibm/icu/dev/test/util/ConcurrencyTest.java is no longer changed in the branch
  • icu4j/main/core/src/test/java/com/ibm/icu/dev/test/util/LazyInitConcurrencyTest.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jabagawee
Copy link
Copy Markdown
Contributor Author

oh no there are tons of changes due to the rebase on main. i was hoping to show that the only changes i made in my rebase were:

  • renamed holder pattern fields from FooHolder.INSTANCE to FooHolder.FOO
  • reordered the test cases in LazyInitConcurrencyTest because as i rebased i was appending tests in the order that their commits came

but other than that yup this is my finalized branch that i'd love to submit as separate commits

@jabagawee jabagawee changed the title ICU-23321 Batch 2: broad concurrency fixes across ICU4J ICU-23321 Eliminate lock contention and fix concurrency bugs across ICU4J Mar 21, 2026
getLanguageToSet() used volatile without a synchronized block. Two
threads could both see null, both construct the map, and a reader
could see the reference before the HashMap's internal state was
fully visible.

Fix: initialization-on-demand holder pattern (JLS 12.4.2). The data
is loaded from JAR-bundled ICU resources that cannot fail transiently,
so permanent class initialization failure is acceptable.

Reference: Effective Java, 3rd Edition, Item 83 "Use lazy
initialization judiciously"
…threads

Two issues in the collation maxExpansions lazy initialization:

1. CollationTailoring.maxExpansions is not volatile. It is written
   inside synchronized(tailoring) in RuleBasedCollator.initMaxExpansions()
   but read without synchronization by getMaxExpansion(). Without
   volatile, reader threads may never see the updated value.

2. RuleBasedCollator.initMaxExpansions() always enters the
   synchronized block even when maxExpansions is already initialized.
   Add a volatile read as a fast-path check before acquiring the lock.

Fix: make maxExpansions volatile and add a null-check before the
synchronized block (double-checked locking).

Reference: Java Concurrency in Practice, Section 3.1.3
"Locking and visibility"
Five related safe-publication fixes in the transliterator subsystem:

1. CaseFoldTransliterator: sourceTargetUtility is lazily initialized
   without volatile; readers can see a partially-constructed object.
   Also corrects the lock class: was synchronized on
   UppercaseTransliterator.class instead of CaseFoldTransliterator.class,
   carried over from ticket:8227 (svn2github/icu4j@eb34af63f, 2010)
   when addSourceTargetSet() was added to all case transliterators
   using UppercaseTransliterator as the template.
   Fix: add volatile, use correct lock class, add double-checked locking.

2. Lowercase/Uppercase/TitlecaseTransliterator: sourceTargetUtility
   fields are lazily initialized without volatile. The existing
   synchronized block in addSourceTargetSet() provides mutual exclusion
   but without volatile, the fast-path read outside the lock can see
   a stale null or a partially-constructed object.
   Fix: add volatile to each field.

3. TransliteratorParser: ILLEGAL_TOP, ILLEGAL_SEG, ILLEGAL_FUNC are
   static UnicodeSet fields that are neither final nor frozen. Mutable
   static fields shared across threads without synchronization.
   Fix: make final and freeze() each UnicodeSet for immutability.

Reference: Java Concurrency in Practice, Section 3.5 "Safe publication"
SimpleCache.type and SimpleCache.capacity are assigned once in the
constructor and never modified. Make them final to signal immutability
and enable the JMM's final field freeze guarantee for safe publication.

Also chain constructors to eliminate redundant default-value logic.

Reference: Effective Java, 3rd Edition, Item 17 "Minimize mutability"
These tests exercise concurrent access to SimpleCache (via
NumberFormat.getInstance and getCurrencyInstance). They should have
been included in the batch 1 PR (unicode-org#3863) which replaced
Collections.synchronizedMap with ConcurrentHashMap, but were
inadvertently omitted.
populateCache() is synchronized on MeasureUnit.class. Even after the
cache is populated, every call to getAvailableTypes(), getAvailable(),
findBySubType(), and getUnit() acquires this monitor just to check a
boolean flag. At scale with many threads, this single monitor becomes
a bottleneck.

Fix: replace HashMap with ConcurrentHashMap (both outer and inner
maps), make cacheIsPopulated volatile, use double-checked locking.
After initialization, all reads are lock-free.

The outer map must also be ConcurrentHashMap because addUnit() is
called at runtime via Currency.getInstance() -> internalGetInstance().

References:
- Effective Java, 3rd Edition, Item 83 "Use lazy initialization
  judiciously"
- Java Concurrency in Practice, Section 16.2.4 "Double-checked
  locking"
getAllTenderCurrencies() and getAllCurrenciesAsSet() are private
static synchronized. Every call acquires Currency.class just to
check if a SoftReference is still live. Called from isAvailable()
and getAvailableCurrencies(), these become bottlenecks when many
threads validate currencies concurrently.

Fix: use initialization-on-demand holder pattern for tender
currencies and all-currencies sets. The JVM guarantees thread-safe
lazy initialization of static inner class fields (JLS 12.4.2),
eliminating the need for any synchronization.

Reference: Effective Java, 3rd Edition, Item 83 "Use lazy
initialization judiciously"
getDefault(Category) acquires synchronized(ULocale.class) on every
call. This method is called from virtually every formatter factory:
NumberFormat.getInstance(), DateFormat.getDateInstance(),
MessageFormat constructor, and 20+ more. At scale, every thread
creating any formatter serializes on this one lock just to read an
array element.

Fix: use volatile arrays with copy-on-write semantics. The common
path (JDK default locale unchanged) is a volatile read + equals()
check with no lock acquisition. The rare path (locale changed)
takes the lock and does copy-on-write array replacement.

References:
- Effective Java, 3rd Edition, Item 83 "Use lazy initialization
  judiciously"
- Java Concurrency in Practice, Section 3.4.2 "Volatile variables"
…attern

AliasReplacer.loadAliasData() is private static synchronized. After
first call, every subsequent call acquires AliasReplacer.class just
to check aliasDataIsLoaded == true. This is on the canonicalize()
hot path.

Fix: initialization-on-demand holder pattern. Move all alias data
into a static inner class AliasData whose fields are populated in
the class initializer. The JVM guarantees thread-safe lazy
initialization of static inner class fields (JLS 12.4.2),
eliminating the need for any synchronization.

Reference: Effective Java, 3rd Edition, Item 83 "Use lazy
initialization judiciously"
…ject

Two issues:

1. TZ_IMPL is a private static int (not volatile). Written by
   setDefaultTimeZoneType() inside synchronized(TimeZone.class), but
   read by getDefaultTimeZoneType() without synchronization. Without
   volatile, reader threads may never see the updated value: a thread
   that set the timezone type to JDK could have other threads continue
   using ICU type indefinitely.

   Fix: make TZ_IMPL volatile.

2. defaultZone synchronization uses TimeZone.class as the monitor.
   This is a public class: any external code can synchronize on
   TimeZone.class and cause unexpected contention or deadlock.

   Fix: introduce a private static final lock object.

References:
- Java Concurrency in Practice, Section 3.1.3 "Locking and
  visibility"
- Effective Java, 3rd Edition, Item 82 "Document thread safety"
Replace Collections.synchronizedMap and synchronized-HashMap patterns
with ConcurrentHashMap for lock-free reads across 7 files:

- NormalizationTransliterator: cache (synchronizedMap -> CHM).
  Uses get-then-putIfAbsent instead of computeIfAbsent because
  SourceTargetUtility construction scans all Unicode code points
  and should not hold a CHM bin lock during that work.

- TransliteratorIDParser: SPECIALS (synchronizedMap -> CHM)

- Transliterator: displayNameCache (synchronizedMap -> CHM)

- RbnfScannerProviderImpl: cache (HashMap+synchronized -> CHM).
  Uses get-then-putIfAbsent because createScanner() constructs a
  RuleBasedCollator (expensive) and should not hold a bin lock.

- PluralRulesLoader: LOCALE_RULES_MAP (HashMap+synchronized -> CHM)

- LocaleObjectCache: _map (HashMap+synchronized -> CHM)

- MeasureFormat: replaces 3-entry MRU cache (formatter1/2/3) with
  ConcurrentHashMap for lock-free reads.

computeIfAbsent vs get-then-putIfAbsent: computeIfAbsent holds a
CHM bin lock for the duration of the lambda. This is safe when the
factory is cheap and does not re-enter the same map. When the factory
is expensive or could re-enter, get-then-putIfAbsent avoids holding
the lock at the cost of occasional duplicate computation.

References:
- Java Concurrency in Practice, Section 5.2 "ConcurrentHashMap"
- Effective Java, 3rd Edition, Item 81 "Prefer concurrency utilities
  to wait and notify"
Replace synchronized methods that guard one-time lazy initialization
with volatile boolean + double-checked locking. After first init,
the fast path is a single volatile read:

- ZoneMeta: volatile DCL for zoneIDs, CANONICAL_ID_CACHE, and other
  SoftReference-wrapped caches

- Region: volatile DCL for loadRegionData()

- SimpleDateFormat: replace two separate volatile fields (locale,
  pattern) with an immutable snapshot object published via a single
  volatile write. This eliminates a TOCTOU race where the two fields
  could be read inconsistently by concurrent threads.

- CharsetProviderICU: holder pattern for loadAvailableICUCharsets().
  Data is loaded from JAR-bundled ICU resources, so permanent failure
  on missing data is acceptable.

- UConverterAlias: volatile DCL for haveAliasData()

- TimeZoneFormat: volatile DCL for timezone name cache

- LocaleDisplayNamesImpl: volatile DCL for locale data cache

References:
- Effective Java, 3rd Edition, Item 83 "Use lazy initialization
  judiciously"
- Java Concurrency in Practice, Section 16.2.4 "Double-checked
  locking"
Replace synchronized array access with AtomicReferenceArray for
lock-free reads of lazily-initialized indexed data:

- CharacterProperties: sets/maps arrays -> AtomicReferenceArray
  (renamed to SETS/MAPS per static final convention). Each UProperty
  index is independently initialized via compareAndSet, eliminating
  the global lock on every property lookup.

- StringPrep: CACHE array -> AtomicReferenceArray.
  Provides volatile-read semantics on the fast path (outside the
  synchronized block) for the double-checked locking pattern. The
  synchronized block is still used for initialization to avoid
  duplicate expensive resource loading.

References:
- Java Concurrency in Practice, Section 15.3 "Atomic variable classes"
- Effective Java, 3rd Edition, Item 83 "Use lazy initialization
  judiciously"
…udit

- ResourceBundleWrapper.initKeysVector(): concurrent calls on shared parent
  bundles caused ArrayList corruption (ArrayIndexOutOfBoundsException).
  Fixed with volatile field + local variable + early return.
  Reported via eclipse-birt/birt#2394
- UScriptRun.parenStack: static array shared across all instances, indexed
  by instance fields. Concurrent instances would corrupt each other. Fixed
  by making it an instance field.
- CharsetISCII PNJ sets: lazy init of static UnicodeSet fields without
  synchronization. Fixed with holder pattern and frozen sets.
- UCharacter.UnicodeBlock.mref: non-volatile SoftReference. Added volatile.
- BasicPeriodFormatterService.instance: non-volatile lazy singleton.
  Converted to holder pattern.
- LocaleData.gCLDRVersion: non-volatile lazy VersionInfo. Added volatile.
- UnicodeSet.XSYMBOL_TABLE: non-volatile global config field. Added volatile.
@jira-pull-request-webhook
Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • icu4j/main/core/src/test/java/com/ibm/icu/dev/test/util/CorrectnessFixConcurrencyTest.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jabagawee
Copy link
Copy Markdown
Contributor Author

sorry one last round of force pushing to update some commit messages and javadocs

@markusicu
Copy link
Copy Markdown
Member

  • renamed holder pattern fields from FooHolder.INSTANCE to FooHolder.FOO

hm... i actually like the FooHolder.INSTANCE pattern ...

@markusicu
Copy link
Copy Markdown
Member

  • renamed holder pattern fields from FooHolder.INSTANCE to FooHolder.FOO

hm... i actually like the FooHolder.INSTANCE pattern ...

correction: i like the pattern Foo.INSTANCE without the Holder. but this is just my personal preference.

@markusicu markusicu merged commit 1659052 into unicode-org:main Mar 21, 2026
17 checks passed
@jabagawee
Copy link
Copy Markdown
Contributor Author

to be honest i do too, but i was trying to follow Effective Java as closely as possible. like how sometimes one follows the style guide for consistency’s sake even against their personal preferences

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants