ICU-23321 Eliminate lock contention and fix concurrency bugs across ICU4J#3902
ICU-23321 Eliminate lock contention and fix concurrency bugs across ICU4J#3902markusicu merged 16 commits intounicode-org:mainfrom
Conversation
icu4j/main/core/src/main/java/com/ibm/icu/lang/CharacterProperties.java
Outdated
Show resolved
Hide resolved
icu4j/main/core/src/main/java/com/ibm/icu/lang/CharacterProperties.java
Outdated
Show resolved
Hide resolved
richgillam
left a comment
There was a problem hiding this comment.
I don't think you should check in on my say-so alone; my Java is really rusty at this point, and I only skimmed parts of the review. But this all looks like good stuff, carefully thought out and very well documented.
markusicu
left a comment
There was a problem hiding this comment.
- this is great work, thank you!
- also great documentation, including in the commit messages
- thus, i wonder whether we should keep (most of) these commits separately (rebase & merge multiple commits)? (if so, then we can add a secret incantation to satisfy the CI check)
- optional: except maybe squashing some fix-ups of earlier commits into the respective ones?
- i like the separate, late spotless commit
- i did not try to read & understand absolutely every line...
- i do have a couple of questions/suggestions
There was a problem hiding this comment.
This is the same as what you added for PR #3900, right? Meaning, it is now in conflict? Meaning, once these changes are approved, you will need to rebase & resolve the conflict...?
There was a problem hiding this comment.
yup i'll rebase and resolve after you give final approval
There was a problem hiding this comment.
Thanks for your follow-up changes. I think this is all good. Please go ahead and rebase/absorb/...
icu4j/main/collate/src/main/java/com/ibm/icu/impl/text/RbnfScannerProviderImpl.java
Outdated
Show resolved
Hide resolved
icu4j/main/core/src/main/java/com/ibm/icu/text/MeasureFormat.java
Outdated
Show resolved
Hide resolved
yeah what i can do is to update the commits that are actually doing things in order to "absorb" the spotless changes and extra fixes at the end (idk if that's the standard name for it, but i'm pulling from the idea of
if that's the case, should i not absorb the spotless commit? |
Absorbing fixes on earlier-commit changes would be very nice. It's probably cleanest to also have each commit include its own spotless formatting (GitHub reviews allow ignoring whitespace) but I don't feel strongly either way. As for "absorbing" / hg, the tool I have used for this sort of stuff is |
|
just to confirm, are you cool with me rebasing on the latest main and also editing commits and force pushing now? or should i wait for other people to review this first? |
I propose that you do so now. We have had three people look this over, with few comments. Let's move it along :-) Both the conflict-resolving rebase and the other commit twiddles will confuse GitHub comment tracking. Seems ok to rip off the band-aid once. |
… the frozen original cloneAsThawed() set `frozen = false` on `this` instead of on the clone. When a frozen instance was shared across threads (the common case: getFrozenInstance() returns a cached singleton), any call to cloneAsThawed() would silently unfreeze the shared instance, allowing other threads to mutate it concurrently. This dates back to the very first checkin of DateTimePatternGenerator (ICU-4374, 2006-08-17, svn2github/icu4j@8ffe84bc5). Compare with UnicodeSet.cloneAsThawed(), written two days later (ICU-4217, 2006-08-19), which correctly writes `result.frozen = false`. The DTPG version likely carried over the pattern without the `result.` qualifier. Fix: change `frozen = false` to `result.frozen = false` so the intent is explicit. clone() already sets result.frozen = false (line 1780), so this is technically redundant but makes the contract ("cloneAsThawed returns a thawed clone without modifying the original") self-documenting at the call site. Update TestFreezeAndCloneAsThawed to assert the correct behavior: the original must remain frozen after cloneAsThawed(). The existing test was asserting the buggy behavior. Reference: Effective Java, 3rd Edition, Item 17 "Minimize mutability"
…ormatImpl() formatImpl() has two branches: the DecimalFormat branch (line 774) is thread-safe because toNumberFormatter() returns an immutable LocalizedNumberFormatter. But the else-branch calls numberFormat.format() directly without synchronization. NumberFormat is not thread-safe: concurrent format() calls can corrupt internal state (DigitList, FieldPosition). Fix: wrap the else-branch's numberFormat.format() call in synchronized(numberFormat). Reference: Java Concurrency in Practice, Section 4.3.5 "Documenting synchronization policies"
7e39444 to
76d209f
Compare
|
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
|
oh no there are tons of changes due to the rebase on main. i was hoping to show that the only changes i made in my rebase were:
but other than that yup this is my finalized branch that i'd love to submit as separate commits |
getLanguageToSet() used volatile without a synchronized block. Two threads could both see null, both construct the map, and a reader could see the reference before the HashMap's internal state was fully visible. Fix: initialization-on-demand holder pattern (JLS 12.4.2). The data is loaded from JAR-bundled ICU resources that cannot fail transiently, so permanent class initialization failure is acceptable. Reference: Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously"
…threads Two issues in the collation maxExpansions lazy initialization: 1. CollationTailoring.maxExpansions is not volatile. It is written inside synchronized(tailoring) in RuleBasedCollator.initMaxExpansions() but read without synchronization by getMaxExpansion(). Without volatile, reader threads may never see the updated value. 2. RuleBasedCollator.initMaxExpansions() always enters the synchronized block even when maxExpansions is already initialized. Add a volatile read as a fast-path check before acquiring the lock. Fix: make maxExpansions volatile and add a null-check before the synchronized block (double-checked locking). Reference: Java Concurrency in Practice, Section 3.1.3 "Locking and visibility"
Five related safe-publication fixes in the transliterator subsystem: 1. CaseFoldTransliterator: sourceTargetUtility is lazily initialized without volatile; readers can see a partially-constructed object. Also corrects the lock class: was synchronized on UppercaseTransliterator.class instead of CaseFoldTransliterator.class, carried over from ticket:8227 (svn2github/icu4j@eb34af63f, 2010) when addSourceTargetSet() was added to all case transliterators using UppercaseTransliterator as the template. Fix: add volatile, use correct lock class, add double-checked locking. 2. Lowercase/Uppercase/TitlecaseTransliterator: sourceTargetUtility fields are lazily initialized without volatile. The existing synchronized block in addSourceTargetSet() provides mutual exclusion but without volatile, the fast-path read outside the lock can see a stale null or a partially-constructed object. Fix: add volatile to each field. 3. TransliteratorParser: ILLEGAL_TOP, ILLEGAL_SEG, ILLEGAL_FUNC are static UnicodeSet fields that are neither final nor frozen. Mutable static fields shared across threads without synchronization. Fix: make final and freeze() each UnicodeSet for immutability. Reference: Java Concurrency in Practice, Section 3.5 "Safe publication"
SimpleCache.type and SimpleCache.capacity are assigned once in the constructor and never modified. Make them final to signal immutability and enable the JMM's final field freeze guarantee for safe publication. Also chain constructors to eliminate redundant default-value logic. Reference: Effective Java, 3rd Edition, Item 17 "Minimize mutability"
These tests exercise concurrent access to SimpleCache (via NumberFormat.getInstance and getCurrencyInstance). They should have been included in the batch 1 PR (unicode-org#3863) which replaced Collections.synchronizedMap with ConcurrentHashMap, but were inadvertently omitted.
populateCache() is synchronized on MeasureUnit.class. Even after the cache is populated, every call to getAvailableTypes(), getAvailable(), findBySubType(), and getUnit() acquires this monitor just to check a boolean flag. At scale with many threads, this single monitor becomes a bottleneck. Fix: replace HashMap with ConcurrentHashMap (both outer and inner maps), make cacheIsPopulated volatile, use double-checked locking. After initialization, all reads are lock-free. The outer map must also be ConcurrentHashMap because addUnit() is called at runtime via Currency.getInstance() -> internalGetInstance(). References: - Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously" - Java Concurrency in Practice, Section 16.2.4 "Double-checked locking"
getAllTenderCurrencies() and getAllCurrenciesAsSet() are private static synchronized. Every call acquires Currency.class just to check if a SoftReference is still live. Called from isAvailable() and getAvailableCurrencies(), these become bottlenecks when many threads validate currencies concurrently. Fix: use initialization-on-demand holder pattern for tender currencies and all-currencies sets. The JVM guarantees thread-safe lazy initialization of static inner class fields (JLS 12.4.2), eliminating the need for any synchronization. Reference: Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously"
getDefault(Category) acquires synchronized(ULocale.class) on every call. This method is called from virtually every formatter factory: NumberFormat.getInstance(), DateFormat.getDateInstance(), MessageFormat constructor, and 20+ more. At scale, every thread creating any formatter serializes on this one lock just to read an array element. Fix: use volatile arrays with copy-on-write semantics. The common path (JDK default locale unchanged) is a volatile read + equals() check with no lock acquisition. The rare path (locale changed) takes the lock and does copy-on-write array replacement. References: - Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously" - Java Concurrency in Practice, Section 3.4.2 "Volatile variables"
…attern AliasReplacer.loadAliasData() is private static synchronized. After first call, every subsequent call acquires AliasReplacer.class just to check aliasDataIsLoaded == true. This is on the canonicalize() hot path. Fix: initialization-on-demand holder pattern. Move all alias data into a static inner class AliasData whose fields are populated in the class initializer. The JVM guarantees thread-safe lazy initialization of static inner class fields (JLS 12.4.2), eliminating the need for any synchronization. Reference: Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously"
…ject Two issues: 1. TZ_IMPL is a private static int (not volatile). Written by setDefaultTimeZoneType() inside synchronized(TimeZone.class), but read by getDefaultTimeZoneType() without synchronization. Without volatile, reader threads may never see the updated value: a thread that set the timezone type to JDK could have other threads continue using ICU type indefinitely. Fix: make TZ_IMPL volatile. 2. defaultZone synchronization uses TimeZone.class as the monitor. This is a public class: any external code can synchronize on TimeZone.class and cause unexpected contention or deadlock. Fix: introduce a private static final lock object. References: - Java Concurrency in Practice, Section 3.1.3 "Locking and visibility" - Effective Java, 3rd Edition, Item 82 "Document thread safety"
Replace Collections.synchronizedMap and synchronized-HashMap patterns with ConcurrentHashMap for lock-free reads across 7 files: - NormalizationTransliterator: cache (synchronizedMap -> CHM). Uses get-then-putIfAbsent instead of computeIfAbsent because SourceTargetUtility construction scans all Unicode code points and should not hold a CHM bin lock during that work. - TransliteratorIDParser: SPECIALS (synchronizedMap -> CHM) - Transliterator: displayNameCache (synchronizedMap -> CHM) - RbnfScannerProviderImpl: cache (HashMap+synchronized -> CHM). Uses get-then-putIfAbsent because createScanner() constructs a RuleBasedCollator (expensive) and should not hold a bin lock. - PluralRulesLoader: LOCALE_RULES_MAP (HashMap+synchronized -> CHM) - LocaleObjectCache: _map (HashMap+synchronized -> CHM) - MeasureFormat: replaces 3-entry MRU cache (formatter1/2/3) with ConcurrentHashMap for lock-free reads. computeIfAbsent vs get-then-putIfAbsent: computeIfAbsent holds a CHM bin lock for the duration of the lambda. This is safe when the factory is cheap and does not re-enter the same map. When the factory is expensive or could re-enter, get-then-putIfAbsent avoids holding the lock at the cost of occasional duplicate computation. References: - Java Concurrency in Practice, Section 5.2 "ConcurrentHashMap" - Effective Java, 3rd Edition, Item 81 "Prefer concurrency utilities to wait and notify"
Replace synchronized methods that guard one-time lazy initialization with volatile boolean + double-checked locking. After first init, the fast path is a single volatile read: - ZoneMeta: volatile DCL for zoneIDs, CANONICAL_ID_CACHE, and other SoftReference-wrapped caches - Region: volatile DCL for loadRegionData() - SimpleDateFormat: replace two separate volatile fields (locale, pattern) with an immutable snapshot object published via a single volatile write. This eliminates a TOCTOU race where the two fields could be read inconsistently by concurrent threads. - CharsetProviderICU: holder pattern for loadAvailableICUCharsets(). Data is loaded from JAR-bundled ICU resources, so permanent failure on missing data is acceptable. - UConverterAlias: volatile DCL for haveAliasData() - TimeZoneFormat: volatile DCL for timezone name cache - LocaleDisplayNamesImpl: volatile DCL for locale data cache References: - Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously" - Java Concurrency in Practice, Section 16.2.4 "Double-checked locking"
Replace synchronized array access with AtomicReferenceArray for lock-free reads of lazily-initialized indexed data: - CharacterProperties: sets/maps arrays -> AtomicReferenceArray (renamed to SETS/MAPS per static final convention). Each UProperty index is independently initialized via compareAndSet, eliminating the global lock on every property lookup. - StringPrep: CACHE array -> AtomicReferenceArray. Provides volatile-read semantics on the fast path (outside the synchronized block) for the double-checked locking pattern. The synchronized block is still used for initialization to avoid duplicate expensive resource loading. References: - Java Concurrency in Practice, Section 15.3 "Atomic variable classes" - Effective Java, 3rd Edition, Item 83 "Use lazy initialization judiciously"
…udit - ResourceBundleWrapper.initKeysVector(): concurrent calls on shared parent bundles caused ArrayList corruption (ArrayIndexOutOfBoundsException). Fixed with volatile field + local variable + early return. Reported via eclipse-birt/birt#2394 - UScriptRun.parenStack: static array shared across all instances, indexed by instance fields. Concurrent instances would corrupt each other. Fixed by making it an instance field. - CharsetISCII PNJ sets: lazy init of static UnicodeSet fields without synchronization. Fixed with holder pattern and frozen sets. - UCharacter.UnicodeBlock.mref: non-volatile SoftReference. Added volatile. - BasicPeriodFormatterService.instance: non-volatile lazy singleton. Converted to holder pattern. - LocaleData.gCLDRVersion: non-volatile lazy VersionInfo. Added volatile. - UnicodeSet.XSYMBOL_TABLE: non-volatile global config field. Added volatile.
76d209f to
0f6437b
Compare
|
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
|
sorry one last round of force pushing to update some commit messages and javadocs |
hm... i actually like the FooHolder.INSTANCE pattern ... |
correction: i like the pattern Foo.INSTANCE without the Holder. but this is just my personal preference. |
|
to be honest i do too, but i was trying to follow Effective Java as closely as possible. like how sometimes one follows the style guide for consistency’s sake even against their personal preferences |
Continuation of the ICU-23321 concurrency modernization effort (batch 1: #3863, ResourceCache: #3900). This PR addresses thread-safety issues across ~30 files in core, collate, translit, and charset modules.
Scope
16 commits organized by pattern:
Correctness fixes (bugs in existing code):
DateTimePatternGenerator.cloneAsThawed()mutated the frozen original (shared singleton) instead of the clone. Also fixes the test that was asserting the buggy behavior.RelativeDateTimeFormatter.formatImpl()callednumberFormat.format()without synchronization on a non-thread-safe NumberFormat instance.StandardPluralRanges.getLanguageToSet()had a broken double-checked locking pattern (non-volatile field). Replaced with holder pattern.CollationTailoring.maxExpansionswas written without volatile, so concurrent readers could see a stale null.ResourceBundleWrapper.initKeysVector()could be called concurrently on shared parent bundles, causingArrayIndexOutOfBoundsExceptionfrom concurrentArrayList.add(). Reported via BIRT 4.22 Export PDF fails in multi threaded environment when running many reports first time eclipse-birt/birt#2394Lock contention reduction:
Currency: replacedsynchronizedclass-level monitor +SoftReferencewith holder pattern andList.copyOf/Set.copyOf.MeasureUnit: replaced globalsynchronized(MeasureUnit.class)withConcurrentHashMap+ per-type locking.ULocale.getDefault(Category): eliminatedsynchronized(ULocale.class)on every call via volatile snapshot (DefaultCategoryState).ULocale.loadAliasData(): replacedsynchronizedwith holder pattern (JLS 12.4.2).TimeZone: replaced double-lock (java.util.TimeZone.class+TimeZone.class) with a private lock object. MadeTZ_IMPLvolatile.Synchronized caches to ConcurrentHashMap (7 files):
RbnfScannerProviderImpl,PluralRulesLoader,LocaleObjectCache,MeasureFormat,NormalizationTransliterator,ZoneMeta(partial)synchronizedblock to lock-free reads via CHM'sget/putIfAbsentorcomputeIfAbsent.Volatile DCL and holder pattern for one-time init (7 files):
Region,UConverterAlias,SimpleDateFormat,TimeZoneFormat,ZoneMeta,LocaleDisplayNamesImpl: volatile flag + DCLCharsetProviderICU: holder pattern (JAR-bundled resources, permanent failure acceptable)static synchronizedmethods, eliminating contention after first init.AtomicReferenceArray for indexed lazy init (2 files):
CharacterProperties,StringPrep: replacedsynchronizedon arrays withAtomicReferenceArrayfor lock-free reads.Missing-synchronization audit fixes:
UScriptRun.parenStack: static array shared across all instances; made it an instance field.CharsetISCIIPNJ sets: unsynchronized lazy init; converted to holder pattern with frozen sets.UCharacter.UnicodeBlock.mref,BasicPeriodFormatterService.instance,LocaleData.gCLDRVersion,UnicodeSet.XSYMBOL_TABLE: added volatile or converted to holder pattern for JMM correctness.Other:
Transliterators(5 files): made sharedSourceTargetUtilityfields volatile for safe publication.SimpleCache: madetypeandcapacityfields final; added concurrency tests.PluralRulesLoader: fixed sentinel value collision (was usingPluralRules.DEFAULT, whichnewInternal()can return by reference).Testing
CorrectnessFixConcurrencyTest: DTPG cloneAsThawed, RDTF shared-instance formatting, StandardPluralRanges, UScriptRunCollationConcurrencyTest: exercisesinitMaxExpansions()viagetCollationElementIteratorTransliteratorConcurrencyTest: concurrentaddSourceTargetSetcallsLazyInitConcurrencyTest: PluralRules, MeasureFormat, Region, SimpleDateFormat, LocaleDisplayNames, CharacterProperties, StringPrep, ResourceBundleWrapperCacheConcurrencyTest: concurrent SimpleCache operationsULocaleDefaultConcurrencyTest: concurrentgetDefault(Category)withsetDefaultTimeZoneConcurrencyTest: concurrentgetDefault/setDefaultChecklist
ALLOW_MANY_COMMITS=true