ICU-23321 ResourceCache: replace synchronized trie with ConcurrentHashMap#3900
Conversation
|
That guy who wrote b3a6ea1 thought he made some clever changes... :-/ |
|
Yeah I anticipated that feedback, which is why I split it out into its own PR. I'm willing to keep the original implementation? |
|
I can still make some improvements to the lock contention if we stick to the trie data structure, so I'd just revive another branch of mine that tinkered with the existing code without changing the implementation wholesale. |
Sorry for my confusing comment. You clearly understand Java multi-threaded performance better than me (that guy...), and have done much more thorough testing. I will review this PR, and I have asked the team for a second pair of eyes. |
markusicu
left a comment
There was a problem hiding this comment.
This looks very good to me!
PR description:
putIfAbsent() uses CHM's atomic putIfAbsent()
... but it's using CHM.compute(). Please fix.
Anyone else want to chime in? @mihnita @gvictor @vladcioricagoogle ?
icu4j/main/core/src/test/java/com/ibm/icu/dev/test/util/ResourceCacheConcurrencyTest.java
Outdated
Show resolved
Hide resolved
icu4j/main/core/src/main/java/com/ibm/icu/impl/ICUResourceBundleReader.java
Outdated
Show resolved
Hide resolved
icu4j/main/core/src/main/java/com/ibm/icu/impl/ICUResourceBundleReader.java
Show resolved
Hide resolved
icu4j/main/core/src/main/java/com/ibm/icu/impl/ICUResourceBundleReader.java
Outdated
Show resolved
Hide resolved
icu4j/main/core/src/main/java/com/ibm/icu/impl/ICUResourceBundleReader.java
Outdated
Show resolved
Hide resolved
|
lgtm, tnx! let's wait a day or so to give others a chance to chime in. |
icu4j/main/core/src/main/java/com/ibm/icu/impl/ICUResourceBundleReader.java
Show resolved
Hide resolved
markusicu
left a comment
There was a problem hiding this comment.
Let's do it.
@jabagawee do you want to do the squashing honors?
…hMap Replace the custom synchronized trie in ICUResourceBundleReader.ResourceCache with ConcurrentHashMap, eliminating a heavily contended lock on the resource loading path. ResourceCache.get() was synchronized and called on every resource bundle access (getString, getAlias, getArray, getTable). The custom trie (ICU-10932, 2014) optimized memory via int[]/Object[] arrays to avoid Integer autoboxing, but its sparse power-of-2 allocations waste 94.8% of slots (39KB vs 13KB per cache with ConcurrentHashMap). Benchmark (NumberFormat.getInstance full stack): 1 thread: 543 -> 505 ops/ms (slight regression from autoboxing) 32 threads: 2,074 -> 4,051 ops/ms (2x improvement from lock elimination) Uses ConcurrentHashMap.compute() in the put path to atomically handle cleared SoftReferences (plain putIfAbsent cannot replace a non-null but dead entry).
cdd139a to
cf94873
Compare
|
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
|
done! |
Replace the custom synchronized trie in
ICUResourceBundleReader.ResourceCachewithConcurrentHashMap, eliminating a heavily contended lock on the resource loading path. This is split out from the batch 2 PR (forthcoming) because it replaces a data structure rather than just changing a synchronization idiom, and warrants separate review.Background
ResourceCache.get()issynchronizedand is called on every resource bundle access —getString(),getStringV2(),getAlias(),getArray(),getTable(). On a multi-threaded server, this monitor serializes all resource lookups across all threads.The custom trie was introduced in ICU 54.1 (June 2014, ICU-10932, b3a6ea1) as a memory optimization for Android. It uses
int[]/Object[]arrays to avoidIntegerautoboxing andHashMap.Nodeper-entry overhead. The JIRA discussion focuses on memory optimization; the choice ofsynchronizedvs alternatives is not discussed.Why the trie loses on memory
The trie's per-entry savings (no autoboxing, no Node objects) are real but small. What kills it is sparse power-of-2 array allocations. Measured via reflection across 44 loaded caches (avg 221 entries each):
The trie allocates a 128-slot root and ~65 child Levels of 64 slots each, but only 221 slots contain data. 94.8% of allocated slots are empty.
Benchmark results
Full-stack throughput (
NumberFormat.getInstance(), which exercisesULocale.getDefault→SimpleCache→ResourceCache):synchronizedtrieReadWriteLock was also benchmarked as a middle-ground alternative. It underperforms both —
readLock()still does a CAS on a shared state word, making read-read contention on that counter the new bottleneck.Benchmark code: https://gist.github.com/jabagawee/54710d21ca98c1e6a6d307b5229bbd21
Known tradeoff: autoboxing
ConcurrentHashMap<Integer, Object>autoboxesintkeys on everyget(). Resource offsets are typically outside theIntegercache range (-128..127), so each lookup allocates a short-livedInteger. Accepted because:IntegerallocationsIntObjectHashMap) would require a new dependencyChanges
ResourceCacheinternals withConcurrentHashMap<Integer, Object>get()becomes lock-freeputIfAbsent()uses CHM'scompute()to atomically handle cleared SoftReferences (plainputIfAbsent()cannot replace a non-null but dead SoftReference entry)synchronizedfrom both methodsLevel/keys/valuestrie structure (~170 lines deleted)ResourceCacheConcurrencyTest— concurrent resource bundle lookups from 16 threads, gated on-DICU.exhaustive=5References
Checklist