OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable instance state #1003
krickert wants to merge 1 commit into apache:main
Conversation
Force-pushed 2a904dd to 729b9c1
There were 3 checkstyle violations; fixed those.

@krickert Thanks for the PR!

I've always been a fan of OpenNLP - what I love about finally contributing is that before this patch, I had to create pools of ME objects or create new ones every time. This gets rid of all that scaffolding. If you would like me to create any more tests, let me know. I think the new tests cover the concurrency and recall use cases well, and I think the speed tests show that there's no concern about performance. I was excited to see the > 1.5x speedup with POSTagger... it's the single reason why I decided to work on this.
Hi, thanks for the contribution! Overall, I like the idea of looking into built-in thread safety rather than relying on ThreadLocal-based wrappers, which have known issues in Jakarta EE and other long-lived thread environments. A few concerns I'd like to discuss before this can move forward (imho):

1. The benchmarks are hand-rolled `System.nanoTime()` + `ExecutorService` loops. Without JMH, the results are susceptible to JIT warmup, GC pauses, and profile pollution, i.e. there's no fork isolation, no warmup iterations, and no statistical variance reporting. For a change that removes multiple caches, that level of rigor matters.
2. Three layers of caching were removed as a shortcut to thread safety.
3. The regression benchmark reports "performance within noise," but without JMH-level statistical rigor that's hard to verify. More importantly, the benchmark uses a small set of short sentences: a benchmark against a real-world dataset (e.g., from the eval/test corpora: https://nightlies.apache.org/opennlp/) would be far more convincing, particularly for POS tagging where the feature generation cache had the most impact under larger workloads. A thread-safe alternative would be making the caches method-local rather than removing them entirely.
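For reference, a JMH-based replacement for such hand-rolled loops might look roughly like the sketch below. All names here are hypothetical (the `SharedTagger` stand-in takes the place of a shared `POSTaggerME` loaded from a model); the point is only to show where fork isolation, warmup, and variance reporting come from.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Hypothetical sketch of a JMH benchmark replacing hand-rolled
// System.nanoTime() loops: forks isolate JIT profiles, warmup
// iterations absorb compilation, and JMH reports score error.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(2)                       // fork isolation
@Warmup(iterations = 5)        // JIT warmup
@Measurement(iterations = 10)  // statistical variance reporting
@State(Scope.Benchmark)
public class PosTaggerBenchmark {

  private SharedTagger shared;   // stand-in for a shared POSTaggerME

  @Setup
  public void setup() {
    shared = new SharedTagger();
  }

  @Benchmark
  @Threads(32)                   // contended access from 32 threads
  public String[] sharedInstance() {
    return shared.tag(new String[] {"The", "quick", "brown", "fox"});
  }

  // Stand-in so the sketch is self-contained; in the PR this would be
  // opennlp.tools.postag.POSTaggerME loaded from a trained model.
  static final class SharedTagger {
    String[] tag(String[] tokens) {
      String[] tags = new String[tokens.length];
      java.util.Arrays.fill(tags, "NN");
      return tags;
    }
  }
}
```

Run via the JMH runner (e.g. `java -jar benchmarks.jar PosTaggerBenchmark`); JMH then prints a score with an error margin per configuration.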
Regardless of my comment, I am going to trigger an Eval build for this: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/39/

@rzo1 working on addressing all of your concerns right now - it'll be done in a moment. I'm restoring the caches and running tests with and without them, with proper benchmarks. All great points, and thanks for the feedback.
Force-pushed 729b9c1 to 94ca28d
I'm going to make the caches optional and configurable. This way we can run tests against all scenarios and come up with as many use cases as needed to measure the impact. The last commit was premature; I'm still working on this.
@krickert Thanks Kristian for tackling this complex topic with so much energy! Much appreciated! Happy to review this PR deeper, especially looking forward to the JMH analyses. Richard has already given deep feedback in the first round; I'll share my 2c later on code-stylistic nuances, seeking an optimal result from a dev's perspective. For the moment, completing the 3.0.0-M2 release process is on my list…

@mawiesne no problem... I've been thinking about this for a while now. @rzo1 you were right about CachedFeatureGenerator: the data shows it clearly, and it helps. That particular cache brings a 1.6x boost in both the old and new instances. Combined with the thread-safety work and instance reuse, it now shows over a 2x increase. Thanks for pointing that out. But don't trust what I say; I'll update the tests shortly to show it (I would love to see it on another machine too).
Force-pushed bfc8fdf to d31aaa6
Thanks for the detailed feedback. We've addressed all four points made by @rzo1. Here's a summary of what changed and the JMH data behind each decision.

1. Benchmarks (JMH)

Replaced all hand-rolled `System.nanoTime()` + `ExecutorService` loops with JMH benchmarks. Also fixed the existing JMH profile: the annotation processor wasn't wired into the compiler plugin, so the benchmarks couldn't actually run.

JMH results (32 threads, all cores): Tokenizer and SentenceDetector show all approaches within error bars (lightweight constructors).

2. Caches

We restored all caches as ThreadLocal (per-thread, not shared): same behavior as the originals in single-threaded use, safe under concurrency. We also added a JMH benchmark measuring cache impact (POSTagger, 32 threads). This told us which caches matter and which don't.

Regarding the BeamSearch cache specifically: we restored it as ThreadLocal with a per-thread `probs[]` buffer and `contextsCache`.

3. Thread-safety tests

Addressed all sub-points.

4. Missing ME classes

All 7 ME classes are now covered and annotated `@ThreadSafe`.

5. ThreadSafe*ME wrappers deprecated

Since the ME classes are now themselves thread-safe, the `ThreadSafe*ME` wrappers are deprecated. We also replaced all internal usages of `ThreadSafe*ME` with direct ME usage. No internal code uses the wrappers anymore.

Open item: real-world dataset benchmarks. Agreed - this would strengthen the perf claims. The JMH benchmarks currently use the project's test data. Do you have any real-world dataset tests around that we can run it against quickly? It's the only way I'd feel confident as well.
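For anyone following along, a per-thread cache of the kind described in point 2 can be sketched as follows. This is illustrative only (class and method names are made up, and the real OpenNLP implementation may differ): each thread gets its own small LRU map, so no synchronization is needed and single-threaded behavior matches a plain shared cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a per-thread feature cache: each thread owns a private
// LRU map, so concurrent lookups never contend or corrupt state.
public class PerThreadFeatureCache {

  private static final int MAX_ENTRIES = 100;

  // One cache instance per thread; never shared across threads.
  private final ThreadLocal<Map<String, String[]>> cache =
      ThreadLocal.withInitial(() -> new LinkedHashMap<>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String[]> e) {
          return size() > MAX_ENTRIES;  // simple LRU eviction
        }
      });

  public String[] getContext(String key) {
    // computeIfAbsent only touches this thread's map.
    return cache.get().computeIfAbsent(key, this::computeFeatures);
  }

  // Stand-in for real feature generation.
  private String[] computeFeatures(String key) {
    return new String[] {"w=" + key, "len=" + key.length()};
  }
}
```

The tradeoff is the usual ThreadLocal one: in pooled-thread environments the per-thread maps live as long as the threads do unless explicitly removed.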
Summary since first review: made all 7 ME classes thread-safe by eliminating shared mutable instance state. Deprecated the ThreadSafe*ME wrappers.

Motivation

ME classes were documented as not thread-safe due to mutable instance fields that corrupt under concurrent access. The workarounds were creating a new ME instance per call (expensive) or using the ThreadSafe*ME wrappers.

Approach

Mutable state moved to method-local variables or per-thread caches (ThreadLocal) at every layer.
Files changed (30 total)

- Source (13 files): TokenizerME, SentenceDetectorME, POSTaggerME, LemmatizerME, ChunkerME, NameFinderME, LanguageDetectorME, BeamSearch, CachedFeatureGenerator, ConfigurablePOSContextGenerator, DefaultPOSContextGenerator, DefaultSDContextGenerator, SentenceContextGenerator (Thai)
- Deprecated (7 files): ThreadSafeTokenizerME, ThreadSafeSentenceDetectorME, ThreadSafePOSTaggerME, ThreadSafeLemmatizerME, ThreadSafeChunkerME, ThreadSafeNameFinderME, ThreadSafeLanguageDetectorME
- Internal usage swaps (3 files): Muc6NameSampleStreamFactory, TwentyNewsgroupSampleStreamFactory, POSTaggerMEIT - replaced ThreadSafe*ME usage with direct ME usage
- Tests/benchmarks (5 files): ThreadSafetyBenchmarkTest (8 JUnit tests), 3 JMH benchmarks, CachedFeatureGeneratorTest update
- Build (1 file): pom.xml - fixed JMH annotation processor wiring
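The "method-local processing, publish at the end" approach described above can be sketched like this. The class and field names below are illustrative stand-ins (not the actual OpenNLP code): all intermediate work happens in locals, and the only shared field is a volatile reference written once, so concurrent callers never observe a half-built result.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: intermediate state lives in method-local variables; the
// shared field is volatile and assigned a fully built immutable list
// in a single write (atomic reference swap, last-writer-wins).
public class LocalThenPublishTagger {

  // Shared state is one volatile reference, published atomically.
  private volatile List<Double> lastProbs = List.of();

  public String[] tag(String[] tokens) {
    // Method-local buffers: no cross-thread interference.
    String[] tags = new String[tokens.length];
    List<Double> probs = new ArrayList<>(tokens.length);
    for (int i = 0; i < tokens.length; i++) {
      tags[i] = "NN";        // stand-in for the real beam search
      probs.add(1.0);
    }
    // Single volatile write publishes a complete, immutable result.
    lastProbs = List.copyOf(probs);
    return tags;
  }

  // Backward-compatible accessor: last-writer-wins under concurrency,
  // exact in single-threaded use.
  public List<Double> probs() {
    return lastProbs;
  }
}
```

Under concurrency `probs()` then returns whichever call published last, which matches the documented last-writer-wins behavior.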
Force-pushed d31aaa6 to b02c2eb
@mawiesne - I did a push again to make the code match the style better. The problem I had was that your CI/CD failed linting and forced me to write 80-column code, which makes parts of the code look ugly outside my IDE. Can you ease up on the linting to allow 120 or 140 columns, or is that too much? I don't care either way, it's just a setting in my IDE - but the codebase has 3000+ violations, so I suspect it hasn't really been enforced for a long time.

Note: You can use the OpenNLP Formatting XML, which is provided as a download. In addition, you only have a few fixes:
Oh cool! Thanks. I'll fix those today |
OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable state

All 7 ME classes (TokenizerME, SentenceDetectorME, POSTaggerME, LemmatizerME, ChunkerME, NameFinderME, LanguageDetectorME) are now safe for concurrent use from multiple threads. The ThreadSafe*ME wrappers are deprecated - use the ME classes directly.

Thread-safety approach:
- ME instance fields (bestSequence, tokProbs, newTokens, sentProbs) changed to volatile with method-local processing, atomic swap at end
- BeamSearch: probs[] buffer and contextsCache moved to per-thread state via ThreadLocal
- CachedFeatureGenerator: cache moved to per-thread state via ThreadLocal (JMH confirms 1.62x benefit from this cache)
- ConfigurablePOSContextGenerator: cache moved to per-thread state via ThreadLocal
- DefaultSDContextGenerator: buf/collectFeats moved to method-local

JMH benchmark results (32 threads):
- POSTagger instancePerThread: 2.52x faster than newInstancePerCall
- POSTagger cache on vs off: no measurable difference for context generator cache; CachedFeatureGenerator provides 1.62x benefit
- Tokenizer/SentenceDetector: all approaches within error bars

API changes:
- All 7 ME classes annotated @ThreadSafe
- All 7 ThreadSafe*ME wrappers annotated @Deprecated(since = "3.0.0")
- POSTaggerME: added constructor with contextCacheSize parameter
- CachedFeatureGenerator: added DISABLE_CACHE_PROPERTY for benchmarking
- Internal usages of ThreadSafe*ME replaced with direct ME usage

Tests:
- ThreadSafetyBenchmarkTest: 8 JUnit tests with CyclicBarrier (all 7 ME classes + probs() concurrency test)
- JMH benchmarks for Tokenizer, SentenceDetector, POSTagger
- Fixed JMH annotation processor config in pom.xml
- All 680 runtime + 352 formats tests pass
Force-pushed b02c2eb to 178386f
Fixed. Let me know if there are more tests you'd like me to run. I think between the benchmarks, the passing tests, and the harness, it makes a strong case.
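Barrier-based concurrency tests of the kind used here follow a common shape; a self-contained sketch (stand-in function, not the actual OpenNLP test code) looks like this: N threads are released at the same instant against one shared instance, and every result must match the single-threaded baseline.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;

// Sketch of a CyclicBarrier concurrency check: release all workers
// simultaneously against one shared instance and compare every
// result against the single-threaded baseline.
public class SharedInstanceConcurrencyCheck {

  public static boolean allMatchBaseline(
      Function<String, String> shared, String input, int threads)
      throws Exception {
    String baseline = shared.apply(input);       // single-threaded result
    CyclicBarrier barrier = new CyclicBarrier(threads);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> results = new java.util.ArrayList<>();
      for (int i = 0; i < threads; i++) {
        results.add(pool.submit(() -> {
          barrier.await();                       // maximize contention
          return shared.apply(input);
        }));
      }
      for (Future<String> f : results) {
        if (!baseline.equals(f.get(30, TimeUnit.SECONDS))) {
          return false;                          // corrupted result
        }
      }
      return true;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

With a thread-unsafe instance this check fails intermittently under load, which is exactly the corruption the JUnit tests guard against.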
Summary
Make ME classes (TokenizerME, SentenceDetectorME, POSTaggerME, LemmatizerME) safe for concurrent use by eliminating shared mutable instance state. This enables reusing ME instances across threads instead of allocating a new instance per call, reducing allocation overhead in high-throughput pipelines.
The old pattern (`new TokenizerME(model)` per call) continues to work identically: zero regressions in correctness or performance.

Motivation
ME classes were documented as not thread-safe due to mutable instance fields (`bestSequence`, `tokProbs`, `newTokens`, `sentProbs`) that corrupt under concurrent access. The recommended workaround was either creating a new ME instance per call (expensive for high-throughput pipelines processing thousands of sentences in parallel) or using the `ThreadSafe*ME` wrappers (which use `ThreadLocal` and leak in Jakarta EE / long-running thread environments).

The root cause was mutable state at four layers:

- the ME classes themselves (`bestSequence`, `tokProbs`, `newTokens`, `sentProbs`)
- context generators (`contextsCache`, `wordsKey`, `buf`, `collectFeats`)
- `CachedFeatureGenerator` with mutable `prevTokens` and cache
- `BeamSearch` with a reused `probs[]` output buffer and a `contextsCache` that stored references to the reused buffer (cached values were always stale)

Approach
Move mutable state to method-local variables at every layer. ME instance fields are preserved as `volatile` for backward-compatible `probs()` access (last-writer-wins under concurrency). Caches are removed entirely: they were small (size 3 typically), not thread-safe, and in BeamSearch's case, buggy.

Files changed (10 source, 5 test)
- BeamSearch.java - removed reused `probs[]` and buggy `contextsCache`; added `@ThreadSafe`
- DefaultSDContextGenerator.java - `buf`/`collectFeats` moved to method-local; `collectFeatures()` signature updated
- SentenceContextGenerator.java (Thai) - updated `collectFeatures()` signature
- DefaultPOSContextGenerator.java - removed `contextsCache` and `wordsKey`
- ConfigurablePOSContextGenerator.java - removed `contextsCache` and `wordsKey`
- CachedFeatureGenerator.java - removed `prevTokens`, `contextsCache`, counters; delegates directly
- TokenizerME.java - `newTokens`/`tokProbs` volatile; `tokenizePos()` uses local lists
- SentenceDetectorME.java - `sentProbs` volatile; `sentPosDetect()` uses local list
- POSTaggerME.java - `bestSequence` volatile; `tag()` uses local var; added null guard
- LemmatizerME.java - `bestSequence` volatile; `predictSES()` uses local var

Backward compatibility
- Old pattern (`new ME(model)` per call) is unchanged - verified by regression benchmark
- `probs()` methods preserved (deprecated behavior under concurrency, correct single-threaded)
- `cacheSize` params accepted but ignored, marked `@Deprecated(since = "3.0.0")`

Test plan
- Existing tests pass (`mvn test` on opennlp-runtime)
- ThreadSafetyBenchmarkTest - JUnit correctness test: shared ME instances produce identical results to single-threaded baseline across all CPU cores
- RegressionBenchmark - head-to-head stock vs patched, new-instance-per-call only: zero mismatches, zero errors, performance within noise on both builds
- ThreadSafetyBenchmark - three-way comparison (new-instance-per-call / instance-per-thread / shared-single-instance)
- CachedFeatureGeneratorTest - updated for removed cache behavior
- `mvn clean install` at root (checkstyle must be skipped - 9,446 pre-existing violations on main)

Regression benchmark results (32 threads, new-instance-per-call)
Proves zero regression: stock vs patched, same API pattern.
Speedup benchmark results (32 threads, three-way comparison)
Approaches
The benchmark compares three strategies for using ME classes in a multi-threaded environment. All three produce identical output for a given input — the difference is how ME instances are allocated and shared.
- New instance per call: `String[] tags = new POSTaggerME(model).tag(tokens);`
- Instance per thread: `POSTaggerME tagger = new POSTaggerME(model); for (String[] t : sentences) tagger.tag(t);`
- Shared single instance: `POSTaggerME shared = new POSTaggerME(model); // pass shared to all threads`

Benchmark results
POSTagger sees the largest gain because its constructor is the heaviest — it builds a BeamSearch, a ConfigurablePOSContextGenerator, and a full AdaptiveFeatureGenerator chain on every instantiation. Reusing one instance per thread eliminates that allocation on every call, yielding a 1.67x speedup with zero correctness impact.
Tokenizer and SentenceDetector constructors are lighter, so the per-call overhead is smaller and all three approaches perform similarly.
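The instance-per-thread strategy can be expressed with a `ThreadLocal` at the application level. The sketch below uses a hypothetical stand-in `Tagger` class with an expensive constructor (in the real case, a `POSTaggerME` building its feature-generator chain); with this PR applied, a single shared instance would also be safe.

```java
import java.util.function.Supplier;

// Sketch of the instance-per-thread strategy: each worker thread
// lazily builds its own tagger once and reuses it for every call,
// avoiding the heavy per-call constructor without sharing state.
public class PerThreadTaggerPool {

  private final ThreadLocal<Tagger> perThread;

  public PerThreadTaggerPool(Supplier<Tagger> factory) {
    this.perThread = ThreadLocal.withInitial(factory);  // one tagger per thread
  }

  public String[] tag(String[] tokens) {
    return perThread.get().tag(tokens);  // always this thread's instance
  }

  // Stand-in for a class with an expensive constructor.
  public static class Tagger {
    public String[] tag(String[] tokens) {
      String[] tags = new String[tokens.length];
      java.util.Arrays.fill(tags, "NN");
      return tags;
    }
  }
}
```

Note the thread-pool caveat raised earlier in this thread: in long-lived thread environments the per-thread instances persist until removed, which is part of why built-in thread safety is preferable to this workaround.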
See `opennlp-core/opennlp-runtime/BENCHMARKS.md` for full benchmark instructions.

Thank you for contributing to Apache OpenNLP.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
https://issues.apache.org/jira/browse/OPENNLP-1816
Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.