fix: prevent unbounded memory leak during concurrent validation under… by KyleOps · Pull Request #239 · hapifhir/org.hl7.fhir.validator-wrapper

KyleOps · 2026-02-26T15:01:49Z

This PR fixes a memory and thread exhaustion vulnerability in ValidationServiceFactoryImpl.kt.

Under high concurrency, if the JVM's free memory dips below the engineReloadThreshold, the getValidationService() method re-initializes ValidationService. Because this check was not synchronized, a burst of incoming validation requests during a low-memory state would all evaluate the condition as true simultaneously.
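The race described above is an unguarded check-then-act. The following is a simplified sketch of that pattern, not the actual ValidationServiceFactoryImpl code: `FakeEngine`, `UnguardedFactory`, `simulateBurst`, and the `lowMemory` supplier are hypothetical stand-ins, and memory is assumed to stay below the threshold for the whole burst.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive validation engine.
class FakeEngine(val id: Int)

class UnguardedFactory(private val lowMemory: () -> Boolean) {
    // Counts every engine construction, including the initial one.
    val engineBuilds = AtomicInteger(0)
    private var service = FakeEngine(engineBuilds.incrementAndGet())

    // Unsynchronized check-then-act: every caller that observes low memory rebuilds.
    fun getValidationService(): FakeEngine {
        if (lowMemory()) {
            service = FakeEngine(engineBuilds.incrementAndGet())
        }
        return service
    }
}

// Fires `requests` concurrent calls and returns how many reloads they triggered.
fun simulateBurst(requests: Int): Int {
    // Assumption: memory stays low for the whole burst.
    val factory = UnguardedFactory(lowMemory = { true })
    val pool = Executors.newFixedThreadPool(requests)
    val start = CountDownLatch(1)
    repeat(requests) {
        pool.execute {
            start.await() // release all requests at once
            factory.getValidationService()
        }
    }
    start.countDown()
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    return factory.engineBuilds.get() - 1 // exclude the initial build
}

fun main() {
    println("reloads triggered by 5 concurrent requests: ${simulateBurst(5)}")
}
```

In this sketch each of the 5 requests triggers its own reload, because nothing stops a second caller from acting on the same low-memory observation.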

This resulted in an unbounded loop where the application spawned dozens of massive ValidationEngine instances and hundreds of detached background downloader threads to reload presets. The JVM quickly spiraled into maxed CPU usage, gigabytes of leaked memory, and eventual crashes.

Impact & Context

We discovered this while using the validator API in conjunction with au-fhir-inferno. Under heavy testing load, the validator-wrapper would hit the memory threshold and trigger this unsynchronized cache reset. The true impact was threefold:

Severe API Latency: The heavy I/O of re-building the cache blocks threads for several seconds. When 5 requests initialize 5 engines simultaneously, the CPU maxes out and all incoming requests hang, causing massive latency spikes (5-20s+) and upstream application timeouts.
Cache Thrashing: Under load, if the threshold is reached frequently, the server enters a death spiral of deleting and re-downloading large FHIR packages, completely negating the benefit of an in-memory cache.
Thread, File & Network Exhaustion: Even if the environment's memory can survive the spike, the background threads consume vast network bandwidth pulling .tgz files from packages.fhir.org and leak file descriptors on the host container.

Changes

Applied @Volatile to validationService.
Added a ReentrantLock (reloadLock) so that only one thread at a time can reinitialize the cache.
Added a 5-minute cooldown (RELOAD_COOLDOWN_MS) so the engine isn't perpetually thrashed under sustained heavy load.
Concurrent threads that hit the low-memory state while another thread holds the lock now log the event and skip initialization, returning the existing engine and preventing the death spiral.
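The changes listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: `Engine`, `GuardedFactory`, `demoGenerations`, and the injectable `lowMemory` and `now` suppliers are hypothetical, the use of `tryLock()` to skip rather than wait is an assumption, and so is allowing the very first reload without a cooldown. Logging on the skip path is omitted.

```kotlin
import java.util.concurrent.locks.ReentrantLock

// Hypothetical stand-in for the expensive engine; `generation` counts reloads.
class Engine(val generation: Int)

class GuardedFactory(
    private val lowMemory: () -> Boolean,
    private val now: () -> Long = { System.currentTimeMillis() },
    private val cooldownMs: Long = 5 * 60 * 1000L // 5 minutes, as in RELOAD_COOLDOWN_MS
) {
    // Volatile so threads that skip the reload still see the latest engine.
    @Volatile private var validationService = Engine(0)
    private val reloadLock = ReentrantLock()
    // Only read and written while reloadLock is held.
    private var lastReloadAt: Long? = null

    fun getValidationService(): Engine {
        // tryLock() returns immediately: callers that lose the race skip the reload.
        if (lowMemory() && reloadLock.tryLock()) {
            try {
                val t = now()
                val last = lastReloadAt
                // Assumption: the first reload is allowed; later ones must wait out the cooldown.
                if (last == null || t - last >= cooldownMs) {
                    validationService = Engine(validationService.generation + 1)
                    lastReloadAt = t
                }
            } finally {
                reloadLock.unlock()
            }
        }
        return validationService
    }
}

// Walks a fake clock through three calls made while memory stays low.
fun demoGenerations(): List<Int> {
    var clock = 0L
    val factory = GuardedFactory(lowMemory = { true }, now = { clock })
    val first = factory.getValidationService().generation  // reload happens
    clock += 60_000L                                        // 1 minute later
    val second = factory.getValidationService().generation // inside cooldown, skipped
    clock += 5 * 60 * 1000L                                 // cooldown has elapsed
    val third = factory.getValidationService().generation  // reload happens again
    return listOf(first, second, third)
}

fun main() {
    println(demoGenerations())
}
```

Injecting the clock keeps the cooldown testable without sleeping. A non-blocking `tryLock()` fits the "skip and return the existing engine" behavior; a blocking `lock()` would instead queue callers behind the reload.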

… load

Added a ReentrantLock and rate limiting to `getValidationService()` in ValidationServiceFactoryImpl.kt. When the JVM hit the low-memory threshold under high concurrency, the application previously let every concurrent request start its own initialization, which drove up memory usage and crashed the application. This fix ensures that only a single thread can trigger the engine reload within a 5-minute cooldown period, allowing concurrent requests to queue or utilize the existing engine rather than flooding the JVM with unbounded, heavy cache initialization threads.

KyleOps wants to merge 1 commit into hapifhir:master from KyleOps:fix/validation-service-memory-leak
