fix: Memory leak fixes in RQ and Local engines #82
base: main
Conversation
Branch: RedisTaskMemoryLeak Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
✅ DCO Check Passed: Thanks @gabe-l-hart, all your commits are properly signed off. 🎉
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
I found a number of additional issues leading to memory leakage that should now be fixed in this PR and will update the description accordingly.
Some additional numbers from the various fixes, measured with `uv run python reproduce_memory_leak.py --iterations 5`:

- baseline
- All fixes
- gc only
- clear chunker manager cache
- clear worker convert managers

The trend to note here is that once fixed, the memory for the server should reduce to around the same memory use as in iteration 1.
Description
This PR addresses several sources of memory leaks in the current `docling-serve` implementation caused by logic in this library. This is part of the investigation of docling-project/docling-serve#366 and docling-project/docling-serve#474.

- When `/clear/results` is invoked with the RQ engine, the content from `redis` that stored the job (including the full document contents) was not being deleted.
- When `/clear/converters` is invoked, the local cache was cleared for the converters, but they were never garbage collected, which could cause OOMs if enough other memory was allocated before garbage collection happened.
- When `use_shared_manager` is set to `False` (the default), each `AsyncLocalWorker` creates its own `DoclingConverterManager` instance, which is not tracked by the parent `LocalOrchestrator` and therefore not cleared when `/clear/converters` is invoked.
- When `process_chunk_results` is called to process a CHUNK task, it creates an ephemeral instance of the `DocumentChunkerManager`, which in turn uses an `@lru_cache` to store the internal chunker. These ephemeral instances go out of scope when `process_chunk_results` terminates, but the purge semantics of the cached internal chunker are not clear, so it's possible that this was leaking cached instances.
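The local-engine fixes above share a common shape: drop every cached converter (including the per-worker managers that the orchestrator does not track), purge the chunker `lru_cache`, and then force a garbage-collection pass so the freed objects are reclaimed immediately. The sketch below illustrates that pattern only; the attribute and method names (`converter_manager`, `workers`, `clear_cache`) are hypothetical placeholders, not the actual docling-jobkit API.

```python
import gc
from functools import lru_cache


@lru_cache(maxsize=8)
def _get_chunker(config_key: str) -> object:
    # Hypothetical cached factory standing in for the chunker cache inside
    # DocumentChunkerManager; an lru_cache keeps every distinct chunker alive
    # until it is explicitly cleared.
    return object()


def clear_converters(orchestrator) -> None:
    """Minimal sketch of the clearing pattern; names are placeholders."""
    # 1. Drop the converters tracked by the orchestrator's shared manager.
    orchestrator.converter_manager.clear_cache()

    # 2. Also drop the per-worker managers created when use_shared_manager is
    #    False, since the parent orchestrator does not track them.
    for worker in getattr(orchestrator, "workers", []):
        manager = getattr(worker, "converter_manager", None)
        if manager is not None and manager is not orchestrator.converter_manager:
            manager.clear_cache()

    # 3. Purge the chunker lru_cache so cached chunkers become collectable.
    _get_chunker.cache_clear()

    # 4. Force a GC pass so the dropped converters are reclaimed right away
    #    instead of lingering until an automatic collection (which is what
    #    allowed OOMs when other allocations happened first).
    gc.collect()
```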
Further Investigation

The fixes in this PR should solve the memory leak issues IF (and only if) the user invokes both `/clear/results` and `/clear/converters` regularly. Memory will continue to accumulate if this doesn't happen. My current best guess as to why memory accumulates despite the caching for the converters is docling-project/docling#2209, which suggests there may be memory leaks in the core of `docling`'s `DocumentConverter`. Without purging the converters regularly, the cached instances of the converters are free to leak and accumulate memory over time.

Given this, I suspect that the RQ backend has not yet been fully solved, since we don't have a proper `clear_converters` implementation in the `RQOrchestrator`; this will likely lead the redis workers to accumulate memory over time.
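For the RQ engine, the stored job content lives in redis and can be removed through RQ's standard job API. The snippet below is a minimal sketch of that idea using real `rq` calls (`FinishedJobRegistry`, `Job.fetch`, `Job.delete`); the queue name and the helper function itself are hypothetical and do not reflect the PR's actual implementation.

```python
from redis import Redis
from rq import Queue
from rq.exceptions import NoSuchJobError
from rq.job import Job
from rq.registry import FinishedJobRegistry


def clear_rq_results(connection: Redis, queue_name: str = "docling") -> int:
    """Delete finished RQ jobs (and their stored payloads) from redis."""
    queue = Queue(queue_name, connection=connection)
    registry = FinishedJobRegistry(queue=queue)

    deleted = 0
    for job_id in registry.get_job_ids():
        try:
            job = Job.fetch(job_id, connection=connection)
        except NoSuchJobError:
            continue
        # Deleting the job removes its redis hash; depending on the rq
        # version, the stored result (the full document contents) lives
        # there or under a separate results key with its own TTL.
        job.delete()
        deleted += 1
    return deleted
```

Even with results deleted this way, a matching `clear_converters` hook for the RQ workers would still be needed, per the note above.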
Testing

This fix was developed in conjunction with Claude Code and Bob, which produced the following scripts to measure the memory leak:

- reproduce_memory_leak.py
- reproduce_memory_leak_redis.py
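The scripts themselves are not included here, but a reproduction loop of roughly this shape can demonstrate the behavior: repeatedly convert a document, call the two `/clear` endpoints, and record the server's resident memory after each iteration. Everything below (endpoint paths, HTTP methods, payload, and URLs) is a placeholder sketch, not the generated script.

```python
import argparse

import psutil
import requests

BASE_URL = "http://localhost:5001"              # placeholder server address
CONVERT_URL = f"{BASE_URL}/v1/convert/source"   # placeholder convert endpoint
DOC_URL = "https://example.com/sample.pdf"      # placeholder test document


def server_rss_mb(pid: int) -> float:
    """Resident set size of the server process, in MiB."""
    return psutil.Process(pid).memory_info().rss / (1024 * 1024)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--iterations", type=int, default=5)
    parser.add_argument("--server-pid", type=int, required=True)
    args = parser.parse_args()

    for i in range(args.iterations):
        # Convert a document, then clear results and converters so the
        # fixes in this PR are exercised on every iteration.
        requests.post(CONVERT_URL, json={"sources": [{"url": DOC_URL}]})
        requests.get(f"{BASE_URL}/clear/results")
        requests.get(f"{BASE_URL}/clear/converters")
        print(f"iteration {i + 1}: RSS = {server_rss_mb(args.server_pid):.1f} MiB")


if __name__ == "__main__":
    main()
```

With the fixes applied, the reported RSS should settle back to roughly the iteration-1 value rather than climbing on every pass.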