[FIX] reduce peak memory usage during single-cell extraction #384
sophiamaedler wants to merge 11 commits into main from
Conversation
Pull request overview
Reduces peak memory usage during multiprocessing single-cell extraction by streaming extraction results directly from the worker pool iterator instead of materializing the full result list in memory.
Changes:
- Replace `list(tqdm(pool.imap(...)))` with direct iteration over `tqdm(pool.imap(...))` to avoid accumulating all batch results at once.
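The change above can be sketched as follows. This is a minimal, hypothetical illustration (the real extraction function, batch layout, and HDF5 writer from the PR are not shown; `tqdm` wrapping is omitted to keep the sketch dependency-free): iterating `pool.imap(...)` directly means only in-flight batch results are buffered, instead of the whole result list.

```python
from multiprocessing import Pool

def process_batch(batch):
    # Stand-in for the per-batch single-cell extraction done in each worker.
    return [x * 2 for x in batch]

def extract_streaming(batches, n_workers=2):
    # Before (high peak memory):
    #     results = list(pool.imap(process_batch, batches))
    #     for result in results: write(result)
    # After: consume the imap iterator directly, so each batch result
    # can be written and released before the next one is fetched.
    written = []
    with Pool(n_workers) as pool:
        for result in pool.imap(process_batch, batches):
            written.append(result)  # stand-in for the single HDF5 write
    return written
```

`imap` still preserves result order, so the downstream write path is unchanged; only the buffering behavior differs.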
I tested this theory by monitoring the time spent waiting for worker results versus the time spent writing each batch to HDF5 in the main process. The results strongly support a writer backpressure bottleneck:
Interpretation:
Conclusion: workers are producing faster than the single-writer path can drain, which is consistent with the observed memory growth from queued/in-flight batch results.
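The wait-vs-write measurement described above can be sketched with a small timing harness. This is an assumed instrumentation pattern, not the PR's actual code: it splits each batch's elapsed time into time blocked on the pool iterator versus time spent in the write call.

```python
import time

def drain_with_timing(result_iter, write_fn):
    # Attribute each batch's elapsed time either to waiting on the
    # worker-pool iterator or to the (HDF5) write in the main process.
    wait_s = write_s = 0.0
    it = iter(result_iter)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)   # blocks until a worker delivers a result
        except StopIteration:
            break
        t1 = time.perf_counter()
        write_fn(batch)        # single-writer drain path
        t2 = time.perf_counter()
        wait_s += t1 - t0
        write_s += t2 - t1
    return wait_s, write_s
```

If `write_s` dominates `wait_s`, the single writer is the bottleneck and backpressure (queued results) explains the memory growth.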
I instrumented HDF5 writing to break down per-batch time into:
Early results:
Interpretation:
Conclusion:
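A per-phase breakdown like the one described above can be gathered with a small context-manager timer. The phase names below are hypothetical placeholders (the actual phases measured in the PR are elided above); the pattern is what matters: accumulate wall-clock time per named step of the write path.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)

@contextmanager
def timed(phase):
    # Accumulate wall-clock time under a named phase of the write path.
    t0 = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - t0

def write_batch(batch):
    # Hypothetical phases; the real breakdown used in the PR is not shown.
    with timed("prepare"):
        payload = list(batch)
    with timed("write"):
        _ = payload  # stand-in for the actual HDF5 dataset write
```

Summing `timings` across batches shows which phase dominates per-batch write time.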
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.



reported by @vvarlamova