[FIX] reduce peak memory usage during single-cell extraction#384

Open
sophiamaedler wants to merge 11 commits into main from improve_mem_extraction
Conversation

@sophiamaedler (Collaborator) commented Mar 9, 2026

Reported by @vvarlamova.

  • Implemented automatic flushing and gc.collect() after N batches to clean up caches.
  • Capped the maximum number of in-flight results kept (i.e. computed results that have not yet been written to disk).

Copilot AI review requested due to automatic review settings March 9, 2026 06:37

Copilot AI left a comment


Pull request overview

Reduces peak memory usage during multiprocessing single-cell extraction by streaming extraction results directly from the worker pool iterator instead of materializing the full result list in memory.

Changes:

  • Replace list(tqdm(pool.imap(...))) with direct iteration over tqdm(pool.imap(...)) to avoid accumulating all batch results at once.
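The pattern can be sketched with a toy pool (a thread-backed multiprocessing.dummy.Pool stands in for the real process pool, and extract_batch / the write step are placeholders, not the project's actual functions):

```python
from multiprocessing.dummy import Pool  # thread-backed stand-in for the real worker pool

def extract_batch(batch_id):
    # Stand-in for the per-batch single-cell extraction.
    return [batch_id * 10 + i for i in range(3)]

written = []

with Pool(4) as pool:
    # Before: results = list(tqdm(pool.imap(extract_batch, range(100))))
    # holds every batch result in memory until the write loop starts.
    # After: iterate the imap generator directly, so each batch can be
    # written and released before the next one is fetched.
    for batch_result in pool.imap(extract_batch, range(100)):
        written.append(len(batch_result))  # stand-in for the HDF5 write

print(sum(written))  # 300 cells written, never all resident at once
```

Peak memory then scales with the number of batches in flight rather than the total number of batches.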


@sophiamaedler (Collaborator, Author)

Did a quick benchmark on a small dataset, which shows improved memory usage.
[image: memory-usage benchmark on the small dataset]

Will need to repeat on a larger dataset.

@sophiamaedler (Collaborator, Author)

sophiamaedler commented Mar 14, 2026

I ran this on a larger example with the current implementation of the code vs. the implementation in main:
extraction changed from list(tqdm(pool.imap(...))) to streamed tqdm(pool.imap_unordered(...)).

[image: benchmark on the larger example]

This has improved run time but has not reduced memory usage during the extraction process.

We also see continuously increasing main-process memory requirements. The working theory is that:

  • Workers produce large batch results in parallel.
  • Main process is the single writer (_write_to_hdf5 under a lock).
  • If producer throughput > writer throughput, pending results accumulate in parent-side buffers/queues.
  • Parent RSS rises over time even without a classic leak.
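The accumulation this theory predicts can be reproduced with a toy producer/writer simulation (the rates are invented for illustration, not measured):

```python
# Toy model: workers complete `produce_rate` batches per tick, the single
# writer drains `write_rate` batches per tick. Pending results accumulate
# whenever producers outpace the writer -- no leak required.
def pending_over_time(produce_rate, write_rate, ticks):
    pending, history = 0, []
    for _ in range(ticks):
        pending += produce_rate              # batches returned by workers
        pending -= min(pending, write_rate)  # batches written to disk
        history.append(pending)
    return history

growth = pending_over_time(produce_rate=3, write_rate=1, ticks=5)
print(growth)  # [2, 4, 6, 8, 10] -- buffer grows linearly, mirroring rising parent RSS

balanced = pending_over_time(produce_rate=1, write_rate=1, ticks=5)
print(balanced)  # [0, 0, 0, 0, 0] -- no growth when the writer keeps up
```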

@sophiamaedler (Collaborator, Author)

I tested this theory by monitoring the time spent waiting for worker results versus the time spent writing each batch to HDF5 in the main process.

The results strongly support a writer backpressure bottleneck:

| batches | avg_wait_for_result_s | avg_write_s | fast_fetch_fraction | write_wait_ratio |
|---------|-----------------------|-------------|---------------------|------------------|
| 25      | 0.1445                | 0.9259      | 0.92                | 6.41             |
| 50      | 0.0729                | 0.8739      | 0.96                | 11.99            |
| 75      | 0.0493                | 0.8597      | 0.97                | 17.44            |
| 100     | 0.0374                | 0.8602      | 0.98                | 23.00            |
| 125     | 0.0302                | 0.8566      | 0.98                | 28.36            |

Interpretation:

  • avg_wait_for_result_s is low and keeps decreasing, so the main process rarely waits for workers.
  • avg_write_s stays much higher, so writing is the slow stage.
  • fast_fetch_fraction is very high (0.92-0.98), meaning results are usually immediately available.
  • write_wait_ratio increases over time, indicating the writer is increasingly the bottleneck relative to result availability.

Conclusion: workers are producing faster than the single-writer path can drain, which is consistent with the observed memory growth from queued/in-flight batch results.
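The measurement itself can be sketched like this (toy workloads: a near-instant worker and a 10 ms writer stand in for the real extraction and the _write_to_hdf5 call; the real instrumentation times pool.imap_unordered fetches the same way):

```python
import time
from multiprocessing.dummy import Pool  # thread-backed pool for illustration

def fast_worker(i):
    return i  # workers finish almost immediately in this toy setup

def slow_write(result):
    time.sleep(0.01)  # stand-in for the HDF5 write under the lock

wait_s = write_s = 0.0
with Pool(4) as pool:
    it = pool.imap_unordered(fast_worker, range(20))
    while True:
        t0 = time.perf_counter()
        try:
            result = next(it)        # time spent blocked waiting on workers
        except StopIteration:
            break
        t1 = time.perf_counter()
        slow_write(result)           # time spent in the single writer
        t2 = time.perf_counter()
        wait_s += t1 - t0
        write_s += t2 - t1

print(f"write_wait_ratio = {write_s / max(wait_s, 1e-9):.1f}")
```

A ratio well above 1 means results pile up faster than the writer drains them, exactly the pattern in the table above.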

@sophiamaedler (Collaborator, Author)

I instrumented HDF5 writing to break down per-batch time into:

  • file open + dataset lookup (avg_open_lookup_s)
  • actual write loop (avg_write_loop_s)
  • file close (avg_close_s)

Early results:

| batches | avg_open_lookup_s | avg_write_loop_s | avg_close_s | avg_total_s |
|---------|-------------------|------------------|-------------|-------------|
| 25      | 0.0011            | 1.0393           | 0.0206      | 1.0610      |
| 50      | 0.0011            | 1.0921           | 0.0165      | 1.1096      |
| 75      | 0.0009            | 1.0113           | 0.0160      | 1.0282      |
| 100     | 0.0009            | 0.9927           | 0.0160      | 1.0096      |
| 125     | 0.0017            | 1.0007           | 0.0152      | 1.0176      |
| 150     | 0.0016            | 1.0045           | 0.0156      | 1.0217      |

Interpretation:

  • Open/lookup is ~1 ms per batch.
  • Close is ~15-21 ms per batch.
  • The write loop is ~1.0 s per batch and dominates total time (>95%).

Conclusion:

  • The bottleneck is actual HDF5 dataset writes/compression, not file open/lookup overhead.
  • Increasing batch size to amortize open cost is unlikely to significantly improve throughput.
  • Optimization should target write-loop cost (compression/chunking/write pattern).
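One way to get a per-phase breakdown like this is a small phase-timer accumulator. This is a sketch with stdlib timers and a sleep standing in for the write loop; in the real code the three phases would wrap the h5py file open plus dataset lookup, the dataset write loop, and the file close:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

phase_totals = defaultdict(float)

@contextmanager
def timed(phase):
    # Accumulate wall time per named phase of the write path.
    t0 = time.perf_counter()
    try:
        yield
    finally:
        phase_totals[phase] += time.perf_counter() - t0

def write_batch(batch):
    with timed("open_lookup"):
        pass  # h5py file open + dataset lookup would go here
    with timed("write_loop"):
        time.sleep(0.005)  # stand-in for dataset writes/compression
    with timed("close"):
        pass  # file close would go here

for batch in range(10):
    write_batch(batch)

total = sum(phase_totals.values())
for phase, t in phase_totals.items():
    print(f"{phase}: {t:.4f}s ({100 * t / total:.0f}%)")
```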

@sophiamaedler (Collaborator, Author)

sophiamaedler commented Mar 15, 2026

Limit the number of in-flight batches we keep (i.e. completed results waiting to be written to disk):

  • Limits the number of in-flight returned result batches (pending_results) to a maximum N.
  • Submits a new batch task only when one completed result has been consumed/written.
  • Uses an auto-calculated N by default, based on a RAM budget targeting 85% utilization (or a user-specified value).
[image: memory usage for different N values]

Setting lower N values does indeed cap main-process RSS during the cell-extraction step. Memory still spikes significantly at the beginning, when mapping input images and masks to memory-mapped temp arrays.
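The scheme can be sketched as bounded submission over apply_async. Names like pending_results and max_inflight follow the description above, but the code is illustrative: a thread-backed pool stands in for the real one, and the RAM-budget auto-calculation of N is omitted:

```python
from collections import deque
from multiprocessing.dummy import Pool  # thread-backed pool for illustration

def extract_batch(batch_id):
    return [batch_id] * 3  # stand-in for single-cell extraction

def run_with_backpressure(batches, max_inflight=4):
    written, peak_pending = [], 0
    with Pool(4) as pool:
        pending_results = deque()
        for batch in batches:
            pending_results.append(pool.apply_async(extract_batch, (batch,)))
            peak_pending = max(peak_pending, len(pending_results))
            # Backpressure: once the cap is reached, block on the oldest
            # result and write it before submitting more work.
            if len(pending_results) >= max_inflight:
                written.append(pending_results.popleft().get())  # write step
        while pending_results:  # drain the tail
            written.append(pending_results.popleft().get())
    return written, peak_pending

results, peak = run_with_backpressure(range(50), max_inflight=4)
print(len(results), peak)  # 50 4
```

Because submission only proceeds after a write completes, pending results can never exceed N, which is what caps the parent's RSS regardless of how fast the workers produce.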


Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


