Fix ResourcesAggregator deadlock with virtual thread executor #6840

Merged
pditommaso merged 1 commit into master from fix-resources-aggregator-deadlock
Feb 18, 2026
@pditommaso (Member)

Summary

Fix a deadlock in ResourcesAggregator.computeSummaryMap() that causes Nextflow to hang indefinitely during shutdown when generating the execution report.

Problem

ResourcesAggregator used the shared session ExecutorService — a ThreadPerTaskExecutor backed by virtual threads — to parallelize summary stats computation via invokeAll().

When other virtual threads (e.g. PublishDir file transfers) saturate or pin all ForkJoinPool carrier threads, the new tasks submitted by invokeAll can never be scheduled. Since invokeAll blocks until all futures complete, this creates a starvation deadlock: the main thread waits forever at computeSummaryMap() during onFlowComplete.
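The starvation pattern described above can be illustrated with a small analogy. This is only a sketch, not the Nextflow code: a single platform worker thread stands in for the bounded set of carrier threads, and the class and method names (`StarvationSketch`, `starvedWhenWorkerBusy`) are hypothetical. The timed overload of `invokeAll` is used so the example terminates; the untimed overload would block indefinitely in the same situation.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class StarvationSketch {

    // Returns true when the submitted task was never scheduled before the timeout.
    static boolean starvedWhenWorkerBusy() {
        // Assumption: one worker thread stands in for the saturated carrier threads.
        ExecutorService shared = Executors.newFixedThreadPool(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the only worker, as a long-running or pinned task would.
            shared.submit(() -> { release.await(); return null; });

            // Hypothetical stand-in for one summary stats task.
            Callable<Integer> stats = () -> 42;

            // Timed invokeAll: tasks that have not completed by the deadline are cancelled.
            List<Future<Integer>> futures =
                    shared.invokeAll(List.of(stats), 200, TimeUnit.MILLISECONDS);
            return futures.get(0).isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            // Unblock the worker and let the pool terminate.
            release.countDown();
            shared.shutdown();
        }
    }
}
```

In this sketch the stats task is trivial, yet it never runs because no worker is free to pick it up; the caller only gets control back because of the timeout.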

Observed in production — thread dump shows the main thread stuck at:

at java.util.concurrent.ThreadPerTaskExecutor.invokeAll(ThreadPerTaskExecutor.java:365)
at nextflow.trace.ResourcesAggregator.computeSummaryMap(ResourcesAggregator.groovy:75)
at nextflow.trace.ResourcesAggregator.computeSummaryList(ResourcesAggregator.groovy:90)
at nextflow.trace.ResourcesAggregator.renderSummaryJson(ResourcesAggregator.groovy:105)
at nextflow.trace.ReportObserver.renderSummaryJson(ReportObserver.groovy:201)
...
at nextflow.Session.shutdown0(Session.groovy:777)

This is related to but independent of #6833 — that PR fixes one cause of carrier thread saturation (S3 delete), but ResourcesAggregator is vulnerable to any scenario where carrier threads are busy.

Fix

Replace the shared session virtual thread executor with a dedicated FixedThreadPool scoped to the computeSummaryMap() call. This is CPU-bound computation that:

  • Benefits from platform threads (no I/O waiting)
  • Must not compete with virtual threads for carrier thread scheduling
  • Only runs once during shutdown, so the pool creation overhead is negligible
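The approach above can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: the class and method names (`ScopedPoolSketch`, `computeMeans`), the use of a mean as the summary statistic, and the pool size taken from `availableProcessors()` are all hypothetical. What it shows is the general shape: a fixed pool of platform threads created for one call, used through `invokeAll`, and shut down in a `finally` block.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScopedPoolSketch {

    // Hypothetical stand-in for the per-process summary computation.
    static Map<String, Double> computeMeans(Map<String, List<Double>> samples) {
        // Assumption: size the pool by available cores, since the work is CPU-bound.
        int nThreads = Math.max(1, Runtime.getRuntime().availableProcessors());
        // Dedicated platform-thread pool that lives only for this call.
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<String> names = new ArrayList<>(samples.keySet());
            List<Callable<Double>> tasks = new ArrayList<>();
            for (String name : names) {
                List<Double> values = samples.get(name);
                tasks.add(() -> values.stream()
                        .mapToDouble(Double::doubleValue)
                        .average()
                        .orElse(0.0));
            }
            // invokeAll returns futures in the same order as the task list.
            List<Future<Double>> futures = pool.invokeAll(tasks);
            // LinkedHashMap keeps the insertion order of the input keys.
            Map<String, Double> result = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                result.put(names.get(i), futures.get(i).get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing summary", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("summary task failed", e.getCause());
        } finally {
            // Always release the pool's threads, even if a task fails.
            pool.shutdown();
        }
    }
}
```

Because the pool owns its own platform threads, these tasks do not depend on any other executor having free capacity, and the method needs no reference to a shared session object.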

This also removes the Session dependency from ResourcesAggregator, simplifying both the class and its tests.

Test plan

  • Existing ResourcesAggregatorTest tests pass (summary computation, JSON rendering, insertion order)
  • ReportObserverTest passes
  • nf-tower plugin compiles cleanly

🤖 Generated with Claude Code

Use a dedicated fixed thread pool in computeSummaryMap() instead of
the shared session executor to prevent starvation deadlock when
virtual thread carrier threads are saturated.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@netlify

netlify bot commented Feb 18, 2026

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 9a3de5c
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69956b2c3bd77e0008c0e0f2

@bentsherman (Member) left a comment

The analysis is slightly off -- PublishDir doesn't use the main session execService to publish files -- but I think the change is still reasonable

@pditommaso (Member, Author)

I think it got confused with the similar one, #6833, but yes, it's correct

@pditommaso pditommaso merged commit 7ac4f3e into master Feb 18, 2026
25 checks passed
@pditommaso pditommaso deleted the fix-resources-aggregator-deadlock branch February 18, 2026 16:05