Skip to content

BUG: Fix ParallelSparseFieldLevelSet deadlock with PlatformMultiThreader (supersedes #6321)#6337

Merged
hjmjohnson merged 2 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:supersede-6321-pslsif-platform-multithreader
May 27, 2026
Merged

BUG: Fix ParallelSparseFieldLevelSet deadlock with PlatformMultiThreader (supersedes #6321)#6337
hjmjohnson merged 2 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:supersede-6321-pslsif-platform-multithreader

Conversation

@hjmjohnson
Copy link
Copy Markdown
Member

Fixes the ParallelSparseFieldLevelSetImageFilter::Iterate() dispatch-starvation deadlock by forcing PlatformMultiThreader in the constructor (per @blowekamp's suggestion on #6321), rather than spawning explicit std::threads. Intended to supersede #6321.

⚠️ Stacked on #6320 (the robustness test). Merge #6320 first; this PR's only unique change is a 3-line constructor edit to itkParallelSparseFieldLevelSetImageFilter.hxx. After #6320 lands, rebase drops the test commit and the diff becomes just the fix.

Root cause and why this approach

Iterate() dispatches ThreadedApplyUpdate via mt->ParallelizeArray(); the body blocks in SignalNeighborsAndWait() until neighbor work units arrive. This neighbor-only protocol requires every work unit to hold its own concurrent OS thread for the full dispatch. Pool/TBB-backed ParallelizeArray queues N tasks onto a bounded worker set — CV-blocked tasks pin all workers, queued units never start, deadlock.

#6321 swapped only the one ParallelizeArray containing SignalNeighborsAndWait for explicit std::threads. @blowekamp noted that leaves other multi-threader/work-unit combinations potentially vulnerable, and suggested instead hard-coding PlatformMultiThreader to preserve the filter's original fixed-thread assumption — the same pattern itkSLICImageFilter uses:

this->DynamicMultiThreadingOff();
this->SetMultiThreader(PlatformMultiThreader::New());

PlatformMultiThreader::SingleMethodExecute spawns one dedicated OS thread per work unit (calling thread + N−1 spawned), and ParallelizeArrayHelper with range == work-unit count gives each thread exactly one index — so every ThreadedApplyUpdate/SignalNeighborsAndWait runs on its own concurrent thread. The neighbor-sync protocol and all data members are unchanged; the fix applies to every parallel section in the filter, not just one.

Local validation — 700 runs, 0 failures

Built with Module_ITKTBB=ON and exercised under both threaders:

Suite Threader Runs Result
Robustness GTest (SweepRepeat + Determinism + ConcurrentMultiPipeline) default (Pool) ×100 ✅ 100/100, 0 failures
Robustness GTest real TBB ×100 ✅ 100/100, 0 failures
itkParallelSparseFieldLevelSetImageFilterTest (baseline image compare) default ×100 ✅ 100/100, 0 failures

700 total test executions, zero failures, zero deadlocks. Output bit-identical every run (the Determinism scenario and the MD5-pinned baseline both pass). Full ninja build (3345 targets) clean; pre-commit run --all-files clean.

Note: the original deadlock is timing-/core-count-dependent (needs work units > available TBB workers). On the 48-core validation host m_NumOfWorkUnits caps to ≤48, so the committed sweep does not starve there; the deterministic 100%-deadlock reproduction is on a 16-core box (see #6321) and CI's TBB legs.

hjmjohnson and others added 2 commits May 23, 2026 18:05
ParallelSparseFieldLevelSetImageFilter::Iterate() dispatches its
per-iteration ThreadedApplyUpdate body via mt->ParallelizeArray().
The body invokes SignalNeighborsAndWait() several times, blocking on
per-thread std::condition_variables until both neighbor work units
have arrived at the same phase.  This neighbor-only protocol assumes
every work unit holds its own concurrent OS thread for the full
duration of the dispatch.  Pool/TBB-backed ParallelizeArray does not
honor this: N work units are submitted to a shared worker pool whose
size is bounded by core count, and CV-blocked tasks pin the pool so
queued work units never start and the dispatch deadlocks.

Reported as the intermittent timeout of
itkParallelSparseFieldLevelSetImageFilterTest in CDash test 2570265561
("Trying mf->Update()" followed by a 1000s timeout) on Windows
Release with TBB.

The existing test fails at ~3% per run under TBB, which is too rare
to catch the bug reliably in CI.  This commit adds a new test
itkParallelSparseFieldLevelSetImageFilterRobustnessTest that drives
the deadlock to a high per-test-invocation probability:

  Scenario 1 (sweep x repeat): cycle 1/2/4/8/11/16/32 work-unit
    counts for 20 outer iterations, totalling 140 short filter runs.
    Per-run pool-starvation deadlock probability is low but
    amplification gives a ~95% per-test-invocation hit rate.

  Scenario 2 (determinism): three repeated runs at workUnits=11 must
    produce bit-identical output.  Guards against future barrier
    rewrites that introduce numerical non-determinism.

  Scenario 3 (concurrent multi-pipeline): eight pipelines run
    simultaneously via std::async (88 work units competing for a
    core-count-bounded worker pool).  Cross-pipeline contention
    directly probes the dispatch-starvation deadlock path.

Local measurements on a 16-core x86 box with TBB:
  - Upstream code, 30 invocations: 30 timeouts (100% deadlock).
  - Upstream code, POOL/PLATFORM: pass (no worker shortage possible).
  - Test runtime with a working synchronization: ~5s wallclock.
  - CTest TIMEOUT set to 30 (6x leeway over expected runtime).

This commit intentionally introduces a failing test on master.  The
proposed fix lands in a follow-up PR.
Iterate()'s ThreadedApplyUpdate dispatch blocks in SignalNeighborsAndWait
until neighbor work units arrive, so every work unit needs its own
concurrent OS thread. Pool/TBB ParallelizeArray queues N tasks onto a
bounded worker set; CV-blocked tasks pin the workers so queued units
never start, deadlocking.

Force PlatformMultiThreader with DynamicMultiThreadingOff in the
constructor (as SLICImageFilter does) so every work unit gets a dedicated
OS thread. The neighbor-only sync protocol is unchanged.

Supersedes InsightSoftwareConsortium#6321, which spawned explicit std::threads for only one
ParallelizeArray call; forcing the threader preserves the filter's
fixed thread==work-unit assumption across all parallel sections.

Co-Authored-By: Bradley Lowekamp <blowekamp@mail.nih.gov>
@github-actions github-actions Bot added type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:Segmentation Issues affecting the Segmentation module labels May 26, 2026
@blowekamp blowekamp self-requested a review May 26, 2026 20:54
Copy link
Copy Markdown
Member

@blowekamp blowekamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hjmjohnson hjmjohnson marked this pull request as ready for review May 27, 2026 01:34
@hjmjohnson hjmjohnson merged commit fc15a07 into InsightSoftwareConsortium:main May 27, 2026
17 of 20 checks passed
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

Fixes the ParallelSparseFieldLevelSetImageFilter pool-starvation deadlock by forcing PlatformMultiThreader in the constructor — the same pattern used by itkSLICImageFilter — so every work unit gets its own dedicated OS thread, satisfying the neighbor-blocking inter-phase sync invariant.

  • itkParallelSparseFieldLevelSetImageFilter.hxx: Three-line constructor addition (DynamicMultiThreadingOff() + SetMultiThreader(PlatformMultiThreader::New())) replaces the pool-backed default threader with one that spawns one OS thread per work unit.
  • itkParallelSparseFieldLevelSetImageFilterRobustnessGTest.cxx: New GTest suite with SweepRepeat, Determinism, and ConcurrentMultiPipeline scenarios; uses a watchdog thread with std::abort() to turn a potential deadlock-hang into a timed test failure.
  • CMakeLists.txt: Registers the new GTest binary via creategoogletestdriver; no explicit itk_add_test entries needed (auto-discovery handles registration).

Confidence Score: 3/5

The constructor fix is minimal and well-precedented; the main risk is in the new test infrastructure where an uncaught ITK exception would crash the entire GTest binary.

The three-line constructor change is clean and correct. The robustness test suite introduces a runWithDeadline helper that is not exception-safe: an uncaught ITK exception from body() leaves the watchdog thread unjoinable, causing std::terminate() to abort the test driver and silently drop all remaining test results.

itkParallelSparseFieldLevelSetImageFilterRobustnessGTest.cxx — the runWithDeadline exception-safety gap needs attention before CI can trust the test results.

Important Files Changed

Filename Overview
Modules/Segmentation/LevelSets/include/itkParallelSparseFieldLevelSetImageFilter.hxx 3-line constructor change: DynamicMultiThreadingOff() + SetMultiThreader(PlatformMultiThreader::New()) fixes the pool-starvation deadlock; matches the established itkSLICImageFilter pattern. Public SetMultiThreader on the base class could silently undo the fix if called post-construction.
Modules/Segmentation/LevelSets/test/itkParallelSparseFieldLevelSetImageFilterRobustnessGTest.cxx New GTest suite with SweepRepeat, Determinism, and ConcurrentMultiPipeline scenarios. runWithDeadline is not exception-safe: an uncaught ITK exception from body() leaves the watchdog thread unjoinable, causing std::terminate() to kill the entire test binary.
Modules/Segmentation/LevelSets/test/CMakeLists.txt Adds creategoogletestdriver block for the new GTest file; follows existing ITK conventions with no explicit itk_add_test calls needed (gtest_discover_tests handles registration). Clean change.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Filter as ParallelSparseFieldLevelSetImageFilter
    participant PMT as PlatformMultiThreader
    participant T1 as WorkUnit Thread 1..N

    Caller->>Filter: New() [constructor]
    Filter->>Filter: DynamicMultiThreadingOff()
    Filter->>PMT: New()
    Filter->>Filter: SetMultiThreader(PMT)
    Caller->>Filter: Update()
    Filter->>PMT: ParallelizeArray(ThreadedApplyUpdate)
    PMT->>T1: Spawn N dedicated OS threads
    T1->>T1: ThreadedApplyUpdate + SignalNeighborsAndWait
    Note over T1: All N threads run concurrently
    T1-->>PMT: All threads complete
    PMT-->>Filter: Return
    Filter-->>Caller: Output ready
Loading

Reviews (1): Last reviewed commit: "BUG: Fix ParallelSparseFieldLevelSet dea..." | Re-trigger Greptile

Comment on lines +49 to +71
template <typename TFunctor>
void
runWithDeadline(unsigned int seconds, TFunctor && body)
{
std::mutex m;
std::condition_variable cv;
bool done = false;
std::thread watchdog([&] {
std::unique_lock<std::mutex> lk(m);
if (!cv.wait_for(lk, std::chrono::seconds(seconds), [&] { return done; }))
{
std::cerr << "Deadline exceeded (" << seconds << "s): ParallelizeArray dispatch deadlock\n";
std::abort();
}
});
body();
{
const std::lock_guard<std::mutex> lk(m);
done = true;
}
cv.notify_all();
watchdog.join();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 runWithDeadline terminates the test driver on any pipeline exception

If body() throws (e.g., an itk::ExceptionObject from mf->Update()), the code after body() is never reached — done is never set, cv.notify_all() is never called, and watchdog.join() is never called. The watchdog destructor then invokes std::terminate() because the thread is still joinable, which kills the entire GTest binary. All subsequent test cases (Determinism, ConcurrentMultiPipeline) are silently lost rather than reported. A try/catch around body() that signals done and joins before rethrowing would let GTest report the failure normally.

Suggested change
template <typename TFunctor>
void
runWithDeadline(unsigned int seconds, TFunctor && body)
{
std::mutex m;
std::condition_variable cv;
bool done = false;
std::thread watchdog([&] {
std::unique_lock<std::mutex> lk(m);
if (!cv.wait_for(lk, std::chrono::seconds(seconds), [&] { return done; }))
{
std::cerr << "Deadline exceeded (" << seconds << "s): ParallelizeArray dispatch deadlock\n";
std::abort();
}
});
body();
{
const std::lock_guard<std::mutex> lk(m);
done = true;
}
cv.notify_all();
watchdog.join();
}
try
{
body();
}
catch (...)
{
{
const std::lock_guard<std::mutex> lk(m);
done = true;
}
cv.notify_all();
watchdog.join();
throw;
}
{
const std::lock_guard<std::mutex> lk(m);
done = true;
}
cv.notify_all();
watchdog.join();
}

Comment on lines +128 to +130
// Neighbor-blocking inter-phase sync requires a dedicated OS thread per work unit.
this->DynamicMultiThreadingOff();
this->SetMultiThreader(PlatformMultiThreader::New());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Post-construction SetMultiThreader silently re-introduces deadlock

SetMultiThreader is public on the base class, so a caller who runs filter->SetMultiThreader(TBBMultiThreader::New()) after construction replaces PlatformMultiThreader with a pool-backed threader and restores exactly the dispatch-starvation deadlock this PR fixes. Since the filter's synchronisation protocol mandates one dedicated OS thread per work unit, the constructor should at minimum document that overriding the threader breaks the filter — or note it in the class-level Doxygen rather than only in a constructor comment.

Comment on lines +23 to +27

#include "gtest/gtest.h"

#include "itkLevelSetFunction.h"
#include "itkParallelSparseFieldLevelSetImageFilter.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 File-level comment references a transient resource and exceeds the prose-budget cap

prose-budget.md caps in-source comments at one short line (≤ 100 chars) and explicitly forbids references to the PR ("See the PR for the full analysis") because the PR is a transient resource — "the link rots; the prose remains as noise." The second sentence and the "See the PR" cross-reference should be removed; the surviving single line should capture only the non-obvious invariant that the reader needs with the diff open.

Context Used: AGENTS.md (source)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Segmentation Issues affecting the Segmentation module type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants