Commit 812b492
[OpenMP] Fix work-stealing stack clobber with taskwait

This patch series demonstrates and fixes a bug that causes crashes with
OpenMP 'taskwait' directives in heavily multi-threaded scenarios.

The implementation of taskwait builds a graph of dependency nodes for
tasks. Some of those dependency nodes (kmp_depnode_t) are allocated
on the stack, and some on the heap. In the former case, the stack is
specific to a given thread, and the task associated with the node is
initially bound to the same thread. This works as long as there is a
1:1 mapping between tasks and the per-thread stack.

However, kmp_tasking.cpp:__kmp_execute_tasks_template implements a
work-stealing algorithm that can take a task 'T1' from the ready queue
of one thread (say, thread 'A') and execute it on another thread (say,
thread 'B').

If that happens, task T1 may have a dependency node on thread A's stack,
and that will *not* be moved to thread B's stack when the work-stealing
takes place.

Now, in a heavily multi-threaded program, *another* task, T2, can be
invoked on thread 'A', re-using the stack slot for thread A at the same
time that T1 is using the same slot from thread 'B'. This leads to
random crashes, often (but not always) during dependency-node cleanup
(__kmp_release_deps).
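
The sequence described above can be modelled without the runtime. The
following is a minimal, single-threaded sketch: `Worker`, `Task`,
`DepNode`, `spawn` and `simulate_clobber` are hypothetical names
invented for illustration and are not libomp types or functions. A
member variable stands in for the reusable stack slot, so the model
shows the aliasing deterministically rather than reproducing the real
crash.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for a dependency node that lives in a
// per-thread stack slot. Not the real kmp_depnode_t.
struct DepNode {
  std::string owner_task;  // which task initialised this slot
  int refcount;
};

// A task remembers where its dependency node lives.
struct Task {
  std::string name;
  DepNode *node;
};

// Models one worker thread with a single reusable "stack slot".
struct Worker {
  DepNode stack_slot{};
  std::deque<Task> ready;

  // Creating a task re-initialises the slot, as a fresh stack frame would.
  Task spawn(const std::string &name) {
    stack_slot = DepNode{name, 1};
    return Task{name, &stack_slot};
  }
};

// Returns true if T1 observes a node that another task has overwritten.
bool simulate_clobber() {
  Worker a, b;

  // T1 is created on A; its node sits in A's slot.
  a.ready.push_back(a.spawn("T1"));

  // B steals T1. Only the task moves; the node stays in A's slot.
  Task stolen = a.ready.front();
  a.ready.pop_front();
  b.ready.push_back(stolen);

  // A starts T2 and re-uses the same slot while T1 is still live on B.
  a.ready.push_back(a.spawn("T2"));

  // T1, now on B, reads its node and finds T2's data.
  return stolen.node->owner_task != stolen.name;
}
```

In this model the clobber always happens because the steps are forced
into the bad order. In the real runtime the same interleaving depends on
scheduling, which is consistent with the crashes being described as
random.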

This first patch adds some instrumentation to make it more obvious when
the 1:1 mapping between tasks and thread stacks is violated. Typical
output will be (heavily truncated):

    on-stack depnode moved from thread 0x5631d7d5c200 to thread 0x5631d7e15c00
    Assertion failure at kmp_taskdeps.h(37): !node->dn.on_stack.
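
The diff for this instrumentation is not reproduced here, so the
following is only a sketch of the general idea under stated assumptions:
record which thread initialised an on-stack node, then compare against
the thread that later touches it. `ThreadInfo`, `DepNode`,
`init_stack_node` and `check_node_owner` are hypothetical names, not the
identifiers used in the patch. Only the message text is taken from the
output quoted above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical per-thread descriptor; its address serves as the thread id.
struct ThreadInfo {
  int id;
};

// Hypothetical node carrying the two pieces of bookkeeping the check needs.
struct DepNode {
  bool on_stack = false;
  const ThreadInfo *owner = nullptr;
};

// Mark a node as stack-allocated and remember the thread that owns the stack.
void init_stack_node(DepNode &node, const ThreadInfo &thread) {
  node.on_stack = true;
  node.owner = &thread;
}

// Returns true, and reports to stderr, when an on-stack node is seen from
// a thread other than its owner. Heap nodes are never reported.
bool check_node_owner(const DepNode &node, const ThreadInfo &current) {
  if (node.on_stack && node.owner != &current) {
    std::fprintf(stderr,
                 "on-stack depnode moved from thread %p to thread %p\n",
                 static_cast<const void *>(node.owner),
                 static_cast<const void *>(&current));
    return true;
  }
  return false;
}
```

A check of this shape only reports the violation; it does not prevent
the clobber.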

Adding debug output also affects the timing such that the bug shows up
much more frequently, at least on the system I'm working on.

1 parent 4eab219 commit 812b492
3 files changed, +19 -1 lines changed

[Diff contents not preserved. Hunks by original line numbers: first file, 2556-2561; second file, 48-53, 1008-1013 and 1033-1040; third file, 22-33.]