Skip to content

Commit 812b492

Browse files
committed
[OpenMP] Fix work-stealing stack clobber with taskwait
This patch series demonstrates and fixes a bug that causes crashes with OpenMP 'taskwait' directives in heavily multi-threaded scenarios. The implementation of taskwait builds a graph of dependency nodes for tasks. Some of those dependency nodes (kmp_depnode_t) are allocated on the stack, and some on the heap. In the former case, the stack is specific to a given thread, and the task associated with the node is initially bound to the same thread. This works as long as there is a 1:1 mapping between tasks and the per-thread stack. However, kmp_tasking.cpp:__kmp_execute_tasks_template implements a work-stealing algorithm that can take some task 'T1' from some thread's ready queue (say, thread 'A'), and execute them on another thread (say, thread 'B'). If that happens, task T1 may have a dependency node on thread A's stack, and that will *not* be moved to thread B's stack when the work-stealing takes place. Now, in a heavily multi-threaded program, *another* task, T2, can be invoked on thread 'A', re-using the stack slot for thread A at the same time that T1 is using the same slot from thread 'B'. This leads to random crashes, often (but not always) during dependency-node cleanup (__kmp_release_deps). This first patch adds some instrumentation to make it more obvious when the 1:1 mapping between tasks and thread stacks is violated. Typical output will be (heavily truncated): on-stack depnode moved from thread 0x5631d7d5c200 to thread 0x5631d7e15c00 Assertion failure at kmp_taskdeps.h(37): !node->dn.on_stack. Adding debug output also affects the timing such that the bug shows up much more frequently, at least on the system I'm working on.
1 parent 4eab219 commit 812b492

File tree

3 files changed

+19
-1
lines changed

3 files changed

+19
-1
lines changed

openmp/runtime/src/kmp.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2556,6 +2556,9 @@ typedef struct kmp_base_depnode {
25562556
#endif
25572557
std::atomic<kmp_int32> npredecessors;
25582558
std::atomic<kmp_int32> nrefs;
2559+
#if KMP_DEBUG
2560+
void *on_stack;
2561+
#endif
25592562
} kmp_base_depnode_t;
25602563

25612564
union KMP_ALIGN_CACHE kmp_depnode {

openmp/runtime/src/kmp_taskdeps.cpp

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,9 @@ static void __kmp_init_node(kmp_depnode_t *node) {
4848
#if USE_ITT_BUILD && USE_ITT_NOTIFY
4949
__itt_sync_create(node, "OMP task dep node", NULL, 0);
5050
#endif
51+
#if KMP_DEBUG
52+
node->dn.on_stack = NULL;
53+
#endif
5154
}
5255

5356
static inline kmp_depnode_t *__kmp_node_ref(kmp_depnode_t *node) {
@@ -1008,6 +1011,9 @@ void __kmpc_omp_taskwait_deps_51(ident_t *loc_ref, kmp_int32 gtid,
10081011

10091012
kmp_depnode_t node = {0};
10101013
__kmp_init_node(&node);
1014+
#if KMP_DEBUG
1015+
node.dn.on_stack = thread;
1016+
#endif
10111017

10121018
if (!__kmp_check_deps(gtid, &node, NULL, &current_task->td_dephash,
10131019
DEP_BARRIER, ndeps, dep_list, ndeps_noalias,
@@ -1033,8 +1039,10 @@ void __kmpc_omp_taskwait_deps_51(ident_t *loc_ref, kmp_int32 gtid,
10331039
// Wait until the last __kmp_release_deps is finished before we free the
10341040
// current stack frame holding the "node" variable; once its nrefs count
10351041
// reaches 1, we're sure nobody else can try to reference it again.
1036-
while (node.dn.nrefs > 1)
1042+
kmp_int32 nrefs;
1043+
while ((nrefs = node.dn.nrefs) > 1)
10371044
KMP_YIELD(TRUE);
1045+
KMP_DEBUG_ASSERT(nrefs == 1);
10381046

10391047
#if OMPT_SUPPORT
10401048
__ompt_taskwait_dep_finish(current_task, taskwait_task_data);

openmp/runtime/src/kmp_taskdeps.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,12 +22,19 @@ static inline void __kmp_node_deref(kmp_info_t *thread, kmp_depnode_t *node) {
2222
if (!node)
2323
return;
2424

25+
#if KMP_DEBUG
26+
if (node->dn.on_stack && node->dn.on_stack != thread)
27+
fprintf (stderr, "on-stack depnode moved from thread %p to thread %p\n",
28+
node->dn.on_stack, thread);
29+
#endif
30+
2531
kmp_int32 n = KMP_ATOMIC_DEC(&node->dn.nrefs) - 1;
2632
KMP_DEBUG_ASSERT(n >= 0);
2733
if (n == 0) {
2834
#if USE_ITT_BUILD && USE_ITT_NOTIFY
2935
__itt_sync_destroy(node);
3036
#endif
37+
KMP_DEBUG_ASSERT(!node->dn.on_stack);
3138
KMP_ASSERT(node->dn.nrefs == 0);
3239
#if USE_FAST_MEMORY
3340
__kmp_fast_free(thread, node);

0 commit comments

Comments
 (0)