Skip to content

Commit 31bc6c8

Browse files
committed
[OpenMP] Allocate depnodes on heap not stack in __kmpc_omp_taskwait_deps_51
This patch contains the fix for the work-stealing stack clobber with task graph dependency nodes. A straightforward-seeming fix is to just use heap allocation instead of stack allocation for the problematic nodes in question. This also allows us to revert the fix from PR85963: #85963 There may be some concern that despite fixing the issue, using heap rather than stack allocation may have a negative performance impact. However this does not appear to be the case: in fact I measured a small performance boost with this patch, i.e. over 100 runs of the OpenMP_VV test_taskwait_depend.c test with the iteration count N increased to 1024000, five times before & after the patch is applied: pre-patch ========= real 0m30.635s user 45m55.276s sys 1m24.537s real 0m30.943s user 47m5.018s sys 1m19.218s real 0m30.573s user 45m56.023s sys 1m21.273s real 0m30.955s user 47m6.567s sys 1m25.121s real 0m30.712s user 46m51.830s sys 1m22.085s with patch ========== real 0m28.245s user 39m16.057s sys 1m11.156s real 0m28.385s user 39m35.553s sys 1m16.413s real 0m28.421s user 39m42.498s sys 1m16.783s real 0m29.207s user 39m5.006s sys 1m10.247s real 0m29.118s user 38m30.484s sys 1m14.027s (On an AMD system with 64 cores/256 hw threads.) This is probably because we no longer need the wait loop in __kmpc_omp_taskwait_deps_51.
1 parent 812b492 commit 31bc6c8

File tree

1 file changed

+11
-14
lines changed

1 file changed

+11
-14
lines changed

openmp/runtime/src/kmp_taskdeps.cpp

Lines changed: 11 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1009,13 +1009,16 @@ void __kmpc_omp_taskwait_deps_51(ident_t *loc_ref, kmp_int32 gtid,
10091009
return;
10101010
}
10111011

1012-
kmp_depnode_t node = {0};
1013-
__kmp_init_node(&node);
1014-
#if KMP_DEBUG
1015-
node.dn.on_stack = thread;
1012+
#if USE_FAST_MEMORY
1013+
kmp_depnode_t *node =
1014+
(kmp_depnode_t *)__kmp_fast_allocate(thread, sizeof(kmp_depnode_t));
1015+
#else
1016+
kmp_depnode_t *node =
1017+
(kmp_depnode_t *)__kmp_thread_malloc(thread, sizeof(kmp_depnode_t));
10161018
#endif
1019+
__kmp_init_node(node);
10171020

1018-
if (!__kmp_check_deps(gtid, &node, NULL, &current_task->td_dephash,
1021+
if (!__kmp_check_deps(gtid, node, NULL, &current_task->td_dephash,
10191022
DEP_BARRIER, ndeps, dep_list, ndeps_noalias,
10201023
noalias_dep_list)) {
10211024
KA_TRACE(10, ("__kmpc_omp_taskwait_deps(exit): T#%d has no blocking "
@@ -1029,20 +1032,14 @@ void __kmpc_omp_taskwait_deps_51(ident_t *loc_ref, kmp_int32 gtid,
10291032

10301033
int thread_finished = FALSE;
10311034
kmp_flag_32<false, false> flag(
1032-
(std::atomic<kmp_uint32> *)&node.dn.npredecessors, 0U);
1033-
while (node.dn.npredecessors > 0) {
1035+
(std::atomic<kmp_uint32> *)&node->dn.npredecessors, 0U);
1036+
while (node->dn.npredecessors > 0) {
10341037
flag.execute_tasks(thread, gtid, FALSE,
10351038
&thread_finished USE_ITT_BUILD_ARG(NULL),
10361039
__kmp_task_stealing_constraint);
10371040
}
10381041

1039-
// Wait until the last __kmp_release_deps is finished before we free the
1040-
// current stack frame holding the "node" variable; once its nrefs count
1041-
// reaches 1, we're sure nobody else can try to reference it again.
1042-
kmp_int32 nrefs;
1043-
while ((nrefs = node.dn.nrefs) > 1)
1044-
KMP_YIELD(TRUE);
1045-
KMP_DEBUG_ASSERT(nrefs == 1);
1042+
__kmp_node_deref(thread, node);
10461043

10471044
#if OMPT_SUPPORT
10481045
__ompt_taskwait_dep_finish(current_task, taskwait_task_data);

0 commit comments

Comments
 (0)