
Conversation


@jpinot jpinot commented Jul 28, 2025

This commit resolves a data race that could occur when taskgraph with dependencies was enabled alongside task stealing. The problem occurred because, during task stealing, the stealing thread could incorrectly add an extra node to the task dependency graph, while the original thread might fail to create the correct node, leading to a corrupted graph and potential execution issues.

EDIT 07/10/2025:
The above message is incorrect. The issue comes from the same thread re-setting a successor node on a node that has already been added, while the predecessor task has already executed and released its dependencies, leaving the successor node waiting on the already executed task:

Thread 0: sets Task1 as successor of Task0
Thread 1: execute Task0
Thread 1: release successors of Task0
Thread 0: re-sets Task1 as successor of Task0
....
// Hangs: Task1 never executes because it was re-set as a successor of Task0
// after Task0 had already executed and released its successors.

Without taskgraph, nodes are removed once executed; in contrast, when taskgraph is used, nodes are kept alive during and after the execution of the entire region. kmp_tasking_flags_t::onced is used to mark a task as executed, but it was set incorrectly. This commit solves the issue by setting onced before releasing the dependencies and checking the same flag when linking (__kmp_depnode_link_successor).


jprotze commented Jul 28, 2025

In which case can a different thread insert a node into the dependency graph? The graph should contain only sibling tasks, so how can a different thread create a task that is sibling to tasks in the taskgraph?
Is the problem, that the stealing thread removes nodes from the DAG while the taskgraph is generated?

jpinot and others added 3 commits October 7, 2025 12:21
This commit resolves a data race that could occur when taskgraph with
dependencies was enabled alongside task stealing. The problem occurred
because, during task stealing, the stealing thread could incorrectly add
an extra node to the task dependency graph, while the original thread
might fail to create the correct node, leading to a corrupted graph and
potential execution issues.
Fix a race condition where the same thread re-set a successor node for a
task that had already completed execution and released its dependencies.
This caused the successor to remain waiting indefinitely:

Thread 0 sets Task1 as successor of Task0
Thread 1 executes Task0 and releases successors
Thread 0 re-sets Task1 as successor of Task0
-> Task1 never executes because Task0 was already completed
@jpinot jpinot force-pushed the openmp_taskgraph_extra_node branch from cd9a919 to f597923 Compare October 7, 2025 10:25

jpinot commented Oct 7, 2025

In which case can a different thread insert a node into the dependency graph? The graph should contain only sibling tasks, so how can a different thread create a task that is sibling to tasks in the taskgraph? Is the problem, that the stealing thread removes nodes from the DAG while the taskgraph is generated?

Description was wrong, I updated the PR comment and code 😃

@jpinot jpinot changed the title Draft: [OpenMP] Fix datarace with taskgraph dependencies and task stealing [OpenMP] Fix race condition in taskgraph with task dependencies Oct 7, 2025
@jpinot jpinot marked this pull request as ready for review October 7, 2025 12:09
