
Conversation

@jpinot
Contributor

@jpinot jpinot commented Jul 28, 2025

This patch addresses an issue where the td_tdg_task_id could underflow, leading to a negative task ID, when a taskloop region was encountered before a taskgraph clause.

Previously, td_tdg_task_id was decremented unconditionally within a taskloop, even if that taskloop was not part of an active taskgraph. This caused problems in scenarios like the following:

#pragma omp parallel
#pragma omp single
{
// Here, td_tdg_task_id is incorrectly decremented, potentially becoming negative.
#pragma omp taskloop
  for (int i = 0; i < n; i++) {
    initialize(data[i]);
  }
}

#pragma omp parallel
#pragma omp single
{
#pragma omp taskgraph
{
// When this taskgraph is entered, the recording of tasks may fail (e.g., segfault)
// due to the invalid (negative) td_tdg_task_id.
#pragma omp taskloop
  for (int i = 0; i < num_iter; i++) {
    compute();
  }
}
}

EDIT(18/09/2025):
This implementation:

  • Allows sporadic holes in the record_map: The record_map, an array of recorded tasks in a task graph, can now have sporadic holes. This change removes the need for a decrement operation and centralizes the control of the index/ID within each task dependence graph (TDG) by moving the count for the next index inside the kmp_node_info structure.
  • Removes the extra allocation of kmp_node_info successors, optimizing memory usage by delaying the allocation until the successors are actually needed.
  • Fixes a data race when resizing the record_map.
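
The centralized per-TDG counter from the first bullet can be sketched as follows. This is a toy illustration with hypothetical names (toy_tdg_info, toy_claim_slot), not the runtime code: each TDG atomically hands out the next record_map index, so no decrement is ever needed, and an id claimed by a task that is never recorded simply leaves a hole.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model (hypothetical names): each TDG owns the counter for the
 * next record_map index, instead of a shared, decrement-corrected id. */
typedef struct {
  atomic_int next_task_id; /* next free index in this TDG's record_map */
} toy_tdg_info;

/* Atomically claim the next record_map slot for a task in this TDG.
 * If the claimed slot is never filled, it remains a sporadic hole. */
static int toy_claim_slot(toy_tdg_info *tdg) {
  return atomic_fetch_add(&tdg->next_task_id, 1);
}
```

Because the counter only moves forward, concurrent recorders can never drive it negative, which removes the underflow scenario this patch fixes.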

@llvmbot llvmbot added the openmp:libomp OpenMP host runtime label Jul 28, 2025
@jpinot jpinot marked this pull request as draft July 28, 2025 05:10
@jpinot jpinot changed the title [wip][openmp] Fix segfault when taskloop before taskgraph Draft: [OpenMP] Fix td_tdg_task_id underflow with taskloop and taskgraph Jul 28, 2025
@jpinot jpinot force-pushed the openmp_taskgraph_taskloop branch from 72f5101 to c53fcbe Compare July 28, 2025 05:33
@jprotze
Collaborator

jprotze commented Jul 28, 2025

What is the purpose of the decrement?

@jpinot
Contributor Author

jpinot commented Jul 28, 2025

> What is the purpose of the decrement?

Currently, when the taskloop construct is invoked, clang generates a runtime call to __kmpc_omp_task_alloc(), which creates a task. This task is then duplicated when __kmp_taskloop is called (where td_tdg_task_id is finally set). The decrement "resets" the value to the initial value (before the call to __kmpc_omp_task_alloc).
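
To make the failure mode concrete, here is a standalone toy model of the pre-patch behavior described above (all names are illustrative, not the actual libomp code): the id is consumed only while a taskgraph is recording, but the "reset" decrement ran unconditionally.

```c
#include <assert.h>

/* Toy model of the pre-patch counter logic (illustrative names only). */
static int td_tdg_task_id = 0;

static void toy_taskloop(int tdg_recording) {
  if (tdg_recording)
    td_tdg_task_id++; /* id consumed by the taskloop template task */
  td_tdg_task_id--;   /* pre-patch: unconditional decrement (the bug) */
}
```

A taskloop outside any taskgraph drives the counter to -1, and a later taskgraph then uses that negative value as a record_map index.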

The PR is still in draft because this solution does not fit cases where multiple taskgraphs are being recorded at the same time. I want to move the td_tdg_task_id inside kmp_taskdata (something like kmp_tdg_task_id_next) so it is unique for each taskgraph (note that after recording, td_tdg_task_id is being reset to 0).

@jprotze Do you have any suggestions?

@github-actions

github-actions bot commented Jul 30, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jprotze
Collaborator

jprotze commented Jul 30, 2025

Even with a localized counter as implemented with your latest commit, I think this decrement can be problematic.

Is it really important to avoid spurious holes in the numbering of task IDs? As I understand it, this is just used for accessing the record_map entries (which is actually a vector, not a map). Checking the code, I realized that for resizing the vector, the code will do a lot of rather small allocs for successorList that might never be used. I would recommend delaying these allocs until the entry is really used.

A bigger issue is that the accesses to the record_map are not protected. If one thread triggers the resize code for the vector, other threads might still access the vector and writes might get lost.

@jpinot
Contributor Author

jpinot commented Jul 31, 2025

> Even with a localized counter as implemented with your latest commit, I think this decrement can be problematic.
>
> Is it really important to avoid spurious holes in the numbering of task IDs? As I understand it, this is just used for accessing the record_map entries (which is actually a vector, not a map). Checking the code, I realized that for resizing the vector, the code will do a lot of rather small allocs for successorList that might never be used. I would recommend delaying these allocs until the entry is really used.
>
> A bigger issue is that the accesses to the record_map are not protected. If one thread triggers the resize code for the vector, other threads might still access the vector and writes might get lost.

Yes, the decrement is a bit of a hacky solution. The problem is that td_tdg_task_id is also used as a record_map index. Consequently, in the current implementation it (1) needs to be sequential from 0 for each TDG, and (2) must be free of any spurious holes (the record_map could instead be converted to a proper map that hashes the task IDs).

Regarding the record_map not being protected, I already have changes for this (protecting the resize of record_map and successors) and will try to create a draft pull request today so you can also have a look.

I'll also take a look at the successorList as you suggested.

@jprotze
Collaborator

jprotze commented Jul 31, 2025

I think the number of places where you iterate over all records is limited. In these places, you could simply check for record_map[i].task != NULL and skip the spurious holes in the vector of records. In other places, where the record map is used to find successors, you can still blindly access the map entries.
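
The skip-the-holes iteration suggested here could look like this. It is a sketch with a simplified stand-in for kmp_node_info_t; the struct and function names are assumptions, and only the field used for hole detection is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for kmp_node_info_t. */
typedef struct {
  void *task; /* NULL marks a spurious hole left by an unrecorded id */
} toy_node_info;

/* Visit only the live entries, skipping holes; returns how many
 * entries were actually processed. */
static int toy_traverse(const toy_node_info *record_map, int n) {
  int live = 0;
  for (int i = 0; i < n; i++) {
    if (record_map[i].task == NULL)
      continue; /* spurious hole: no task was recorded at this id */
    live++;     /* ... process record_map[i] here ... */
  }
  return live;
}
```

Accesses by index (e.g. when looking up a known successor) can still address record_map[i] directly; only full traversals need the NULL check.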

@jpinot jpinot force-pushed the openmp_taskgraph_taskloop branch from 81ef84f to 8c5b124 Compare September 9, 2025 09:49
@jpinot
Contributor Author

jpinot commented Sep 9, 2025

> I think the number of places where you iterate over all records is limited. In these places, you could simply check for record_map[i].task != NULL and skip the spurious holes in the vector of records. In other places, where the record map is used to find successors, you can still blindly access the map entries.

Thank you for the feedback and sorry for the delayed response.
I agree that allowing spurious holes in the record_map simplifies the logic. I’ve updated the branch to incorporate your suggestion. I’ve also added the extra locking mechanism for resizing record_map and successors, and removed the premature allocation of successors. My preliminary tests have been successful.

@jprotze, what are your thoughts?

This patch addresses an issue where the td_tdg_task_id could underflow,
leading to a negative task ID, when a taskloop region was encountered
before a taskgraph clause.

This change allows spurious holes in the record_map.
It also delays allocation of successors in kmp_node until the array is
needed, removing the small allocations when a taskgraph node is created
or resized.
@jpinot jpinot force-pushed the openmp_taskgraph_taskloop branch from 8c5b124 to bf1f0df Compare September 18, 2025 08:50
@jpinot jpinot changed the title Draft: [OpenMP] Fix td_tdg_task_id underflow with taskloop and taskgraph [OpenMP] Fix td_tdg_task_id underflow when taskloop and taskgraph Sep 18, 2025
@jpinot jpinot requested a review from jprotze September 18, 2025 09:11
@jpinot jpinot self-assigned this Sep 18, 2025
@jpinot jpinot requested a review from shiltian September 18, 2025 09:11
@jpinot jpinot marked this pull request as ready for review September 18, 2025 09:11
@jprotze
Collaborator

jprotze commented Sep 18, 2025

The record_map accesses in __kmp_track_dependence are not protected against a concurrent reallocation of the vector. I didn't check all the other accesses to the record_map to see whether they might be concurrent with a reallocation.

I think there are two options:

  • lock any access to record_map that is possibly concurrent with a write/push (probably way too expensive)
  • store a pointer to the node_info in kmp_taskdata_t: task->tdg_node_info = task->tdg->record_map[task->td_tdg_task_id]

@jprotze
Collaborator

jprotze commented Sep 18, 2025

Another issue, not directly connected to this PR: I think the implementation of __kmp_track_dependence is problematic. I think you will miss dependencies between tasks if the source task completed execution before the sink task gets created (which is a similar problem for OMPT tools relying on ompt_callback_task_dependence to record a task dependence graph).

@jpinot
Contributor Author

jpinot commented Sep 19, 2025

> The record_map accesses in __kmp_track_dependence are not protected against a concurrent reallocation of the vector. I didn't check all the other accesses to the record_map to see whether they might be concurrent with a reallocation.
>
> I think there are two options:
>
> * lock any access to `record_map` that is possibly concurrent with a write/push (probably way too expensive)
>
> * store a pointer to the node_info in kmp_taskdata_t: `task->tdg_node_info = task->tdg->record_map[task->td_tdg_task_id]`

Yes, you're right. I agree that locking all read/write interactions with record_map would be too expensive.
The problem with your second suggestion (storing a pointer to the node in taskdata) is that record_map is an array of kmp_node_info_t. When the record_map vector is reallocated (resized), the old address is freed, and the stored pointer would point to freed memory.

Two possible solutions come to mind:

  • Change the vector to a linked list. This would solve the pointer invalidation issue but could decrease efficiency during linear traversals, e.g., when executing a recorded TDG.
  • Use a multi-dimensional array or a similar data structure that avoids frequent reallocation, thus preventing the addresses from changing.

@jprotze any thoughts?

@jprotze
Collaborator

jprotze commented Sep 19, 2025

Right, I missed that record_map is essentially of type vector<kmp_node_info_t>, not vector<kmp_node_info_t *>.

Another possible approach would be to use a customized vector which allocates fixed-size blocks as necessary (instead of realloc) and redirects operator[](i) to element i % fixed_size of block i / fixed_size.
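
A sketch of that block-based vector idea follows. The names (toy_node_vector, toy_vector_get, TOY_BLOCK_SIZE) are illustrative, not the kmp_node_vector API that eventually landed, and a real runtime version would additionally need to lock the growth path.

```c
#include <assert.h>
#include <stdlib.h>

/* Block-based vector sketch: elements live in fixed-size blocks that
 * are never moved; growing the vector only reallocates the small table
 * of block pointers, so pointers to existing elements stay valid. */
#define TOY_BLOCK_SIZE 64

typedef struct {
  int id; /* stand-in payload for kmp_node_info_t */
} toy_node;

typedef struct {
  toy_node **blocks; /* table of block pointers; may be realloc'd */
  size_t nblocks;
} toy_node_vector;

static toy_node *toy_vector_get(toy_node_vector *v, size_t i) {
  size_t b = i / TOY_BLOCK_SIZE, off = i % TOY_BLOCK_SIZE;
  if (b >= v->nblocks) {
    /* Grow only the pointer table; existing blocks keep their address. */
    v->blocks = realloc(v->blocks, (b + 1) * sizeof(toy_node *));
    for (size_t j = v->nblocks; j <= b; j++)
      v->blocks[j] = calloc(TOY_BLOCK_SIZE, sizeof(toy_node));
    v->nblocks = b + 1;
  }
  return &v->blocks[b][off];
}
```

Because the blocks themselves are never reallocated, a pointer obtained from toy_vector_get stays valid across later growth; only the pointer table must be guarded against concurrent resizing.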

@jpinot jpinot force-pushed the openmp_taskgraph_taskloop branch from f76881e to b151a84 Compare September 29, 2025 18:23
@jpinot
Contributor Author

jpinot commented Sep 29, 2025

> Right, I missed that record_map is essentially of type vector<kmp_node_info_t>, not vector<kmp_node_info_t *>.
>
> Another possible approach would be to use a customized vector which allocates fixed-size blocks as necessary (instead of realloc) and redirects operator[](i) to element i % fixed_size of block i / fixed_size.

I have pushed a WIP commit with the changes discussed. However, a data race occurs when thread A reallocates blocks (the array of blocks; note that the actual blocks are never resized) to increase the number of blocks, and thread B calls kmp_node_vector_get to access an existing node. The returned pointer points to invalid data. I suspect this could be due to a compiler optimization or an issue with the return value handling(?).

@jprotze, do you know what might be causing this?

@jpinot jpinot requested a review from jtb20 September 30, 2025 07:14
@jpinot
Contributor Author

jpinot commented Sep 30, 2025

> Right, I missed that record_map is essentially of type vector<kmp_node_info_t>, not vector<kmp_node_info_t *>.
> Another possible approach would be to use a customized vector which allocates fixed-size blocks as necessary (instead of realloc) and redirects operator[](i) to element i % fixed_size of block i / fixed_size.
>
> I have pushed a WIP commit with the changes discussed. However, a data race occurs when thread A reallocates blocks (the array of blocks; note that the actual blocks are never resized) to increase the number of blocks, and thread B calls kmp_node_vector_get to access an existing node. The returned pointer points to invalid data. I suspect this could be due to a compiler optimization or an issue with the return value handling(?).
>
> @jprotze, do you know what might be causing this?

My bad, the problem was that thread B was holding the freed old_blocks pointer in kmp_node_vector_get 🙀. I'll clean up the changes and push as soon as I re-test everything.

@jpinot jpinot requested a review from kparzysz September 30, 2025 07:16
jpinot added 3 commits October 1, 2025 15:45
Add a pointer to the node representing the task in the TDG; this avoids
locking access to record_map every time a node needs to be accessed.
Replaced the fixed-size array for TDG successors with kmp_node_vector,
a custom dynamic vector of kmp_node_info TDG nodes. This change aims
to mitigate data races during vector resizing by using a block-based
allocation strategy.
@jpinot jpinot force-pushed the openmp_taskgraph_taskloop branch from b151a84 to 32edeb6 Compare October 1, 2025 14:02
@jpinot
Contributor Author

jpinot commented Oct 1, 2025

> Right, I missed that record_map is essentially of type vector<kmp_node_info_t>, not vector<kmp_node_info_t *>.
>
> Another possible approach would be to use a customized vector which allocates fixed-size blocks as necessary (instead of realloc) and redirects operator[](i) to element i % fixed_size of block i / fixed_size.

As discussed, the latest change implements a vector-like abstraction for nodes to allow resizing without modifying the addresses of existing nodes. This aims to resolve the thread safety issues.
