Skip to content

Conversation

haiyanghee
Copy link

My attempt to fix #154201.

I've run the omp tests using the check-openmp target, and the new tests pass without segfaults/assertion errors/deadlocks.

However, I'm not certain that my changes are complete (i.e. I didn't miss any edge cases), as I occasionally get 1 or 2 segfaults when I run my changes with my company's regression test that heavily uses omp (I'm uncertain if its my omp changes or company's code base that caused the segfaults). However, I couldn't reproduce any segfaults with the tests I've added.

Also this is my first time contributing to open source, so any feedback is appreciated! :)

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the openmp:libomp OpenMP host runtime label Aug 18, 2025
@haiyanghee
Copy link
Author

Hi @jprotze , would you please be able to review this PR? If not can you please tell me who should review this?

Copy link

github-actions bot commented Sep 4, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,c,cpp -- openmp/runtime/src/kmp.h openmp/runtime/src/kmp_csupport.cpp openmp/runtime/src/kmp_global.cpp openmp/runtime/src/kmp_lock.cpp openmp/runtime/src/kmp_lock.h openmp/runtime/src/kmp_runtime.cpp openmp/runtime/src/ompt-internal.h openmp/runtime/src/z_Linux_util.cpp openmp/runtime/test/api/omp_pause_resource.c

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/openmp/runtime/test/api/omp_pause_resource.c b/openmp/runtime/test/api/omp_pause_resource.c
index fce83824d..6154377b9 100644
--- a/openmp/runtime/test/api/omp_pause_resource.c
+++ b/openmp/runtime/test/api/omp_pause_resource.c
@@ -16,9 +16,13 @@ void doOmpWorkWithCritical(int *a_lockCtr, int *b_lockCtr) {
 #pragma omp parallel num_threads(NUM_THREADS)
   {
 #pragma omp critical(a_lock)
-    { *a_lockCtr = *a_lockCtr + 1; }
+    {
+      *a_lockCtr = *a_lockCtr + 1;
+    }
 #pragma omp critical(b_lock)
-    { *b_lockCtr = *b_lockCtr + 1; }
+    {
+      *b_lockCtr = *b_lockCtr + 1;
+    }
   }
 }
 
@@ -47,7 +51,8 @@ void test_omp_get_thread_num_after_omp_hard_pause_resource_all() {
 
 // use omp to do some work, guarantees omp initialization
 #pragma omp parallel num_threads(NUM_THREADS)
-  {}
+  {
+  }
 
   // omp hard pause should succeed
   int rc = omp_pause_resource_all(omp_pause_hard);
@@ -62,7 +67,8 @@ void test_omp_get_thread_num_after_omp_hard_pause_resource_all() {
 void test_omp_parallel_num_threads_after_omp_hard_pause_resource_all() {
 // use omp to do some work
 #pragma omp parallel num_threads(NUM_THREADS)
-  {}
+  {
+  }
 
   // omp hard pause should succeed
   int rc = omp_pause_resource_all(omp_pause_hard);
@@ -70,7 +76,8 @@ void test_omp_parallel_num_threads_after_omp_hard_pause_resource_all() {
 
 // this should not trigger any omp asserts
 #pragma omp parallel num_threads(NUM_THREADS)
-  {}
+  {
+  }
 }
 
 void test_KMP_INIT_AT_FORK_with_fork_after_omp_hard_pause_resource_all() {
@@ -106,7 +113,8 @@ void test_KMP_INIT_AT_FORK_with_fork_after_omp_hard_pause_resource_all() {
 void test_fork_child_exiting_after_omp_hard_pause_resource_all() {
 // use omp to do some work
 #pragma omp parallel num_threads(NUM_THREADS)
-  {}
+  {
+  }
 
   // omp hard pause should succeed
   int rc = omp_pause_resource_all(omp_pause_hard);

Copy link
Collaborator

@jprotze jprotze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first round of comments

__kmp_free(ll->lock);
ll->lock = NULL;
// reset the reverse critical section pointer to 0
if (ll->rev_ptr_critSec && LIKELY(!__kmp_in_atexit))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev_ptr_critSec is not initialized. Set it to nullptr in __kmp_allocate_indirect_lock

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In __kmp_allocate_indirect_lock() the new lock tables are allocated with multiples of sizeof(kmp_indirect_lock_t), and I believe __kmp_allocate() will memset the allocated memory to 0. So I believe rev_ptr_critSec is technically initialized to 0 in this case?

I can also explicitly initialize it to nullptr if that makes the code easier to reason about.

void __kmp_register_atfork(void) {
if (__kmp_need_register_atfork) {
// NOTE: we will not double register our fork handlers! It will cause deadlock
if (!__kmp_already_registered_atfork && __kmp_need_register_atfork) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to the pause_resource_all issue?

Copy link
Author

@haiyanghee haiyanghee Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, because if you see the test function test_KMP_INIT_AT_FORK_with_fork_after_omp_hard_pause_resource_all() I added in the unit tests, the original behaviour will actually register the atfork handlers twice (if the environment variable KMP_INIT_AT_FORK is explicitly set to 1), which causes a deadlock (since the atfork handlers are run twice, and inside the handler it will do locking).

There might be a cleaner way to prevent double atfork registration than adding another flag (I thought I can re-use the variable __kmp_need_register_atfork_specified, but I didn't since it looks like its only used for debug printing)

Comment on lines 8348 to 8351
#ifdef KMP_TDATA_GTID
/*reset __kmp_gtid to initial value*/
__kmp_gtid = KMP_GTID_DNE;
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread 0 should not drop it's gtid

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I know we are thread 0? Do I just check if __kmp_gtid == 0 (as __kmp_gtid is declared as thread local storage) and if thats the case, I don't reset it?

I chose to reset __kmp_gtid to KMP_GTID_DNE because it is initialized to KMP_GTID_DNE in kmp_global.cpp, so I thought resetting to its default value was ok.

Can you please point me how this can cause an issue in the code, so that I can understand the code base better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you look at code generated for an OpenMP program (IR or disassembled objdump), you will see that each function that contains __kmpc* calls initially gets the gtid using __kmpc_global_thread_num, which triggers serial initialization. This gtid is then passed to many __kmpc calls. You already ran into the issue that __kmpc_push_num_threads assumes that serial initialization has happened. I think, that the runtime must be in serial initialized state after omp_pause_resource[_all] to not break the assumptions of compiler+runtime implementation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I did realize clang inserts __kmpc_global_thread_num() at the beginning of functions.

But how does not resetting __kmp_gtid to KMP_GTID_DNE for thread 0 imply that the runtime in serial initialized state after omp_pause_resource calls? It sounds like we want to call __kmpc_global_thread_num() (or something equivalent) right after omp_pause_resource to force serial initialized state?

@jprotze
Copy link
Collaborator

jprotze commented Sep 4, 2025

To automatically address the code formatting issues, please run git clang-format a0c2d6e369a1fb4d8b3ed46baed7a2de2fb3d882 (since a0c2d6e is the latest commit from main in your branch at the moment)

…!UNLIKELY(__kmp_in_atexit))` as suggested since they are equivalent
in `__kmpc_push_num_threads()`, as suggested in feedback
@haiyanghee
Copy link
Author

Thank you Joachim for reviewing the PR!

I've replied to some of your comments and implemented some of the feedback, please re-review the comments and changes.

Also I realized I'm probably not following the standard naming convention (for example, the new rev_ptr_critSec variable) that's present in the code base. As I'm not good at naming things, please comment on any code style changes that I should do and I'll happily change them.

Haiyang He added 2 commits September 17, 2025 10:21
…__kmp_hard_pause()`

This should ensure that the program is in serially initialized state after
doing a hard pause, hence not breaking any compiler/runtime
invariants/assumptions.

Also fixed formatting issues
@haiyanghee
Copy link
Author

haiyanghee commented Sep 17, 2025

Hi @jprotze , I attempted to address the issue where after omp_pause_resource_all() is called, the runtime must be in serial initialized state in order to not break existing compiler/runtime assumptions in my new commits.

I can simply call __kmp_serial_initialize() right after __kmp_hard_pause(), but the only issue I worry about is that if we fork right after __kmp_hard_pause() and we have KMP_INIT_AT_FORK=0 set (so we don't run any atfork handlers), the runtime state in the child is unusable. As far as I understand, the omp implementation should make it possible to fork after we hard reset the omp runtime without extra environment variable configurations.

So I want the serial initialization right after a hard pause to have the following properties (in the case that we want to fork immediately after a hard pause):

  • (Property 1) does not spawn new pthreads, or any pthread data structure that cannot be used after forking
  • (Property 2) does not open (and hence share) any unwanted file descriptors with the child after forking

And it looks like we should be already doing this inside the __kmp_atfork_child() atfork handler. So I basically just did the same thing right after __kmp_hard_pause().

I tried to understand what __kmp_do_serial_initialize() does, but I didn't completely understand why every part of it is needed, and I also don't know all of the compiler and runtime assumptions. So I hope that by doing what the child fork handler is doing I don't break any existing invariants.

Here is the reason why I think the serial initialization satisfies the 2 properties I've stated above:

I went in gdb and break pointed all of the pthread functions to see what will be called during __kmp_do_serial_initialize() by doing rbreak ^pthread_.*.

I see only these pthread functions are used in __kmp_do_serial_initialize():

- pthread_once()
    - called in ompt_pre_init()
    - I think this is ok as its just executing things once in ompt (I'm not
familiar with ompt so I hope we don't need to do any clean up up on fork, as
the child fork handler didn't do any)
- pthread_key_create()
    - called in __kmp_runtime_initialize() when we create the thread private keys
    - this is ok, since pthread internally should just malloc some arrays for
thread storage in the future
- pthread_setspecific() and pthread_getspecific()
    - called in `__kmp_register_root(TRUE)` when we call `__kmp_gtid_set_specific()`,
it will write the gtid to the thread private variable
        - NOTE that we didn't call any pthread_create yet
    - this is ok, as the gtid written will always be 0 (line 3863 where we set `gtid=0` in `__kmp_register_root()`). This is because:
        - the input argument `initial_thread` to `__kmp_register_root()` is TRUE
        - and `__kmp_threads[0]` should be 0
        - and we shouldn't have set `__kmp_init_hidden_helper_threads` at that time
            - as it looks like `__kmp_init_hidden_helper_threads` is set to true in `__kmp_task_alloc()`
- pthread_mutex_lock() and pthread_mutex_unlock()
    - called in __kmp_itt_initialize()
    - this is ok, mutex should be reusable after forking

So I believe property 1 holds.

Next, I break pointed the open() system call to catch any file descriptor openings. The only place where I see we open file descriptors is in __kmp_register_library_startup() where we might open a shared memory file. However, since we set __kmp_need_register_serial() to false before we call __kmp_do_serial_initialize(), it will not call __kmp_register_library_startup().

So I believe property 2 also holds.

NOTE: this only verifies that the properties hold for a certain compilation, as other compiler flags might introduce different behaviour.

Please correct me if my understanding is wrong, as I'd love to know and do this properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

openmp:libomp OpenMP host runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[openmp] Segfaults/assertion errors on certain omp statements after calling omp_pause_resource_all(omp_pause_hard)

3 participants