Skip to content

Conversation

@Ritanya-B-Bharadwaj
Copy link
Contributor

Initial runtime support for threadset clause in task and taskloop directives [Section 14.8 in in OpenMP 6.0 spec]

Frontend PR- #135807

@github-actions
Copy link

github-actions bot commented Jun 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put free_agent_eligible before reserved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you need to use reversed order for big endian here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#if OMPX_TASKGRAPH
  unsigned reserved31 : 4;
  unsigned onced : 1;
#else
  unsigned reserved31 : 5;
#endif
  unsigned hidden_helper : 1;
  unsigned target : 1;
  unsigned native : 1;
  unsigned freed : 1;
  unsigned complete : 1;
  unsigned executing : 1;
  unsigned started : 1;
  unsigned team_serial : 1;
  unsigned tasking_ser : 1;
  unsigned task_serial : 1;
  unsigned tasktype : 1;
  unsigned reserved : 8;
  unsigned free_agent_eligible : 1;
  unsigned detachable : 1;
  unsigned priority_specified : 1;
  unsigned proxy : 1;
  unsigned destructors_thunk : 1;
  unsigned merged_if0 : 1;
  unsigned final : 1;
  unsigned tiedness : 1;

Comment on lines 2767 to 2768
unsigned reserved : 7; /* reserved for compiler use */
unsigned free_agent_eligible : 1; /* set if task can be executed by a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be reversed as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be the reverse of the above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take one bit out of a reserved bits, you need to make it before the remaining bits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take one bit out of a reserved bits, you need to make it before the remaining bits.

If you look at the other bits, they are consumed in reverse order for this endianess. I'm still confused by inverting the bit order and not the byte order for endianess, but this is how they seem to be implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that reminds me of the big endian thingy that IBM folks fixed in the past.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/* Compiler flags */ /* Total compiler flags must be 16 bits */
  unsigned tiedness : 1; /* 0: task is either tied (1) or untied (0) */
  unsigned final : 1; /* 1: task is final(1) so execute immediately */
  unsigned merged_if0 : 1; /* 2: no __kmpc_task_{begin/complete}_if0 calls in if0
                              code path */
  unsigned destructors_thunk : 1; /* 3: set if the compiler creates a thunk to
                                     invoke destructors from the runtime */
  unsigned proxy : 1; /* 4:task is a proxy task (it will be executed outside the
                         context of the RTL) */
  unsigned priority_specified : 1; /* 5: set if the compiler provides priority
                                      setting for the task */
  unsigned detachable : 1; /* 6: 1 == can detach */
  unsigned free_agent_eligible : 1; /* 7: set if task can be executed by a
                                     free-agent thread */
  unsigned reserved : 8; /* reserved for compiler use */
  /* Library flags */ /* Total library flags must be 16 bits */
  unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
  unsigned task_serial : 1; // task is executed immediately (1) or deferred (0)
  unsigned tasking_ser : 1; // all tasks in team are either executed immediately
  // (1) or may be deferred (0)
  unsigned team_serial : 1; // entire team is serial (1) [1 thread] or parallel
  // (0) [>= 2 threads]
  /* If either team_serial or tasking_ser is set, task team may be NULL */
  /* Task State Flags: */
  unsigned started : 1; /* 1==started, 0==not started     */
  unsigned executing : 1; /* 1==executing, 0==not executing */
  unsigned complete : 1; /* 1==complete, 0==not complete   */
  unsigned freed : 1; /* 1==freed, 0==allocated        */
  unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
  unsigned target : 1;
  unsigned hidden_helper : 1; /* 1 == hidden helper task */
#if OMPX_TASKGRAPH
  unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
  unsigned reserved31 : 4; /* reserved for library use */
#else
  unsigned reserved31 : 5; /* reserved for library use */
#endif```

@shiltian shiltian changed the title [clang][OpenMP] OpenMP ThreadSet clause - basic runtime [OpenMP] OpenMP ThreadSet clause - basic runtime Jun 17, 2025
Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any meaningful changes out of this PR at this moment. This should definitely not be merged into the repo given its current shape.

@jprotze
Copy link
Collaborator

jprotze commented Jun 17, 2025

I don't see any meaningful changes out of this PR at this moment. This should definitely not be merged into the repo given its current shape.

This is a compliant stub implementation of the runtime to complement the codegen. There are other openmp features with a runtime stub implementation at a similar QoI level on runtime side (taskwait nowait, target nowait with depend, ...)

@shiltian
Copy link
Contributor

I'd assume the runtime support is orthogonal to the front end support. For the runtime support, this is far from what is needed. Just to add one bit w/o anything else is not sufficient. Even if the front end emits code that uses the bit, it will be fine because it is reserved anyway. On the other hand, to add runtime support doesn't require front end support as well.

@alexey-bataev
Copy link
Member

I'd assume the runtime support is orthogonal to the front end support. For the runtime support, this is far from what is needed. Just to add one bit w/o anything else is not sufficient. Even if the front end emits code that uses the bit, it will be fine because it is reserved anyway. On the other hand, to add runtime support doesn't require front end support as well.

Not quite. If the compiler claims some bits used for some particular purposes, we cannot keep this bit "reserved" in the runtime, it maybe a potential source of bugs in future

@shiltian
Copy link
Contributor

I'm not saying we are gonna have that for a long run. Just speaking from the series of patch's perspective. Even if we can't, we can always merge runtime support first, and then do the front end work "later".

@jprotze
Copy link
Collaborator

jprotze commented Jun 17, 2025

The whole bitset got messed up. It used to be 16 bits for compiler use, 16 bits for runtime use:

/* Compiler flags */ /* Total compiler flags must be 16 bits */
  unsigned tiedness : 1; /* task is either tied (1) or untied (0) */
  unsigned final : 1; /* task is final(1) so execute immediately */
  unsigned merged_if0 : 1; /* no __kmpc_task_{begin/complete}_if0 calls in if0
                              code path */
  unsigned destructors_thunk : 1; /* set if the compiler creates a thunk to
                                     invoke destructors from the runtime */
  unsigned proxy : 1; /* task is a proxy task (it will be executed outside the
                         context of the RTL) */
  unsigned priority_specified : 1; /* set if the compiler provides priority
                                      setting for the task */
  unsigned detachable : 1; /* 1 == can detach */
  unsigned reserved : 9; /* reserved for compiler use */

Then someone added hidden_helper to the compiler flags. Is this bit ever set by the compiler, or should it be a runtime flag?

The new flag should definitely be added to the compiler flags partition. From my perspective it should replace the hidden_helper flag and the hidden_helper flag should move to the runtime partition.

Copy link
Collaborator

@jprotze jprotze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reordering the bits, the PR lgtm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#if OMPX_TASKGRAPH
  unsigned reserved31 : 4;
  unsigned onced : 1;
#else
  unsigned reserved31 : 5;
#endif
  unsigned hidden_helper : 1;
  unsigned target : 1;
  unsigned native : 1;
  unsigned freed : 1;
  unsigned complete : 1;
  unsigned executing : 1;
  unsigned started : 1;
  unsigned team_serial : 1;
  unsigned tasking_ser : 1;
  unsigned task_serial : 1;
  unsigned tasktype : 1;
  unsigned reserved : 8;
  unsigned free_agent_eligible : 1;
  unsigned detachable : 1;
  unsigned priority_specified : 1;
  unsigned proxy : 1;
  unsigned destructors_thunk : 1;
  unsigned merged_if0 : 1;
  unsigned final : 1;
  unsigned tiedness : 1;

Comment on lines 2767 to 2768
unsigned reserved : 7; /* reserved for compiler use */
unsigned free_agent_eligible : 1; /* set if task can be executed by a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/* Compiler flags */ /* Total compiler flags must be 16 bits */
  unsigned tiedness : 1; /* 0: task is either tied (1) or untied (0) */
  unsigned final : 1; /* 1: task is final(1) so execute immediately */
  unsigned merged_if0 : 1; /* 2: no __kmpc_task_{begin/complete}_if0 calls in if0
                              code path */
  unsigned destructors_thunk : 1; /* 3: set if the compiler creates a thunk to
                                     invoke destructors from the runtime */
  unsigned proxy : 1; /* 4:task is a proxy task (it will be executed outside the
                         context of the RTL) */
  unsigned priority_specified : 1; /* 5: set if the compiler provides priority
                                      setting for the task */
  unsigned detachable : 1; /* 6: 1 == can detach */
  unsigned free_agent_eligible : 1; /* 7: set if task can be executed by a
                                     free-agent thread */
  unsigned reserved : 8; /* reserved for compiler use */
  /* Library flags */ /* Total library flags must be 16 bits */
  unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
  unsigned task_serial : 1; // task is executed immediately (1) or deferred (0)
  unsigned tasking_ser : 1; // all tasks in team are either executed immediately
  // (1) or may be deferred (0)
  unsigned team_serial : 1; // entire team is serial (1) [1 thread] or parallel
  // (0) [>= 2 threads]
  /* If either team_serial or tasking_ser is set, task team may be NULL */
  /* Task State Flags: */
  unsigned started : 1; /* 1==started, 0==not started     */
  unsigned executing : 1; /* 1==executing, 0==not executing */
  unsigned complete : 1; /* 1==complete, 0==not complete   */
  unsigned freed : 1; /* 1==freed, 0==allocated        */
  unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
  unsigned target : 1;
  unsigned hidden_helper : 1; /* 1 == hidden helper task */
#if OMPX_TASKGRAPH
  unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
  unsigned reserved31 : 4; /* reserved for library use */
#else
  unsigned reserved31 : 5; /* reserved for library use */
#endif```

@Ritanya-B-Bharadwaj
Copy link
Contributor Author

@jprotze I have modified based on your suggestions.

Copy link
Collaborator

@jprotze jprotze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

Copy link

@jpeyton52 jpeyton52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jprotze jprotze dismissed shiltian’s stale review July 8, 2025 12:16

We need the stub implementation for consistency between compiler and runtime bits

@jprotze jprotze merged commit 0f391d6 into llvm:main Jul 8, 2025
7 checks passed
@Ritanya-B-Bharadwaj
Copy link
Contributor Author

@jprotze the frontend PR for this feature is still open - #135807
Can you please review it?

@shiltian
Copy link
Contributor

shiltian commented Jul 8, 2025

We need the stub implementation for consistency between compiler and runtime bits

Just for the record, correct me if I'm wrong, but I don't think this is necessary for consistency between the compiler and runtime, because the bit isn't set by the compiler. I'm not comfortable merging a non-NFC change without any test case. On the other hand, even if it is set by the front end, the runtime changes can always go before the compiler change with a proper test case. The test case itself can simply manually construct the necessary function call that the compiler is expected to emit.

@jprotze
Copy link
Collaborator

jprotze commented Jul 8, 2025

We need the stub implementation for consistency between compiler and runtime bits

Just for the record, correct me if I'm wrong, but I don't think this is necessary for consistency between the compiler and runtime, because the bit isn't set by the compiler.

You are wrong. Or, what do you mean with "because the bit isn't set by the compiler"? Sure, the compiler doesn't set the bit literally. The bit is set in compiler-generated code injected into the application code (https://github.com/llvm/llvm-project/pull/135807/files#diff-ab54595e29e58ad74c94b0ecaf4d2f32d626345db3ae734aa7cf7d5e404ed499R3716). The compiler-generated code sets the bit, if threadset(omp_pool) is present on the task construct. The bit will be visible in the flags argument of __kmpc_omp_task_alloc, which is a call that only compiler-generated code should call (thus _kmpc prefix).

I'm happy to add an OMPT-test as soon as the frontend patch has landed.

@shiltian
Copy link
Contributor

shiltian commented Jul 8, 2025

Then check this part of my previous comment:

even if it is set by the front end, the runtime changes can always go before the compiler change with a proper test case. The test case itself can simply manually construct the necessary function call that the compiler is expected to emit.

We already have a bunch of tests in libomp that "emulates" code generated by the compiler.

@TerryLWilmarth
Copy link
Contributor

Then check this part of my previous comment:

even if it is set by the front end, the runtime changes can always go before the compiler change with a proper test case. The test case itself can simply manually construct the necessary function call that the compiler is expected to emit.

We already have a bunch of tests in libomp that "emulates" code generated by the compiler.

That's right -- you could write a test that calls __kmp_omp_task_alloc directly with the new bit set, as if it were done by the compiler. It will be a bit boring right now, since __kmp_omp_task_alloc doesn't do anything with it, but sufficient to actually do that inside a parallel region and make sure that it at least runs the task and doesn't break anything.

That's the only interaction between the compiler and runtime that we see happening in our implementation right now. We're working on the execution model changes and free agent threads now, but likely you will see the execution model changes first (if I can successfully keep it separate in any reasonable way). I'm all for having the front end in place earlier than the runtime in this case, because writing tests that emulate the compiler codegen to thoroughly test such things is tedious. That said, if the front end isn't ready, we can provide a more thorough test with the free agent threads contribution.

@jprotze
Copy link
Collaborator

jprotze commented Jul 8, 2025

We already have a bunch of tests in libomp that "emulates" code generated by the compiler.

These tests emulate code generation, when CodeGen lacks behind while a new feature is implemented in the runtime and should be tested for functionality. Here we have the opposite situation: runtime implementation lacks behind CodeGen. We still want to have the usage of the bit documented on both sides of the interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

openmp:libomp OpenMP host runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants