-
Notifications
You must be signed in to change notification settings - Fork 15k
[OpenMP] OpenMP ThreadSet clause - basic runtime #144409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
54f3860 to
ca3f29a
Compare
openmp/runtime/src/kmp.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put free_agent_eligible before reserved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you need to use reversed order for big endian here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if OMPX_TASKGRAPH
unsigned reserved31 : 4;
unsigned onced : 1;
#else
unsigned reserved31 : 5;
#endif
unsigned hidden_helper : 1;
unsigned target : 1;
unsigned native : 1;
unsigned freed : 1;
unsigned complete : 1;
unsigned executing : 1;
unsigned started : 1;
unsigned team_serial : 1;
unsigned tasking_ser : 1;
unsigned task_serial : 1;
unsigned tasktype : 1;
unsigned reserved : 8;
unsigned free_agent_eligible : 1;
unsigned detachable : 1;
unsigned priority_specified : 1;
unsigned proxy : 1;
unsigned destructors_thunk : 1;
unsigned merged_if0 : 1;
unsigned final : 1;
unsigned tiedness : 1;
ca3f29a to
4553bd2
Compare
openmp/runtime/src/kmp.h
Outdated
| unsigned reserved : 7; /* reserved for compiler use */ | ||
| unsigned free_agent_eligible : 1; /* set if task can be executed by a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be reversed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be the reverse of the above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take one bit out of a reserved bits, you need to make it before the remaining bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take one bit out of a reserved bits, you need to make it before the remaining bits.
If you look at the other bits, they are consumed in reverse order for this endianess. I'm still confused by inverting the bit order and not the byte order for endianess, but this is how they seem to be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, that reminds me of the big endian thingy that IBM folks fixed in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Compiler flags */ /* Total compiler flags must be 16 bits */
unsigned tiedness : 1; /* 0: task is either tied (1) or untied (0) */
unsigned final : 1; /* 1: task is final(1) so execute immediately */
unsigned merged_if0 : 1; /* 2: no __kmpc_task_{begin/complete}_if0 calls in if0
code path */
unsigned destructors_thunk : 1; /* 3: set if the compiler creates a thunk to
invoke destructors from the runtime */
unsigned proxy : 1; /* 4:task is a proxy task (it will be executed outside the
context of the RTL) */
unsigned priority_specified : 1; /* 5: set if the compiler provides priority
setting for the task */
unsigned detachable : 1; /* 6: 1 == can detach */
unsigned free_agent_eligible : 1; /* 7: set if task can be executed by a
free-agent thread */
unsigned reserved : 8; /* reserved for compiler use */
/* Library flags */ /* Total library flags must be 16 bits */
unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
unsigned task_serial : 1; // task is executed immediately (1) or deferred (0)
unsigned tasking_ser : 1; // all tasks in team are either executed immediately
// (1) or may be deferred (0)
unsigned team_serial : 1; // entire team is serial (1) [1 thread] or parallel
// (0) [>= 2 threads]
/* If either team_serial or tasking_ser is set, task team may be NULL */
/* Task State Flags: */
unsigned started : 1; /* 1==started, 0==not started */
unsigned executing : 1; /* 1==executing, 0==not executing */
unsigned complete : 1; /* 1==complete, 0==not complete */
unsigned freed : 1; /* 1==freed, 0==allocated */
unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
unsigned target : 1;
unsigned hidden_helper : 1; /* 1 == hidden helper task */
#if OMPX_TASKGRAPH
unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
unsigned reserved31 : 4; /* reserved for library use */
#else
unsigned reserved31 : 5; /* reserved for library use */
#endif```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any meaningful changes out of this PR at this moment. This should definitely not be merged into the repo given its current shape.
This is a compliant stub implementation of the runtime to complement the codegen. There are other openmp features with a runtime stub implementation at a similar QoI level on runtime side (taskwait nowait, target nowait with depend, ...) |
|
I'd assume the runtime support is orthogonal to the front end support. For the runtime support, this is far from what is needed. Just to add one bit w/o anything else is not sufficient. Even if the front end emits code that uses the bit, it will be fine because it is reserved anyway. On the other hand, to add runtime support doesn't require front end support as well. |
Not quite. If the compiler claims some bits used for some particular purposes, we cannot keep this bit "reserved" in the runtime, it maybe a potential source of bugs in future |
|
I'm not saying we are gonna have that for a long run. Just speaking from the series of patch's perspective. Even if we can't, we can always merge runtime support first, and then do the front end work "later". |
|
The whole bitset got messed up. It used to be 16 bits for compiler use, 16 bits for runtime use: Then someone added The new flag should definitely be added to the compiler flags partition. From my perspective it should replace the hidden_helper flag and the hidden_helper flag should move to the runtime partition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reordering the bits, the PR lgtm.
openmp/runtime/src/kmp.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if OMPX_TASKGRAPH
unsigned reserved31 : 4;
unsigned onced : 1;
#else
unsigned reserved31 : 5;
#endif
unsigned hidden_helper : 1;
unsigned target : 1;
unsigned native : 1;
unsigned freed : 1;
unsigned complete : 1;
unsigned executing : 1;
unsigned started : 1;
unsigned team_serial : 1;
unsigned tasking_ser : 1;
unsigned task_serial : 1;
unsigned tasktype : 1;
unsigned reserved : 8;
unsigned free_agent_eligible : 1;
unsigned detachable : 1;
unsigned priority_specified : 1;
unsigned proxy : 1;
unsigned destructors_thunk : 1;
unsigned merged_if0 : 1;
unsigned final : 1;
unsigned tiedness : 1;
openmp/runtime/src/kmp.h
Outdated
| unsigned reserved : 7; /* reserved for compiler use */ | ||
| unsigned free_agent_eligible : 1; /* set if task can be executed by a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Compiler flags */ /* Total compiler flags must be 16 bits */
unsigned tiedness : 1; /* 0: task is either tied (1) or untied (0) */
unsigned final : 1; /* 1: task is final(1) so execute immediately */
unsigned merged_if0 : 1; /* 2: no __kmpc_task_{begin/complete}_if0 calls in if0
code path */
unsigned destructors_thunk : 1; /* 3: set if the compiler creates a thunk to
invoke destructors from the runtime */
unsigned proxy : 1; /* 4:task is a proxy task (it will be executed outside the
context of the RTL) */
unsigned priority_specified : 1; /* 5: set if the compiler provides priority
setting for the task */
unsigned detachable : 1; /* 6: 1 == can detach */
unsigned free_agent_eligible : 1; /* 7: set if task can be executed by a
free-agent thread */
unsigned reserved : 8; /* reserved for compiler use */
/* Library flags */ /* Total library flags must be 16 bits */
unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
unsigned task_serial : 1; // task is executed immediately (1) or deferred (0)
unsigned tasking_ser : 1; // all tasks in team are either executed immediately
// (1) or may be deferred (0)
unsigned team_serial : 1; // entire team is serial (1) [1 thread] or parallel
// (0) [>= 2 threads]
/* If either team_serial or tasking_ser is set, task team may be NULL */
/* Task State Flags: */
unsigned started : 1; /* 1==started, 0==not started */
unsigned executing : 1; /* 1==executing, 0==not executing */
unsigned complete : 1; /* 1==complete, 0==not complete */
unsigned freed : 1; /* 1==freed, 0==allocated */
unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
unsigned target : 1;
unsigned hidden_helper : 1; /* 1 == hidden helper task */
#if OMPX_TASKGRAPH
unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
unsigned reserved31 : 4; /* reserved for library use */
#else
unsigned reserved31 : 5; /* reserved for library use */
#endif```
|
@jprotze I have modified based on your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We need the stub implementation for consistency between compiler and runtime bits
Just for the record, correct me if I'm wrong, but I don't think this is necessary for consistency between the compiler and runtime, because the bit isn't set by the compiler. I'm not comfortable merging a non-NFC change without any test case. On the other hand, even if it is set by the front end, the runtime changes can always go before the compiler change with a proper test case. The test case itself can simply manually construct the necessary function call that the compiler is expected to emit. |
You are wrong. Or, what do you mean with "because the bit isn't set by the compiler"? Sure, the compiler doesn't set the bit literally. The bit is set in compiler-generated code injected into the application code (https://github.com/llvm/llvm-project/pull/135807/files#diff-ab54595e29e58ad74c94b0ecaf4d2f32d626345db3ae734aa7cf7d5e404ed499R3716). The compiler-generated code sets the bit, if I'm happy to add an OMPT-test as soon as the frontend patch has landed. |
|
Then check this part of my previous comment:
We already have a bunch of tests in |
That's right -- you could write a test that calls __kmp_omp_task_alloc directly with the new bit set, as if it were done by the compiler. It will be a bit boring right now, since __kmp_omp_task_alloc doesn't do anything with it, but sufficient to actually do that inside a parallel region and make sure that it at least runs the task and doesn't break anything. That's the only interaction between the compiler and runtime that we see happening in our implementation right now. We're working on the execution model changes and free agent threads now, but likely you will see the execution model changes first (if I can successfully keep it separate in any reasonable way). I'm all for having the front end in place earlier than the runtime in this case, because writing tests that emulate the compiler codegen to thoroughly test such things is tedious. That said, if the front end isn't ready, we can provide a more thorough test with the free agent threads contribution. |
These tests emulate code generation, when CodeGen lacks behind while a new feature is implemented in the runtime and should be tested for functionality. Here we have the opposite situation: runtime implementation lacks behind CodeGen. We still want to have the usage of the bit documented on both sides of the interface. |
Initial runtime support for threadset clause in task and taskloop directives [Section 14.8 in in OpenMP 6.0 spec]
Frontend PR- #135807