[CUDA][HIP] Fix host/device context in concept #67721
Open

yxsamliu wants to merge 1 commit into llvm:main from yxsamliu:concept
Changes from all commits
The patch adds a new test (23 lines):

```c++
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -x hip %s \
// RUN: -std=c++20 -fsyntax-only -verify
// RUN: %clang_cc1 -triple x86_64 -x hip %s \
// RUN: -std=c++20 -fsyntax-only -verify

// expected-no-diagnostics

#include "Inputs/cuda.h"

template <class T>
concept C = requires(T x) {
  func(x);
};

struct A {};
void func(A x) {}

template <C T> __global__ void kernel(T x) { }

int main() {
  A a;
  kernel<<<1,1>>>(a);
}
```
How would one write a correct GPU-side requirement?
E.g. I want some __device__ function to be callable.
Currently, we do not support that.

I would suggest adding an extension to clang that allows __host__ and __device__ attributes on call expressions in a concept definition, to indicate the required callability for the host or device. For example:
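A sketch of what such an extension might look like; the syntax below is hypothetical and not accepted by any compiler today, and the names func and dump are placeholders:

```c++
// Hypothetical extension: a __host__/__device__ attribute on a requirement
// states the context in which that call expression must be valid.
template <class T>
concept C = requires(T x) {
  __device__ func(x);  // func(x) must be callable from device code
  __host__ dump(x);    // dump(x) must be callable from host code
};
```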
If I understand correctly, normally a template is usable from either host or device (depending on whether it ends up calling any host-only or device-only function). This choice for concepts seems like it's going to be problematic for that model. Something as simple as:
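A minimal sketch of the kind of template being described (the concept and function names here are illustrative, not from the patch):

```c++
// A "copyable" requirement spelled as a concept.
template <class T>
concept Copyable = requires(T x) { T(x); };

// Intuitively, this should be usable from host code whenever T is copyable
// on the host, and from device code whenever T is copyable on the device.
template <Copyable T>
__host__ __device__ void duplicate(T x) {
  T copy = x;
  (void)copy;
}
```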
... should really be callable on the host or device side if T is copyable on the host or device side, and using the host side in all cases will mean that things like the C++ <complex> or <functional> headers may stop doing the right thing in some cases if/when they get extended to use concepts. And it seems like with this patch there's not anything that the authors of those headers can really do about it.

Perhaps it would be better for the host/device choice in a concept satisfaction check to depend on the context in which the concept is required to be satisfied (which I would imagine is what happened by chance before this patch), and for us to include the CUDA context as part of the constraint satisfaction cache key? That kind of direction seems like it'd give closer results to what we'd get from the split compilation model. I don't know if that actually works in general, though. For example, given:
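A sketch of the kind of case being described, with illustrative names (a concept C, a constrained class template X, and a device-side use):

```c++
template <class T>
concept C = requires(T x) { func(x); };

template <C T> struct X { T value; };

struct A {};
__host__ __device__ void func(A) {}

__device__ void g() {
  X<A> xa;   // the satisfaction check for C<A> happens here; arguably it
  (void)xa;  // should be performed in a device context
}
```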
... where X is a constrained template, what seems like it should happen here is that we take the __host__/__device__ into account when concept-checking X's template arguments, but I'd worry that we don't have the host/device information to hand when checking the concept satisfaction query for X.

More broadly, I think there'll be cases where a CUDA developer will want, from host code, to check whether a constraint would be satisfied on the device, and some mechanism for doing that seems useful. I think that can be done with the model I suggest above, by putting a kernel call inside a requires expression, but it seems awkward, so perhaps some syntax for explicitly evaluating a concept-id in a particular host/device context would be useful.

But it definitely seems worthwhile to figure out what rule NVCC is using here.
No. A template function in CUDA/HIP is subject to the same availability checks regarding host/device attributes, e.g. https://godbolt.org/z/ccTxhEhfo

constexpr template functions can be called by both host and device functions not because they are template functions, but because host/device attributes are added to them implicitly. They are still subject to host/device availability checks.
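For illustration, a minimal sketch of that behaviour (the function names are placeholders):

```c++
// constexpr functions receive implicit __host__ __device__ attributes in
// clang's CUDA/HIP mode, so they are callable from both sides.
constexpr int square(int x) { return x * x; }

__device__ int on_device() { return square(3); }  // OK: implicit host+device
__host__ int on_host() { return square(3); }      // OK

// A plain (non-constexpr) template function gets no implicit attributes;
// it is host-only by default, so a device-side call is still rejected by
// the usual host/device availability check.
template <class T> T twice(T x) { return x + x; }
__device__ int rejected() { return twice(3); }  // diagnosed when emitted:
                                                // host function called from device
```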
For a typical use case of concepts in CUDA programs, please see https://godbolt.org/z/o7Wa68n9c
This is taken from issue #67507.
In this example, users want to express two constraints on geometric_shape:
1. it can be passed to a function draw
2. it can be passed to a function area, and the result is convertible to double
For the first constraint, users only need it on the host side. For the second constraint, users need it on both the host side and the device side. This gives us some insight into users' needs for constraints: they usually differ between the host and device sides, since users may want to do different things on each side. Therefore, assuming that every constraint in a concept must be satisfied on both the device and the host side will result in unnecessary extra constraints on one side or the other.
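A sketch of a concept expressing those two constraints, reconstructed from the description above (the exact definitions in the linked example may differ):

```c++
#include <concepts>

template <class T>
concept geometric_shape = requires(T x) {
  draw(x);                                     // constraint 1: needed on the host side only
  { area(x) } -> std::convertible_to<double>;  // constraint 2: needed on host and device
};
```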
Is it OK to evaluate the constraints in the context where the template is instantiated? For example, when we instantiate the kernel template <geometric_shape T> __global__ void compute_areas, can we evaluate the constraints in the device context to get what we need? No, because then the constraint about the function draw would need to be satisfied on the device side, which is not what we need. The point is that the constraints defined in a concept need to have individual required contexts: we want to be able to express that this constraint should be satisfied in the device context and that constraint should be satisfied in the host context. That is why I propose allowing __device__ and __host__ attributes to be added to the call expressions in concepts, to indicate the required context for an individual constraint.

Now that we have discussed users' needs regarding the device/host contexts of constraints, let's look at how nvcc currently evaluates the satisfaction of constraints.
Based on https://godbolt.org/z/o7Wa68n9c, the instantiation of work<triangle> is successful. We can infer that triangle satisfies the two constraints. They can only be evaluated in the host context, since the functions draw and area are both host functions. Even though the instantiation of work<triangle> is done in a device context, the evaluation of the constraints is still done in the host context.

The current patch matches nvcc's behaviour.
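Roughly, assuming definitions like the following on top of the geometric_shape sketch above (the actual code at the link may differ):

```c++
struct triangle {};
void draw(triangle) {}                 // host-only
double area(triangle) { return 0.5; }  // host-only

template <geometric_shape T>
__global__ void work(T shape) { /* ... */ }

int main() {
  triangle t;
  // The kernel is instantiated for a device context, yet nvcc evaluates
  // geometric_shape<triangle> in the host context, so this compiles.
  work<<<1, 1>>>(t);
}
```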