[SYCL][COMPAT][cuda] Add "ptr_to_integer" syclcompat functions. #14283
These functions are commonly required in optimized libraries that use inline PTX. The standard naming convention of removing "__" from the corresponding CUDA builtins has been applied.
Signed-off-by: JackAKirk <[email protected]>
ptx -> PTX; removed PTX doc link as requested.
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
joeatodd
left a comment
Since these functions do the same thing aside from casting to int/size_t, can we not implement them as a single templated function?
Uncertainty around this is the reason I put them in experimental. It's a bit messy since the CUDA versions of these APIs require different CUDA toolkit versions (10.1 for the uint32_t one and 11 for the size_t one, I think), but this does not affect the translated syclcompat versions. I was just told to translate them in this way so that the CUTLASS SYCL path can have APIs corresponding to the CUDA runtime path. I don't think I really have the context to make a decision beyond this. It is probably best to ask @aacostadiaz.
Co-authored-by: Joe Todd <[email protected]>
@aacostadiaz wants them to be two separate functions, so I'll leave it as it is.
Signed-off-by: JackAKirk <[email protected]>
joeatodd
left a comment
As discussed offline, these should be a single function with a template parameter describing the return type.
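For readers following the thread, here is a minimal host-only sketch of what "a single function with a template parameter describing the return type" could look like. The name `ptr_to_int_sketch` is hypothetical and this is not the syclcompat implementation: it only casts the pointer value, whereas the real device function also has to convert a generic address into the `.shared` address space.

``` c++
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for illustration only (not syclcompat::ptr_to_int).
// One template replaces two near-identical functions that differ only in
// their return type (e.g. uint32_t vs size_t).
template <typename T> T ptr_to_int_sketch(const void *ptr) {
  static_assert(std::is_integral_v<T>, "T must be an integer type");
  // Assumption: a plain value cast; no address-space conversion happens here.
  return static_cast<T>(reinterpret_cast<std::uintptr_t>(ptr));
}
```

A caller would pick the width at the call site, for example `ptr_to_int_sketch<std::uint32_t>(p)` or `ptr_to_int_sketch<std::size_t>(p)`, which is the shape of the `syclcompat::ptr_to_int<T>(...)` call in the README snippet later in this thread.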
Closing this after further discussions offline.
Merge branch 'sycl' into cuda-nvvm_get_smem_pointer
Signed-off-by: JackAKirk <[email protected]>
A single templated function is preferred.
Signed-off-by: JackAKirk <[email protected]>
joeatodd
left a comment
Thanks for this @JackAKirk. Just a couple of formatting requests. Cheers!
sycl/doc/syclcompat/README.md
``` c++
half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
// ...
// ...
T addr =
syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
uint32_t fragment;
#if defined(__NVPTX__)
asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
: "=r"(fragment)
: "r"(addr));
#endif
```
Could you fix the formatting of this code section? Thanks
I did clang-format it already using dpc++ format.
Possibly it's not running on code sections in markdown? I'd expect `uint32_t fragment` to align with `T addr` on the line above? And the line split on lines 975-976 looks pretty wacky? If I dump this code into a cpp file and autoformat this, I get:
half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
// ...
// ...
T addr =
    syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
uint32_t fragment;
#if defined(__NVPTX__)
asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
             : "=r"(fragment)
             : "r"(addr));
#endif
I can see if that passes clang-format (in the test where it is used). The existing version passes the clang-format on the clang-format CI.
I don't think clang-format runs on the README tbh.
Yeah, I use the same code in the test-e2e
I've updated the README with this suggestion now
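As a side note on the snippet discussed above, the expression `(id % 8) * 16` can be read as a row byte offset. The sketch below spells out that arithmetic under two assumptions that the thread does not state explicitly: `half` is 2 bytes wide, and each row of the 8x8 b16 tile occupies 16 contiguous bytes. The helper name `row_byte_offset` is hypothetical.

``` c++
#include <cstddef>

// Hypothetical helper mirroring the `(id % 8) * 16` expression in the snippet.
// Assumption: 8 elements per row, 2 bytes per element, so 16 bytes per row.
constexpr std::size_t row_byte_offset(std::size_t id) {
  constexpr std::size_t kElemsPerRow = 8;
  constexpr std::size_t kBytesPerElem = 2;
  return (id % 8) * (kElemsPerRow * kBytesPerElem);
}

// Ids 0..7 map to rows 0..7; higher ids wrap around to the same rows.
static_assert(row_byte_offset(0) == 0, "row 0 starts at byte 0");
static_assert(row_byte_offset(7) == 112, "row 7 starts at byte 112");
static_assert(row_byte_offset(8) == 0, "id 8 wraps back to row 0");
```

Casting `data` to `char *` before adding the offset, as the snippet does, keeps the addition in bytes rather than in `half` elements.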
Signed-off-by: JackAKirk <[email protected]>
@Alcpz is this OK now?
joeatodd
left a comment
LGTM
@intel/llvm-gatekeepers Please merge this. Thanks
Add "ptr_to_integer" (generic address space to .shared) syclcompat functions.
These functions are commonly required in optimized libraries that use inline PTX. The standard naming convention of removing "__" from the corresponding CUDA builtins has been applied. See the README and the accompanying test-e2e for example usage.