Skip to content

Conversation

@JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Jun 25, 2024

Add "ptr_to_integer" (generic address space to .shared) syclcompat functions.

These functions are commonly required in optimized libraries that use inline ptx. The standard naming convention of removing "__" from corresponding cuda builtins has been applied. See the readme and accompanying test-e2e for example usage.

These functions are commonly required in optimized libraries that use
inline ptx. The standard naming convention of removing "__" from
corresponding cuda builtins has been applied.

Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk requested a review from a team as a code owner June 25, 2024 14:26
Signed-off-by: JackAKirk <[email protected]>
@joeatodd joeatodd changed the title [syclcompat][cuda] Add "ptr_to_integer" syclcompat functions. [SYCL][COMPAT][cuda] Add "ptr_to_integer" syclcompat functions. Jun 26, 2024
ptx -> PTX
removed ptx doc link as requested.

Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Copy link
Contributor

@joeatodd joeatodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these functions do the same thing aside from casting to int/size_t, can we not implement them as a single templated function?

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jun 27, 2024

Since these functions do the same thing aside from casting to int/size_t, can we not implement them as a single templated function?

Uncertainty around this is the reason I put them in experimental. It's a bit messy since the cuda versions of these api require different cuda toolkit versions (10.1 for the uint32_t and 11 for size_t, I think), but this does not affect these syclcompat translated versions. I was just told to translate them in this way so that cutlass sycl path can have corresponding apis to cuda runtime path. I don't think I really have the context to make a decision beyond this. It is probably best to ask @aacostadiaz what is best.

@JackAKirk
Copy link
Contributor Author

Since these functions do the same thing aside from casting to int/size_t, can we not implement them as a single templated function?

@aacostadiaz wants them to be two separate functions, so I'll leave it as it is.

Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk
Copy link
Contributor Author

@Alcpz @joeatodd Any more reviews for this?

Thanks

Copy link
Contributor

@joeatodd joeatodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, these should be a single function with a template parameter describing the return type.

@npmiller
Copy link
Contributor

Closing this after further discussions offline

@npmiller npmiller closed this Aug 29, 2024
Merge branch 'sycl' into cuda-nvvm_get_smem_pointer

Signed-off-by: JackAKirk <[email protected]>
A single templated function is preferred.

Signed-off-by: JackAKirk <[email protected]>
Copy link
Contributor

@joeatodd joeatodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @JackAKirk. Just a couple of formatting requests. Cheers!

Comment on lines 971 to 984
``` c++
half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
// ...
// ...
T addr =
syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
uint32_t fragment;
#if defined(__NVPTX__)
asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
: "=r"(fragment)
: "r"(addr));
#endif
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix the formatting of this code section? Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did clang-format it already using dpc++ format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly it's not running on code sections in markdown? I'd expect uint32_t fragment to align with T addr on the line above? And the line split on lines 975-976 looks pretty wacky? If I dump this code into a cpp file and autoformat this, I get:

  half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
  // ...
  // ...
  T addr =
      syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
  uint32_t fragment;
#if defined(__NVPTX__)
  asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
               : "=r"(fragment)
               : "r"(addr));
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see if that passes clang-format (in the test where it is used). The existing version passes the clang-format on the clang-format CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think clang-format runs on the README tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I use the same code in the test-e2e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly it's not running on code sections in markdown? I'd expect uint32_t fragment to align with T addr on the line above? And the line split on lines 975-976 looks pretty wacky? If I dump this code into a cpp file and autoformat this, I get:

  half *data = syclcompat::local_mem<half[NUM_ELEMENTS]>();
  // ...
  // ...
  T addr =
      syclcompat::ptr_to_int<T>(reinterpret_cast<char *>(data) + (id % 8) * 16);
  uint32_t fragment;
#if defined(__NVPTX__)
  asm volatile("ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%0}, [%1];\n"
               : "=r"(fragment)
               : "r"(addr));
#endif

I've updated the README with this suggestion now

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk
Copy link
Contributor Author

@Alcpz is this OK now?
Thanks

@Alcpz
Copy link
Contributor

Alcpz commented Oct 9, 2024

@Alcpz is this OK now? Thanks

Yes. I agree with @joeatodd review.
Accepting your changes, assuming that you will finalize addressing his suggestions. Sorry for missing this.

@JackAKirk
Copy link
Contributor Author

@Alcpz is this OK now? Thanks

Yes. I agree with @joeatodd review. Accepting your changes, assuming that you will finalize addressing his suggestions. Sorry for missing this.

Yes, I've updated the formatting now, thanks.

@JackAKirk JackAKirk requested a review from joeatodd October 9, 2024 14:52
Copy link
Contributor

@joeatodd joeatodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JackAKirk
Copy link
Contributor Author

@intel/llvm-gatekeepers Please merge this.

Thanks

@martygrant martygrant merged commit 3ba29f3 into intel:sycl Oct 10, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants