Skip to content

Conversation

rolandschulz
Copy link

Prototype to test out a solution for intel/llvm#17832 . Looking for review on the syntax. Is it OK to have to wrap the function in a lambda to gain the type deduction? Should the lambda be past as first or last argument?

@t4c1
Copy link

t4c1 commented Apr 14, 2025

Looks cleaner to me, but I would not define stuff in syclcompat namespace within cutlass

@rolandschulz
Copy link
Author

I agree in general we shouldn't put things into syclcompat namespace.

For this PR I put it in syclcompat, because the purpose of this PR is to protype what we want to ask syclcompat to add (intel/llvm#17832).

Assuming it gets added to syclcompat and we switch our code to use it, we would require the latest version of syclcompat. That doesn't work because we want to support the latest released DPC++ version. Would it be reasonable to have in our headers the extra syclcompat code for older versions of DPC++ to avoid this dependency?

Comment on lines +42 to +60
namespace syclcompat {
template <class F, int Dim>
sycl::event launch(const sycl::nd_range<Dim> &range, sycl::queue q, const F& f) {
return q.parallel_for(detail::transform_nd_range<Dim>(range), [=](sycl::nd_item<Dim>) { f(); });
}
template <class F, int Dim>
sycl::event launch(const sycl::nd_range<Dim> &range, const F& f) {
return launch(range, get_default_queue(), f);
}
// Alternative launch through dim3 objects
template <class F>
sycl::event launch(const dim3 &grid, const dim3 &threads, sycl::queue q, const F& f) {
return launch(sycl::nd_range<3>{grid * threads, threads}, q, f);
}
template <class F>
sycl::event launch(const dim3 &grid, const dim3 &threads, const F& f) {
return launch(grid, threads, get_default_queue(), f);
}
}
Copy link

@mehdi-goli mehdi-goli Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we put those as PR on sycl::compat repo?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FMarno
Copy link

FMarno commented Apr 16, 2025

We can add the dim3 overloads into the sycl-cuda-compat PR (#276) under a different namespace while we wait for upstream changes to make it to the release. Should be no harm in that.

B, dB, sB, tB,
C, dC, sC, tC,
alpha, beta);
auto event = syclcompat::launch(dimGrid, dimBlock, [=]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you only want to record this event if you are doing fine-grained profiling?
It is possible you could see a notable performance improvement by only recording events when required: see https://github.com/intel/llvm/pull/18021/files#r2048537558

Copy link

@JackAKirk JackAKirk Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if an analogue to cudaEventRecord in sycl would be useful to you then you could request creating an extension for e.g. oneapi_event_record that takes a sycl event.

@tdeng5 tdeng5 requested review from jiyang1011 and taozha2 and removed request for aacostadiaz August 12, 2025 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants