[cublas] introduce onemkl_cublas_host_task #7
base: master
Changes from 1 commit
12edf97
761b5e6
73a7f14
34b7dae
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -68,12 +68,13 @@ class CublasScopedContextHandler { | |
CUcontext original_; | ||
cl::sycl::context placedContext_; | ||
bool needToRecover_; | ||
cl::sycl::interop_handler& ih; | ||
How will this work, if there's no
The idea was to ifdef the headers in cublas_task.hpp, such that this header is not even included in case we are compiling with hipSYCL. See: sbalint98#2 |
||
static thread_local cublas_handle handle_helper; | ||
CUstream get_stream(const cl::sycl::queue &queue); | ||
cl::sycl::context get_context(const cl::sycl::queue &queue); | ||
|
||
public: | ||
CublasScopedContextHandler(cl::sycl::queue queue); | ||
CublasScopedContextHandler(cl::sycl::queue queue, cl::sycl::interop_handler& ih); | ||
|
||
~CublasScopedContextHandler() noexcept(false); | ||
/** | ||
|
@@ -87,7 +88,7 @@ class CublasScopedContextHandler { | |
// This is a work-around function for reinterpret_casting the memory. This | ||
// will be fixed when SYCL-2020 has been implemented for Pi backend. | ||
template <typename T, typename U> | ||
inline T get_mem(cl::sycl::interop_handler ih, U acc) { | ||
inline T get_mem(U acc) { | ||
CUdeviceptr cudaPtr = ih.get_mem<cl::sycl::backend::cuda>(acc); | ||
return reinterpret_cast<T>(cudaPtr); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#ifndef _MKL_BLAS_CUBLAS_TASK_HPP_ | ||
#define _MKL_BLAS_CUBLAS_TASK_HPP_ | ||
#include <cublas_v2.h> | ||
#include <cuda.h> | ||
#include <complex> | ||
#include <CL/sycl.hpp> | ||
#include "oneapi/mkl/types.hpp" | ||
#include "cublas_scope_handle.hpp" | ||
#include <CL/sycl/detail/pi.hpp> | ||
|
||
namespace oneapi { | ||
namespace mkl { | ||
namespace blas { | ||
namespace cublas { | ||
|
||
template <typename H, typename F> | ||
static inline auto host_task_internal(H &cgh, cl::sycl::queue queue, F f) -> decltype(cgh.interop_task(f)) { | ||
The decltype here looks fishy. First of all, you don't return anything; did you mean to write so either replace the auto and decltype.. with void (as you are discarding the result in
You're totally right, this part of the code is quite confusing. I followed the pattern that has been used for the CPU functions, see: https://github.com/oneapi-src/oneMKL/blob/e8e3dabf9fbda0556b8075c76b657336f88440f0/src/blas/backends/mklcpu/mklcpu_common.hpp#L42-L56 It would probably be nicer to do something like:
I intend to add the functionality for hipSYCL like this:
What does it even return? I think none of the
It's basically SFINAE on whether the function exists or not... |
||
cgh.interop_task([f, queue](cl::sycl::interop_handler ih){ | ||
auto sc = CublasScopedContextHandler(queue, ih); | ||
f(sc); | ||
}); | ||
} | ||
|
||
template <typename H, typename F> | ||
static inline void onemkl_cublas_host_task(H &cgh, cl::sycl::queue queue, F f) { | ||
So.. you will be
This is what I had in mind, yes. |
||
(void)host_task_internal(cgh, queue, f); | ||
} | ||
|
||
} // namespace cublas | ||
} // namespace blas | ||
} // namespace mkl | ||
} // namespace oneapi | ||
#endif // _MKL_BLAS_CUBLAS_TASK_HPP_ |
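For readers following the decltype discussion, here is a minimal, self-contained sketch of SFINAE through a trailing `decltype` return type. It is not the code from this PR: `HandlerWithInterop`, `HandlerWithCustomOp`, `custom_operation`, `dispatch` and `host_task` are hypothetical names made up for the illustration, and the two handler structs are plain stand-ins rather than real SYCL classes. Each `dispatch` overload is only viable when the expression inside its `decltype` compiles for the given handler type, so the compiler picks the overload whose member function exists.

```cpp
#include <cassert>

// Hypothetical stand-ins for two command-group handler types.
// Neither is a real SYCL class; each exposes a differently named entry point.
struct HandlerWithInterop {
    template <typename F>
    int interop_task(F f) { f(); return 1; }
};

struct HandlerWithCustomOp {
    template <typename F>
    int custom_operation(F f) { f(); return 2; }
};

// Viable only when `cgh.interop_task(f)` is a well-formed expression.
// Otherwise substitution fails and this overload is silently dropped.
template <typename H, typename F>
auto dispatch(H &cgh, F f) -> decltype(cgh.interop_task(f)) {
    // Returning the call also stays valid if the member returns void.
    return cgh.interop_task(f);
}

// Viable only when `cgh.custom_operation(f)` is a well-formed expression.
template <typename H, typename F>
auto dispatch(H &cgh, F f) -> decltype(cgh.custom_operation(f)) {
    return cgh.custom_operation(f);
}

// Public wrapper that deliberately discards whatever dispatch returns.
template <typename H, typename F>
void host_task(H &cgh, F f) {
    (void)dispatch(cgh, f);
}
```

Calling `host_task` with either handler type compiles and forwards to the matching member. A handler type that offered both members would make the call ambiguous, so this pattern assumes the entry points are mutually exclusive.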
Does sc have to be passed in by copy, or would a non-const reference make more sense? In general I imagine that this object would contain state, so passing it in by reference might be more convenient and/or more performant.
Thanks, indeed sc can be passed as a reference.
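A small sketch of the difference raised in the question about `sc`, under stated assumptions: `ScopedHandler` is a hypothetical stand-in with one piece of state, not the real `CublasScopedContextHandler`, and `run_task`, `by_copy` and `by_reference` are illustrative names. When the user callable takes the handler by value, any state it changes lives in a copy and is lost; when it takes a non-const reference, the changes are visible to the code that created the handler, and no copy is made.

```cpp
#include <cassert>

// Hypothetical stand-in for a stateful scoped handler.
struct ScopedHandler {
    int handle_lookups = 0;
    int get_handle() { return ++handle_lookups; }
};

// Creates the handler, runs the user callable on it, and reports
// how many lookups the original object recorded.
template <typename F>
int run_task(F f) {
    ScopedHandler sc;
    f(sc);
    return sc.handle_lookups;
}

// The callable receives a copy, so the original handler is unchanged.
inline int by_copy() {
    return run_task([](ScopedHandler sc) { sc.get_handle(); });
}

// The callable receives the original handler and mutates it in place.
inline int by_reference() {
    return run_task([](ScopedHandler &sc) { sc.get_handle(); });
}
```

Here `by_copy()` yields 0 and `by_reference()` yields 1. For a handler whose destructor has side effects, a by-value parameter also means that destructor runs once more, for the copy.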