Implement Array2D for non-owning 2D views of Host/Device memory
#14
Conversation
MET-24 Create universal container types
For containers like Array2D and FMBs, we'd like to have data structures that work either on CPU (host) or GPU (device). One idea is to define generic containers that use Thrust's device_vector or host_vector under the hood. For the purpose of this project, the location of the memory is defined at compile time (i.e. we can specify the location via a template parameter to the generic container type). As part of the deliverable, we should also write unit tests to ensure that we can allocate & deallocate on both device & host.
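A minimal sketch of this idea, assuming a hypothetical `MemorySpace` tag and `GenericVector` alias (neither name is from the ticket or the PR):

```cpp
#include <type_traits>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

// Hypothetical names for illustration: select the Thrust vector type at
// compile time based on a memory-space tag.
enum class MemorySpace { Host, Device };

template <typename T, MemorySpace S>
using GenericVector = std::conditional_t<S == MemorySpace::Host,
                                         thrust::host_vector<T>,
                                         thrust::device_vector<T>>;

int main() {
    // Allocation lands on the host or the device depending on the tag.
    GenericVector<float, MemorySpace::Host> host_buf(1024);
    GenericVector<float, MemorySpace::Device> device_buf(1024);
    return 0;  // both buffers are deallocated on destruction
}
```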
Array2D for non-owning 2D views of Host/Device memory
| .def_rw("x", &Vec3D::x) | ||
| .def_rw("y", &Vec3D::y) | ||
| .def_rw("z", &Vec3D::z) | ||
| .def("__add__", &operator+) |
The change is needed because the CUDA header files also define operator+ in the global scope, so the compiler no longer has enough context to uniquely identify the right operator+ here.
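For illustration, one common way to sidestep such ambiguity is to bind a lambda instead of taking the address of the operator. This is a hedged sketch (assuming nanobind, which the `.def_rw` spelling suggests), not necessarily the exact fix in this diff:

```cpp
// Sketch: a lambda forces overload resolution on Vec3D arguments,
// avoiding the now-ambiguous address of a global operator+.
nb::class_<Vec3D>(m, "Vec3D")
    .def("__add__",
         [](const Vec3D& a, const Vec3D& b) { return a + b; });
```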
mugamma left a comment:
LGTM, although part of me wonders whether we should just use mdspan. I didn't know it existed!
```cpp
__host__ __device__ constexpr T& operator()(uint32_t row, uint32_t col) {
    return data_view_(row, col);
}

__host__ __device__ constexpr T operator()(uint32_t row, uint32_t col) const {
    return data_view_(row, col);
}
```
Is there a reason for using operator() instead of the more natural operator[]?
Good point! Sadly, prior to C++23, we aren't allowed to define operator[] with more than one argument. cuda::std::mdspan made a similar design choice: it defines operator() in earlier C++ standards and switches to operator[] in C++23 and above.
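For illustration, a hedged sketch of the difference (the `Grid` type here is hypothetical, not part of this PR):

```cpp
#include <cstddef>

// Hypothetical 2D accessor, for illustration only.
template <typename T>
struct Grid {
    T* data;
    std::size_t cols;

    // Pre-C++23: operator[] can only take a single argument, so 2D
    // indexing is conventionally exposed through operator().
    T& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }

#if defined(__cpp_multidimensional_subscript)
    // C++23: operator[] may take multiple arguments, e.g. grid[r, c].
    T& operator[](std::size_t r, std::size_t c) { return data[r * cols + c]; }
#endif
};
```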
```cpp
auto data = TypeParam(rows * cols);
// create 2D view into the underlying data on host or device
auto array2d = Array2D(thrust::raw_pointer_cast(data.data()), rows, cols);
```
What's the purpose of thrust::raw_pointer_cast here?
This is needed because the data() method of thrust::host_vector and thrust::device_vector returns a custom pointer type, so the cast converts it to a raw pointer that mdspan can understand.
Alternatively, if we're expecting to use Thrust a lot, we could also define overloaded Array2D constructors specialized to Thrust pointer types, so we don't have to invoke this conversion explicitly everywhere.
I was also thinking about that while working on this PR haha (in my earlier commits I was defining …)
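A minimal sketch of what such an overload could look like, assuming the raw-pointer constructor signature implied by the test snippet above (the delegation detail is an assumption, not this PR's implementation):

```cpp
#include <cstdint>
#include <thrust/device_ptr.h>

template <typename T>
class Array2D {
public:
    // Existing raw-pointer constructor, as used in the tests above.
    Array2D(T* ptr, uint32_t rows, uint32_t cols);

    // Sketch: accept Thrust's fancy pointer directly, unwrap it, and
    // delegate to the raw-pointer constructor.
    Array2D(thrust::device_ptr<T> ptr, uint32_t rows, uint32_t cols)
        : Array2D(thrust::raw_pointer_cast(ptr), rows, cols) {}
};

// Usage (sketch): no explicit raw_pointer_cast at the call site.
//   thrust::device_vector<float> data(rows * cols);
//   auto view = Array2D<float>(data.data(), rows, cols);
```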
(Closes MET-47)

Summary of Changes

This PR addresses some of the suggestions that @mugamma brought up in #14. In particular, it defines the `operator[]` on `Array2D` to return a 1D view of a row, so we can use patterns like `array2d[i][j]` instead of `array2d(i, j)` to access an element (see the sketch after this description). Another nice thing about returning the 1D span is that we can also use a range-based for loop to go over the elements in a row, e.g.

```cpp
for (auto& val : array2d[row]) { /* do something with val */ }
```

You can find some example usages in the [included test file](https://github.com/probcomp/GenMetaBalls/pull/17/files#diff-92c53773082b537451d1c0e757c8ac6b5c6f85fd4bcd6e01bf82cade202575fb).

Another minor change in this PR is the refactoring of `Array2D` methods to use the new `CUDA_CALLABLE` macro that Arijit introduced recently.

Test Plan

To run the included unit tests:

```bash
pixi run test
```
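For reference, a hedged sketch of what the row-view accessor described above might look like; the `cuda::std::span` return type and the member names are assumptions based on the snippets quoted in this PR, not the exact implementation:

```cpp
// Sketch only: return a 1D span over one row of the 2D view, so that
// array2d[i][j] works and a row can be iterated with a range-based for.
CUDA_CALLABLE constexpr cuda::std::span<T> operator[](uint32_t row) {
    return {&data_view_(row, 0), data_view_.extent(1)};
}
```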
Summary of Changes

This PR introduces a new `Array2D` class for managing 2D views into contiguous arrays in host or device memory (powered by mdspan). Note that unlike typical container types, `Array2D` does not own the underlying buffer: it simply takes a pointer and constructs a 2D view of it. As such, it does not deallocate the underlying buffer when the object itself goes out of scope. This is an intentional design because we often need to pass "views" of the buffer around within device code, and we often need to keep many copies of the same view on the device (in parallel), but we can only deallocate once on the host side.

Aside: while implementing this PR, I found that CUDA std actually has an undocumented submdspan type (at `cuda::std::submdspan`). This could potentially be useful for creating multi-dimensional sub-views/non-contiguous views of the buffer as we start to implement tiling.

Test Plan
To run the unit tests included in this PR:
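```bash
pixi run test
```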