
@horizon-blue horizon-blue commented Nov 19, 2025

Summary of Changes

This PR introduces a new Array2D class for managing 2D views into contiguous arrays in host or device memory (powered by mdspan). Note that, unlike typical container types, Array2D does not own the underlying buffer: it simply takes a pointer and constructs a 2D view over it. As such, it does not deallocate the underlying buffer when the object goes out of scope. This is an intentional design choice: we often need to pass views of the buffer around within device code and keep many copies of the same view on the device (in parallel), but we can only deallocate the buffer once, on the host side.
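
For a quick sense of the intended usage, here is a minimal sketch (illustrative only; see the included unit tests for real usage):

```cpp
#include <vector>

// Minimal usage sketch (host side): Array2D never owns memory.
void example() {
  std::vector<float> storage(4 * 8);          // owns the buffer
  auto view = Array2D(storage.data(), 4, 8);  // non-owning 2D view over that buffer
  view(1, 2) = 3.0f;                          // element access through the view
}  // `view` goes out of scope here, but nothing is freed; `storage` owns the allocation
```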

Aside: while implementing this PR, I found that CUDA's standard library actually has an undocumented submdspan (at cuda::std::submdspan). This could be useful for creating multi-dimensional sub-views / non-contiguous views of the buffer once we start to implement tiling.
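
A rough sketch of what that could look like, assuming cuda::std::submdspan follows the P2630 std::submdspan interface (slice specifiers are an index, full_extent, or a begin/end pair); I haven't verified the exact spelling in libcu++:

```cpp
// Hypothetical sketch only: take the first four rows of an existing 2D view as a
// sub-view, assuming the P2630-style submdspan interface (unverified in libcu++).
auto tile = cuda::std::submdspan(view_2d,
                                 cuda::std::pair{0, 4},   // rows [0, 4)
                                 cuda::std::full_extent); // all columns
```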

Test Plan

To run the unit tests included in the file:

```bash
pixi run ctest
```

linear bot commented Nov 19, 2025

MET-24 Create universal container types

For containers like Array2D and FMBs, we'd like to have data structures that work either on CPU (host) or GPU (device).

One idea is to define generic containers that use Thrust's device_vector or host_vector under the hood. For the purposes of this project, the location of the memory is determined at compile time (i.e., we can specify the location via a template parameter of the generic container type).
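
A minimal sketch of this idea (the names are illustrative, not an actual API):

```cpp
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <type_traits>

// Hypothetical sketch: pick the backing Thrust vector at compile time.
enum class MemorySpace { Host, Device };

template <typename T, MemorySpace Space>
using GenericVector = std::conditional_t<Space == MemorySpace::Device,
                                         thrust::device_vector<T>,
                                         thrust::host_vector<T>>;

// GenericVector<float, MemorySpace::Device> allocates on the GPU,
// GenericVector<float, MemorySpace::Host> on the CPU.
```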

As part of the deliverable, we should also write unit tests to ensure that we can allocate & deallocate on both device & host.

@horizon-blue horizon-blue changed the title Generic Array2D on Device/Host Implement Array2D for non-owning 2D views of Host/Device memory Nov 20, 2025
.def_rw("x", &Vec3D::x)
.def_rw("y", &Vec3D::y)
.def_rw("z", &Vec3D::z)
.def("__add__", &operator+)
Contributor Author

This change is needed because the CUDA header files also define operator+ in the global scope, so the compiler no longer has enough context to uniquely resolve &operator+ here.
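
For example, one common way to disambiguate (a sketch; the actual change in this PR may differ) is to wrap the call so that overload resolution happens on the Vec3D operands:

```cpp
// Hypothetical fix: a lambda makes the Vec3D operands explicit, so the compiler
// resolves operator+ by ordinary overload resolution instead of taking its address.
.def("__add__", [](const Vec3D& a, const Vec3D& b) { return a + b; })
// or, assuming the free operator+ has this exact signature, select it explicitly:
// .def("__add__", static_cast<Vec3D (*)(const Vec3D&, const Vec3D&)>(&operator+))
```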

@horizon-blue horizon-blue marked this pull request as ready for review November 20, 2025 05:28
@horizon-blue horizon-blue force-pushed the xiaoyan/met-24-create-universal-container-types branch from 48a0c8a to 5211bb8 Compare November 20, 2025 05:28
@horizon-blue horizon-blue force-pushed the xiaoyan/met-24-create-universal-container-types branch from 5211bb8 to bbe8659 Compare November 20, 2025 06:11
@mugamma mugamma left a comment

LGTM, although part of me wonders whether we should just use mdspan directly. I didn't know it existed!

Comment on lines +28 to +33
__host__ __device__ constexpr T& operator()(uint32_t row, uint32_t col) {
return data_view_(row, col);
}
__host__ __device__ constexpr T operator()(uint32_t row, uint32_t col) const {
return data_view_(row, col);
}
Contributor

Is there a reason for using operator() instead of the more natural operator[]?

Contributor Author

Good point! Sadly, prior to C++23, we aren't allowed to define operator[] with more than one argument. cuda::std::mdspan makes a similar design choice: it defines operator() for earlier C++ standards and switches to operator[] in C++23 and above.
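
For reference, the C++23 form would look roughly like this (sketch only; we don't currently target C++23):

```cpp
// Sketch only: multi-argument operator[] is legal starting with C++23.
#if __cpp_multidimensional_subscript >= 202110L
__host__ __device__ constexpr T& operator[](uint32_t row, uint32_t col) {
  return data_view_(row, col);
}
#endif
```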


auto data = TypeParam(rows * cols);
// create 2D view into the underlying data on host or device
auto array2d = Array2D(thrust::raw_pointer_cast(data.data()), rows, cols);
Contributor

What's the purpose of thrust::raw_pointer_cast here?

Contributor Author

This is needed because the data() method of thrust::host_vector/device_vector returns a custom pointer type, so this cast converts it to a raw pointer that mdspan can understand.

Alternatively, if we expect to use Thrust a lot, we could also define overloaded Array2D constructors specialized for Thrust pointer types so we don't have to invoke this conversion explicitly everywhere.
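
Something along these lines (hypothetical, not part of this PR):

```cpp
// Hypothetical convenience overload: accept a Thrust device pointer and
// forward to the raw-pointer constructor.
Array2D(thrust::device_ptr<T> ptr, uint32_t rows, uint32_t cols)
    : Array2D(thrust::raw_pointer_cast(ptr), rows, cols) {}
```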

@mugamma mugamma merged commit 1424f94 into master Nov 20, 2025
1 check passed
@horizon-blue horizon-blue deleted the xiaoyan/met-24-create-universal-container-types branch November 20, 2025 20:09
@horizon-blue
Contributor Author

> part of me wonders whether we should just use mdspan directly. I didn't know it existed!

I was also thinking about that while working on this PR haha (in my earlier commits I was defining Array2D as an alias of mdspan). Though one nice thing about having our own wrapper is that we can define more convenient helper methods so our code doesn't have to be super verbose.
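
e.g. (hypothetical helpers, just to illustrate the kind of convenience a thin wrapper allows):

```cpp
// Hypothetical convenience accessors on top of the underlying mdspan.
__host__ __device__ constexpr uint32_t rows() const {
  return static_cast<uint32_t>(data_view_.extent(0));
}
__host__ __device__ constexpr uint32_t cols() const {
  return static_cast<uint32_t>(data_view_.extent(1));
}
```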

horizon-blue added a commit that referenced this pull request Nov 23, 2025
(Closes MET-47)

## Summary of Changes

This PR addresses some of the suggestions that @mugamma brought up in
#14. In particular, it
defines the `operator[]` on `Array2D` to return a 1D view of a row, so
we can use patterns like `array2d[i][j]` instead of `array2d(i, j)` to
access an element.
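
Roughly, the new accessor has this shape (a sketch assuming a `cuda::std::span` row view; see the diff for the exact code):

```cpp
// Sketch: operator[] returns a non-owning 1D view over one row.
CUDA_CALLABLE constexpr cuda::std::span<T> operator[](uint32_t row) {
  return {&data_view_(row, 0), data_view_.extent(1)};
}
```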

Another nice thing about returning the 1D span is that we can use a range-based
for loop to iterate over the elements of a row as well, e.g.

```cpp
for (auto& val : array2d[row]) { /* do something with val */ }
```

You can find some example usages in the [included test
file](https://github.com/probcomp/GenMetaBalls/pull/17/files#diff-92c53773082b537451d1c0e757c8ac6b5c6f85fd4bcd6e01bf82cade202575fb).

Another minor change in this PR is the refactoring of `Array2D` methods
to use the new `CUDA_CALLABLE` macro that Arijit introduced recently.

## Test Plan

To run the included unit tests:

```bash
pixi run test
```