-
Notifications
You must be signed in to change notification settings - Fork 16
Debug simplified field compilation and runtime issues #227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses compilation and runtime issues for simplified field operations to enable GPU execution, while maintaining host-space APIs. The implementation converts plain C++ lambdas to Kokkos-compatible lambdas (KOKKOS_LAMBDA and KOKKOS_CLASS_LAMBDA), refactors inline kernels into explicit functors for better GPU compatibility, and adds necessary memory space conversions between device and host. However, several memory space handling issues remain, particularly in the point_cloud adapter, and there are performance inefficiencies in the copy patterns used for host-device data transfers.
Key changes:
- Converted plain C++ lambdas to KOKKOS_LAMBDA for device compatibility in test files and field evaluation code
- Refactored inline Kokkos::parallel_for lambdas into named functors in omega_h and point_cloud adapters
- Added host-device memory copies and explicit memory space handling throughout field operations
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| test/test_spline_interpolator.cpp | Added fence for synchronization; updated Rank4View construction to use dynamic extents |
| test/test_field_interpolation.cpp | Converted plain lambdas to KOKKOS_LAMBDA; added host memory conversions for field data |
| test/test_field_evaluation.cpp | Converted plain lambdas to KOKKOS_LAMBDA; added host memory conversions for field data |
| test/test_field_copy.cpp | Updated to use HostMemorySpace execution space for parallel operations |
| test/test_field_communication.cpp | Added host copies of device arrays before host access; updated to use proper memory spaces |
| src/pcms/transfer_field2.h | Updated make_const_array_view template parameters (inconsistently) |
| src/pcms/interpolator/spline_interpolator.hpp | Converted lambdas to KOKKOS_CLASS_LAMBDA in functors; added debug printf for errors |
| src/pcms/adapter/point_cloud/point_cloud_layout.cpp | Refactored initialization into functor; added make_const_array_view calls with memory space (but with incorrect memory space casting) |
| src/pcms/adapter/point_cloud/point_cloud.cpp | Refactored coordinate copy into functor; updated memory space handling (with incorrect casting) |
| src/pcms/adapter/omega_h/omega_h_field_layout.h | Added host-side copies of gids, owned, and dof_holder_coords |
| src/pcms/adapter/omega_h/omega_h_field_layout.cpp | Refactored inline kernels into functors; added host memory synchronization with redundant copies |
| src/pcms/adapter/omega_h/omega_h_field2.h | Changed dof_holder_data_ to HostMemorySpace |
| src/pcms/adapter/omega_h/omega_h_field2.cpp | Added extensive host-device memory copies; refactored inline kernels into functors (with performance issues and memory access bugs) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto coordinates_host_tmp = Kokkos::create_mirror_view(coordinates_d); | ||
| Kokkos::deep_copy(coordinates_host_tmp, hint.coordinates_); | ||
| Kokkos::deep_copy(coordinates_d, coordinates_host_tmp); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant copy operations. Lines 340-342 create a mirror view, copy from hint.coordinates_ (HostMemorySpace) to the mirror, then copy to device. This is unnecessary - you can directly deep_copy from host to device. Replace lines 338-342 with: Kokkos::View<Real**> coordinates_d("coordinates_d", hint.coordinates_.extent(0), hint.coordinates_.extent(1)); Kokkos::deep_copy(coordinates_d, hint.coordinates_);
| auto coordinates_host_tmp = Kokkos::create_mirror_view(coordinates_d); | |
| Kokkos::deep_copy(coordinates_host_tmp, hint.coordinates_); | |
| Kokkos::deep_copy(coordinates_d, coordinates_host_tmp); | |
| Kokkos::deep_copy(coordinates_d, hint.coordinates_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct 2D deep_copy fails due to layout differences between views. The extra copy here follows the workaround from the Kokkos documentation for handling layout mismatches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment in place to explain what you are doing and link to the kokkos documentation.
| auto eval_results_tmp = | ||
| Kokkos::create_mirror_view(Kokkos::HostSpace(), eval_results); | ||
| Kokkos::deep_copy(eval_results_tmp, eval_results); | ||
| Kokkos::View<Real**, HostMemorySpace> eval_results_h( | ||
| "eval_results_h", eval_results_tmp.extent(0), eval_results_tmp.extent(1)); | ||
| Kokkos::deep_copy(eval_results_h, eval_results_tmp); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant copy operations. Lines 347-352 create a mirror, copy from device to mirror, create another host view, then copy mirror to host view. This is unnecessary - you can directly use the mirror view. Replace lines 347-352 with: auto eval_results_h = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), eval_results);
| auto eval_results_tmp = | |
| Kokkos::create_mirror_view(Kokkos::HostSpace(), eval_results); | |
| Kokkos::deep_copy(eval_results_tmp, eval_results); | |
| Kokkos::View<Real**, HostMemorySpace> eval_results_h( | |
| "eval_results_h", eval_results_tmp.extent(0), eval_results_tmp.extent(1)); | |
| Kokkos::deep_copy(eval_results_h, eval_results_tmp); | |
| auto eval_results_h = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), eval_results); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue regarding layout incompatible views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a repeating pattern, let's consider making a function called deep_copy_mismatch_layouts (or something along these lines). We can then put the documentation with the function. I think that will make this code much more obvious.
| auto dof_holder_coords_device_tmp = | ||
| Kokkos::create_mirror_view(Kokkos::HostSpace(), dof_holder_coords_); | ||
| Kokkos::deep_copy(dof_holder_coords_device_tmp, dof_holder_coords_); | ||
| Kokkos::deep_copy(dof_holder_coords_host_, dof_holder_coords_device_tmp); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant copy operations. Lines 249-252 create a mirror view, copy from device to mirror, then copy mirror to dof_holder_coords_host_. This is unnecessary - you can directly deep_copy from device to host. Replace lines 249-252 with: Kokkos::deep_copy(dof_holder_coords_host_, dof_holder_coords_);
| auto dof_holder_coords_device_tmp = | |
| Kokkos::create_mirror_view(Kokkos::HostSpace(), dof_holder_coords_); | |
| Kokkos::deep_copy(dof_holder_coords_device_tmp, dof_holder_coords_); | |
| Kokkos::deep_copy(dof_holder_coords_host_, dof_holder_coords_device_tmp); | |
| Kokkos::deep_copy(dof_holder_coords_host_, dof_holder_coords_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue regarding layout incompatible views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify function as described above.
| Rank1View<const bool, HostMemorySpace> PointCloudLayout::GetOwned() const | ||
| { | ||
| return make_const_array_view(owned_); | ||
| return make_const_array_view<const Kokkos::View<bool*>, HostMemorySpace>( | ||
| owned_); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
owned_ is a device memory view (Kokkos::View<bool*>) but GetOwned() returns it as a HostMemorySpace view without a deep_copy. This creates an invalid memory space cast. Either owned_ should be declared as Kokkos::View<bool*, HostMemorySpace>, or a deep_copy to host memory should be performed before returning.
|
@Sichao25 Take the copilot review with the appropriate grain of salt. I will go through this in detail over the next few days. I'm a bit crunched for time with proposal writing. |
| coords(i, 0) = coordinates(i, 0); | ||
| coords(i, 1) = coordinates(i, 1); | ||
| }); | ||
| auto coordinates_host_tmp = Kokkos::create_mirror_view(coords); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these arrays should probably be stored at the class level so we aren't creating them lots of times, but we can leave that to a future optimization pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a issue including this optimization #233
| auto coordinates_host_tmp = Kokkos::create_mirror_view(coordinates_d); | ||
| Kokkos::deep_copy(coordinates_host_tmp, hint.coordinates_); | ||
| Kokkos::deep_copy(coordinates_d, coordinates_host_tmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment in place to explain what you are doing and link to the kokkos documentation.
| auto eval_results = mesh_field_->evaluate(coordinates_d, offsets_d); | ||
|
|
||
| auto eval_results_tmp = | ||
| Kokkos::create_mirror_view(Kokkos::HostSpace(), eval_results); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we can move the construction of these views to be a one-time operation based on the field layout which will tell us how big the arrays will need to be.
| auto eval_results_tmp = | ||
| Kokkos::create_mirror_view(Kokkos::HostSpace(), eval_results); | ||
| Kokkos::deep_copy(eval_results_tmp, eval_results); | ||
| Kokkos::View<Real**, HostMemorySpace> eval_results_h( | ||
| "eval_results_h", eval_results_tmp.extent(0), eval_results_tmp.extent(1)); | ||
| Kokkos::deep_copy(eval_results_h, eval_results_tmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a repeating pattern, let's consider making a function called deep_copy_mismatch_layouts (or something along these lines). We can then put the documentation with the function. I think that will make this code much more obvious.
| return owned_gids; | ||
| } | ||
|
|
||
| struct ComputeVertexCoordsFunctor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment saying that this is a workaround to specify the parametric coordinates for MeshFields to be replaced when SCOREC/meshFields#70 is resolved.
| } | ||
| }; | ||
|
|
||
| struct ComputeEdgeCoordsFunctor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment saying that this is a workaround to specify the parametric coordinates for MeshFields to be replaced when SCOREC/meshFields#70 is resolved.
| auto dof_holder_coords_device_tmp = | ||
| Kokkos::create_mirror_view(Kokkos::HostSpace(), dof_holder_coords_); | ||
| Kokkos::deep_copy(dof_holder_coords_device_tmp, dof_holder_coords_); | ||
| Kokkos::deep_copy(dof_holder_coords_host_, dof_holder_coords_device_tmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify function as described above.
| gids_[i] = 0; | ||
| }); | ||
| Kokkos::parallel_for(coords_.extent(0), | ||
| InitializeOwnedGidsFunctor(owned_, gids_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be more clear if we use fill and iota.
The fill algorithm is defined here: https://kokkos.org/kokkos-core-wiki/API/algorithms/std-algorithms/all/StdFill.html
It looks like there is no function called iota defined in kokko so we will need to define our own. Don't worry about being super general, just take the 1D array you have here and perform filling as you already do.
|
@Sichao25 I think if you add this redev version you will get past the perfstubs issue: https://github.com/SCOREC/pcms/pull/178/files#diff-dd58fe9ad10c75a10b51fd76d06aa41cb591ca2181284244e270c65765464c4fR101 I think what's going on is that previously redev was not exporting perfstubs in its cmake config file. |
|
Thanks. I will add it and test. |
|
Just merged. Thanks! |
This PR debugs the issue where the simplified field could only compile and work in host space. The implementation maintains most APIs returning host space arrays even when executing in GPU. In the future, supporting GPU simplified fields APIs would be preferable.
A few additional issues were not addressed in this PR:
These issues will be addressed in future PRs.