Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1308 +/- ##
==========================================
+ Coverage 95.69% 95.86% +0.17%
==========================================
Files 278 329 +51
Lines 40965 46716 +5751
==========================================
+ Hits 39200 44783 +5583
- Misses 1765 1933 +168
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| #include "cuda_helpers.hpp" | ||
|
|
||
| #include <nanobind/nanobind.h> | ||
| #include <nanobind/stl/pair.h> |
There was a problem hiding this comment.
this will help with an issue seen sometimes for ARM CUDA wheels CIs:
e.g.
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/23185381696/job/67367515707
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/23167693870/job/67311848768
This is indeterministic with some CI runs, possibly due to GPU runner environment issues. The error of the missing pair include masks the real error, so this will help with that.
maliasadi
left a comment
There was a problem hiding this comment.
Thanks @josephleekl!
A few quick notes for the PR description:
- Kokkos 5.0 works with CUDA 2.8+
- The AMD dev cloud instructions is just for MI300 series (GFX942 arch)
Happy to approve after resolving the comments...
There was a problem hiding this comment.
Kokkos 5.0 mandates C++20. This raises the minimum compiler requirements for anyone building from source:
• GCC ≥ 10.4
• Clang ≥ 14.0 (CPU) / 15.0 (CUDA)
• NVCC ≥ 12.2
• MSVC ≥ 19.30
Could you please check and update CMakes and docs accordingly?
| KOKKOS_LAMBDA(std::size_t k) { | ||
| arr_(k) *= static_cast<PrecisionT>( | ||
| 1 - 2 * int(Kokkos::Impl::bit_count(k & wires_parity) % 2)); | ||
| 1 - 2 * int(Kokkos::Experimental::popcount_builtin( |
There was a problem hiding this comment.
We should be safe to use std::popcount in C++20 I suppose 🤔
| 1 - 2 * int(Kokkos::Experimental::popcount_builtin( | |
| 1 - 2 * int(std::popcount( |
There was a problem hiding this comment.
@maliasadi but this is to execute on the device, not on the host, and std::popcount is just on the host. Kokkos popcount is more appropriate in this case
| if (ctrls_mask == (ctrls_parity & k)) { | ||
| arr_(k) *= static_cast<PrecisionT>( | ||
| 1 - 2 * int(Kokkos::Impl::bit_count(k & wires_parity) % 2)); | ||
| 1 - 2 * int(Kokkos::Experimental::popcount_builtin( |
There was a problem hiding this comment.
| 1 - 2 * int(Kokkos::Experimental::popcount_builtin( | |
| 1 - 2 * int(std::popcount( |
There was a problem hiding this comment.
but actually maybe i should use Kokkos::popcountinstead. let me try it
| ? shift_0 | ||
| : shift_1; | ||
| arr(k) *= | ||
| (Kokkos::Experimental::popcount_builtin(k & wires_parity) % 2 == 0) |
There was a problem hiding this comment.
could you replace all these Kokkos::Experimental::popcount_builtin with std::popcount?
| with: | ||
| os: ubuntu-24.04 | ||
| kokkos_version: "4.5.00" | ||
| kokkos_version: ${{ (inputs.lightning-version == 'stable' && '4.5.00') || '5.0.0' }} # FIXME: remove after v0.45 release |
There was a problem hiding this comment.
Don't forget to update the changelog :)
pennylane_lightning/core/simulators/lightning_kokkos/gates/BasicGeneratorFunctors.hpp
Outdated
Show resolved
Hide resolved
pennylane_lightning/core/simulators/lightning_kokkos/gates/BasicGateFunctors.hpp
Outdated
Show resolved
Hide resolved
pennylane_lightning/core/simulators/lightning_kokkos/gates/BasicGateFunctors.hpp
Outdated
Show resolved
Hide resolved
pennylane_lightning/core/simulators/lightning_kokkos/gates/BasicGateFunctors.hpp
Outdated
Show resolved
Hide resolved
pennylane_lightning/core/simulators/lightning_kokkos/gates/BasicGateFunctors.hpp
Outdated
Show resolved
Hide resolved
pennylane_lightning/core/simulators/lightning_kokkos/gates/BasicGeneratorFunctors.hpp
Outdated
Show resolved
Hide resolved
|
Found 10 test failures on Blacksmith runners: Failures
|
NOTE: For this to work with CUDA compiler, we need to bump CUDA to 12.8, otherwise the compilation will hang (replicated with CUDA 12.4, 12.5, 12.6) . This is likely due to updates to MDSpan functionality Kokkos, and is suggested by Kokkos.
Update: CIs images have now been updated to CUDA 12.9, so this can proceed.
To compile on AMDGPU Dev Cloud:
The cpp tests should pass.
Docker build
stable/stable
latest/latest
Context:
Upgrade Kokkos to version 5.0, which unlocks the ability to compile for more architecture (e.g. Zen4/Zen5 CPU, blackwell GPU) , along with performance improvements.
This will also be a step towards future upgrade to Kokkos 5.1, which will bring support for MI350X/MI355X.
Description of the Change:
Benefits:
Possible Drawbacks:
Related GitHub Issues:
[sc-89475]