
Update kokkos 5.0#1308

Open
josephleekl wants to merge 30 commits into master from kokkos-5.0


@josephleekl
Contributor

@josephleekl josephleekl commented Dec 16, 2025

NOTE: For this to work with the CUDA compiler, we need to bump CUDA to 12.8; otherwise compilation hangs (reproduced with CUDA 12.4, 12.5, and 12.6). This is likely due to updates to the mdspan functionality in Kokkos, and the bump is what Kokkos suggests.
Update: The CI images have now been updated to CUDA 12.9, so this can proceed.

To compile on AMDGPU Dev Cloud:

apt install cmake ninja-build

cd pennylane-lightning
git checkout kokkos-5.0
# turn off `ENABLE_WARNINGS` in Makefile
build_options="-DCMAKE_CXX_COMPILER=hipcc -DKokkos_ENABLE_HIP=ON -DKokkos_ARCH_AMD_GFX942=ON -DCMAKE_PREFIX_PATH=/opt/rocm" PL_BACKEND="lightning_kokkos" make test-cpp

The C++ tests should pass.

Docker build
stable/stable
latest/latest
Context:
Upgrade Kokkos to version 5.0, which unlocks the ability to compile for more architectures (e.g. Zen 4/Zen 5 CPUs, Blackwell GPUs), along with performance improvements.

This is also a step towards a future upgrade to Kokkos 5.1, which will bring support for MI350X/MI355X.

Description of the Change:

Benefits:

Possible Drawbacks:

Related GitHub Issues:

[sc-89475]

@codecov

codecov bot commented Dec 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.86%. Comparing base (274fd42) to head (2d54a43).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1308      +/-   ##
==========================================
+ Coverage   95.69%   95.86%   +0.17%     
==========================================
  Files         278      329      +51     
  Lines       40965    46716    +5751     
==========================================
+ Hits        39200    44783    +5583     
- Misses       1765     1933     +168     
Flag Coverage Δ
unit_tests 95.86% <100.00%> (+0.17%) ⬆️


@maliasadi maliasadi added the ci:build_wheels and ci:use-gpu-runner labels Dec 16, 2025
@josephleekl josephleekl marked this pull request as ready for review February 5, 2026 20:07
@josephleekl josephleekl added the do not merge label Feb 10, 2026
@josephleekl josephleekl marked this pull request as draft February 23, 2026 20:47
@josephleekl josephleekl changed the title Update kokkos 5.0 Update kokkos 5.1 Mar 17, 2026
@josephleekl josephleekl changed the title Update kokkos 5.1 Update kokkos 5.0 Mar 17, 2026
@josephleekl josephleekl removed the do not merge label Mar 17, 2026
@josephleekl josephleekl marked this pull request as ready for review March 17, 2026 07:57
@josephleekl josephleekl added the ci:use-multi-gpu-runner label Mar 17, 2026
#include "cuda_helpers.hpp"

#include <nanobind/nanobind.h>
#include <nanobind/stl/pair.h>
Contributor Author

This will help with an issue sometimes seen in the ARM CUDA wheel CI jobs, e.g.
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/23185381696/job/67367515707
https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/23167693870/job/67311848768

The failure is nondeterministic across CI runs, possibly due to GPU runner environment issues. The error about the missing pair include masks the real error, so adding the include explicitly makes the real error visible.

Member

@maliasadi maliasadi left a comment

Thanks @josephleekl!

A few quick notes for the PR description:

  • Kokkos 5.0 works with CUDA 12.8+
  • The AMD dev cloud instructions are just for the MI300 series (GFX942 arch)

Happy to approve after resolving the comments...

Member

Kokkos 5.0 mandates C++20. This raises the minimum compiler requirements for anyone building from source:
• GCC ≥ 10.4
• Clang ≥ 14.0 (CPU) / 15.0 (CUDA)
• NVCC ≥ 12.2
• MSVC ≥ 19.30

Could you please check and update the CMake files and docs accordingly?
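
One way to surface the C++20 requirement early is a compile-time guard in a shared header. The sketch below is illustrative only: the names `PL_CXX_STANDARD`, `PL_ENFORCE_CXX20`, `is_cxx20_or_newer`, and `version_at_least` are hypothetical and are not part of pennylane-lightning or Kokkos. The `_MSVC_LANG` branch is there because MSVC only reports an accurate `__cplusplus` when `/Zc:__cplusplus` is set.

```cpp
#include <cassert>

// Normalise the language-standard macro across compilers (hypothetical name).
#if defined(_MSVC_LANG)
#define PL_CXX_STANDARD _MSVC_LANG
#else
#define PL_CXX_STANDARD __cplusplus
#endif

// True when the given standard value is C++20 (202002L) or newer.
constexpr bool is_cxx20_or_newer(long standard) { return standard >= 202002L; }

// True when (major, minor) is at least (min_major, min_minor).
constexpr bool version_at_least(int major, int minor, int min_major, int min_minor) {
    return major > min_major || (major == min_major && minor >= min_minor);
}

// Opt-in guard (hypothetical macro): define PL_ENFORCE_CXX20 to fail the build
// early with a readable message instead of a cascade of template errors.
#if defined(PL_ENFORCE_CXX20)
static_assert(is_cxx20_or_newer(PL_CXX_STANDARD), "Kokkos 5.0 requires C++20 or newer");
#if defined(__GNUC__) && !defined(__clang__)
static_assert(version_at_least(__GNUC__, __GNUC_MINOR__, 10, 4), "GCC >= 10.4 is required");
#endif
#endif
```

The guard is opt-in so that the header stays harmless in builds that have not yet moved to C++20; a CMake-level check would normally complement it rather than be replaced by it.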

KOKKOS_LAMBDA(std::size_t k) {
arr_(k) *= static_cast<PrecisionT>(
1 - 2 * int(Kokkos::Impl::bit_count(k & wires_parity) % 2));
1 - 2 * int(Kokkos::Experimental::popcount_builtin(
Member

We should be safe to use std::popcount in C++20 I suppose 🤔

Suggested change
1 - 2 * int(Kokkos::Experimental::popcount_builtin(
1 - 2 * int(std::popcount(

Contributor Author

@maliasadi This code executes on the device, not on the host, and std::popcount is host-only. The Kokkos popcount is more appropriate in this case.
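
For readers following the thread, the expression under discussion maps the parity of the selected bits to a sign: +1 when `k & wires_parity` has an even number of set bits, -1 otherwise. The host-only sketch below illustrates that arithmetic. `bit_count` and `parity_sign` are hypothetical names, and the sketch does not settle which popcount is callable from device code, which is the open question here. It uses `std::popcount` when the C++20 `<bit>` library is available and falls back to a portable loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__has_include)
#if __has_include(<bit>)
#include <bit>
#endif
#endif

// Count set bits on the host (hypothetical helper, not the Kokkos API).
inline int bit_count(std::size_t x) {
#if defined(__cpp_lib_bitops)
    return std::popcount(x); // C++20 standard library
#else
    int n = 0;
    for (; x != 0; x &= x - 1) { ++n; } // clear the lowest set bit each step
    return n;
#endif
}

// +1 if (k & wires_parity) has even parity, -1 if odd.
inline int parity_sign(std::size_t k, std::size_t wires_parity) {
    return 1 - 2 * (bit_count(k & wires_parity) % 2);
}
```

For example, with `k = 5` (binary 101) and `wires_parity = 7`, two bits are selected, so the sign is +1; with `k = 4` only one bit is selected and the sign is -1.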

if (ctrls_mask == (ctrls_parity & k)) {
arr_(k) *= static_cast<PrecisionT>(
1 - 2 * int(Kokkos::Impl::bit_count(k & wires_parity) % 2));
1 - 2 * int(Kokkos::Experimental::popcount_builtin(
Member

Suggested change
1 - 2 * int(Kokkos::Experimental::popcount_builtin(
1 - 2 * int(std::popcount(

Contributor Author

But actually, maybe I should use Kokkos::popcount instead. Let me try it.

? shift_0
: shift_1;
arr(k) *=
(Kokkos::Experimental::popcount_builtin(k & wires_parity) % 2 == 0)
Member

could you replace all these Kokkos::Experimental::popcount_builtin with std::popcount?

with:
os: ubuntu-24.04
kokkos_version: "4.5.00"
kokkos_version: ${{ (inputs.lightning-version == 'stable' && '4.5.00') || '5.0.0' }} # FIXME: remove after v0.45 release
Member

👍

Member

Don't forget to update the changelog :)

@blacksmith-sh

blacksmith-sh bot commented Mar 23, 2026

Found 10 test failures on Blacksmith runners:

Failures

Test View Logs
coverage: platform linux, python 3.12/12-final-0 View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs
TestTemplates/test_QuantumMonteCarlo[graph_enabled-device_kwargs0] View Logs


Labels

ci:build_wheels Activate wheel building. ci:use-gpu-runner Enable usage of GPU runner for this Pull Request ci:use-multi-gpu-runner Enable usage of Multi-GPU runner for this Pull Request


5 participants