
Conversation

@maliasadi
Member

@maliasadi maliasadi commented Apr 17, 2025

For context, the default threading behavior of lightning.qubit assumes single-threaded execution at the gate level, while allowing multi-threaded execution over observables when using the adjoint differentiation pipeline. This design ensures optimal (SIMD) gate kernel performance but can limit performance for deep circuits on large CPUs with many cores and large caches.

The flag LQ_ENABLE_KERNEL_OMP enables OpenMP support across all kernel types (LM, AVX2, and AVX512), which allows for better performance tuning on HPC systems while preserving the default behavior for standard releases. This PR makes the flag on by default for LQ wheels and from-source installations.
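To make "multi-threaded execution at the gate level" concrete, here is a conceptual sketch in pure Python. It is not Lightning's C++ kernel code: the function name `apply_single_qubit_gate`, the `num_workers` parameter, and the chunking scheme are all illustrative assumptions. The point is only that a single gate touches independent pairs of amplitudes, so the pair loop can be partitioned across threads. Python threads are used here purely to show the partitioning; because of the GIL they would not give the speedups an OpenMP loop can.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def apply_single_qubit_gate(state, gate, target, num_qubits, num_workers=1):
    """Apply a 2x2 `gate` to qubit `target` in place (qubit 0 = most significant bit).

    Hypothetical helper for illustration: the loop over amplitude pairs is split
    into contiguous chunks, one per worker, mimicking a parallel-for over a kernel.
    """
    stride = 1 << (num_qubits - 1 - target)  # index distance between paired amplitudes
    num_pairs = len(state) // 2

    def update(pair_range):
        for k in pair_range:
            # Map pair number k to the index whose target bit is 0.
            i0 = (k // stride) * 2 * stride + (k % stride)
            i1 = i0 + stride
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1

    chunk = math.ceil(num_pairs / num_workers)
    ranges = [range(s, min(s + chunk, num_pairs)) for s in range(0, num_pairs, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(update, ranges))  # pairs are disjoint, so chunks never collide
    return state


h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]
# Two qubits in |00>, Hadamard on qubit 0, pair loop split over two workers.
result = apply_single_qubit_gate([1, 0, 0, 0], hadamard, target=0, num_qubits=2, num_workers=2)
```

For a state this small the split is pure overhead, which matches the trade-off described above: per-gate threading only pays off when each gate has enough amplitudes to share out.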

[latest/latest]

[sc-101572]
[sc-104235]

@github-actions
Contributor

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

@codecov

codecov bot commented Apr 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.26%. Comparing base (e9353ef) to head (99a1036).

❗ There is a different number of reports uploaded between BASE (e9353ef) and HEAD (99a1036).

HEAD has 24 fewer uploads than BASE.

| Flag | BASE (e9353ef) | HEAD (99a1036) |
| --- | --- | --- |
| unit_tests | 31 | 7 |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1133      +/-   ##
==========================================
- Coverage   95.84%   89.26%   -6.59%     
==========================================
  Files         243      187      -56     
  Lines       40695    28938   -11757     
==========================================
- Hits        39005    25831   -13174     
- Misses       1690     3107    +1417     
Flag Coverage Δ
unit_tests 89.26% <100.00%> (-6.59%) ⬇️

@maliasadi
Member Author

/benchmark

@LuisAlfredoNu
Contributor

/benchmark

@maliasadi
Member Author

/benchmark

@maliasadi
Member Author

/benchmark

@ringo-but-quantum
Collaborator

PennyLane Benchmarks

Benchmarks Report for lightning.qubit

Weighted Average Scores

Direct execution (without Catalyst)

  • Time: 1.53 🟢

  • Virtual Memory: 1.01

Aggregated Results

Time score

| Workflow | Python |
| --- | --- |
| QPE | 1.57 🟢 |
| QSVT | 0.96 🔴 |
| XAS | 2.93 🟢 |
| shor | - |
| molecular_hamiltonian | 1.01 |
| sampling | 1.10 🟢 |
| stateprep | 1.34 🟢 |
| grover | 2.68 🟢 |
| QAOA_layers_scaling | 1.33 🟢 |
| QML | - |
| QML_jaxjit | 1.02 |
| UCCSD | 0.94 🔴 |
| VQE | 1.05 🟢 |

Memory score

| Workflow | Python virtual |
| --- | --- |
| QPE | 1.00 |
| QSVT | 1.00 |
| XAS | 1.00 |
| shor | - |
| molecular_hamiltonian | 1.00 |
| sampling | 1.00 |
| stateprep | 1.07 🟢 |
| grover | 1.00 |
| QAOA_layers_scaling | 1.00 |
| QML | - |
| QML_jaxjit | 1.02 |
| UCCSD | 1.00 |
| VQE | 1.00 |

Complexity score

No complexity data available.

Detailed Per-Workflow Results

  • The assumed noise level for runtime improvements/regressions is 3.0%
  • ⚠️ marks workflows with runtime fluctuations greater than 5.0% (std/mean)
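The two thresholds above can be read as a simple classification rule. The sketch below is an assumption about how the markers are assigned, not the benchmark bot's actual code: it treats a score above 1 as an improvement (consistent with the 🟢 next to the 1.53 weighted average) and uses the 3.0% and 5.0% figures from the bullets. The names `classify_score` and `is_noisy` are hypothetical.

```python
def classify_score(score, noise=0.03):
    """Label a score relative to the assumed noise band around 1.0."""
    if score > 1 + noise:
        return "improvement"
    if score < 1 - noise:
        return "regression"
    return "within noise"


def is_noisy(std_over_mean, threshold=0.05):
    """True when run-to-run fluctuation (std/mean) exceeds the warning threshold."""
    return std_over_mean > threshold


print(classify_score(1.57))  # improvement
print(classify_score(1.01))  # within noise
print(classify_score(0.94))  # regression
print(is_noisy(0.056))       # True
```

Because the report shows scores rounded to two decimals, a displayed value that sits exactly on a band edge (such as 0.97) may be marked differently in the report than this sketch would mark it.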

Direct execution (without Catalyst)

| Workflow | time [s] | std/mean | time score | virt mem [MB] | virt mem score |
| --- | --- | --- | --- | --- | --- |
| QPE[12-12] | 0.971 | 0.4% | 1.57 🟢 | 268.690 | 1.00 |
| QPE[13-12] | 1.978 | 0.1% | 1.57 🟢 | 537.121 | 1.00 |
| QSVT[10] | 1.057 | 5.6% ⚠️ | 0.96 🔴 | 2.282 | 1.00 |
| XAS[2-1-9] | 1.386 | 5.2% ⚠️ | 3.04 🟢 | 15.057 | 1.00 |
| XAS[2-2-9] | 3.020 | 7.6% ⚠️ | 2.82 🟢 | 15.048 | 1.00 |
| molecular_hamiltonian[H2O-STO-3G] | 3.109 | 1.0% | 1.01 | 136.276 | 1.00 |
| molecular_hamiltonian[NH3-STO-3G] | 9.016 | 0.3% | 1.01 | 334.763 | 1.00 |
| sampling[24-2] | 1.204 | 0.2% | 1.10 🟢 | 812.612 | 1.00 |
| sampling[25-2] | 2.469 | 1.1% | 1.10 🟢 | 1,623.308 | 1.00 |
| stateprep[14-MottonenStatePreparation] | 2.935 | 2.8% | 0.82 🔴 | 26.530 | 1.08 🟢 |
| stateprep[15-ArbitraryStatePreparation] | 2.940 | 5.2% ⚠️ | 1.86 🟢 | 18.416 | 1.06 🟢 |
| grover[18] | 0.610 | 3.9% | 2.68 🟢 | 13.020 | 1.00 |
| QAOA_layers_scaling[19-4] | 0.866 | 0.6% | 1.33 🟢 | 310.716 | 1.00 |
| QML_jaxjit[IQPKernelClassifier-12-10] | 4.761 | 18.2% ⚠️ | 0.97 🔴 | 26.091 | 1.04 🟢 |
| QML_jaxjit[IQPVariationalClassifier-20-10] | 3.890 | 8.2% ⚠️ | 1.07 🟢 | 34.584 | 1.00 |
| UCCSD[H2O-STO-3G] | 0.915 | 8.3% ⚠️ | 0.81 🔴 | 136.468 | 1.00 |
| UCCSD[NH3-STO-3G] | 2.834 | 3.5% | 1.07 🟢 | 334.777 | 1.00 |
| VQE[H2O-STO-3G] | 2.475 | 2.6% | 1.05 🟢 | 136.148 | 1.00 |
| VQE[NH3-STO-3G] | 14.167 | 0.3% | 1.05 🟢 | 334.743 | 1.00 |
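The aggregated time scores earlier in the report appear to be plain arithmetic means of the per-configuration scores in this table. The report does not state the rule, so treat that as an observation from the numbers rather than a documented definition. A quick check, with the scores copied from the rows above and a hypothetical `aggregate` helper:

```python
from statistics import mean

# Per-configuration time scores copied from the detailed table.
detailed_time_scores = {
    "QPE": [1.57, 1.57],
    "XAS": [3.04, 2.82],
    "stateprep": [0.82, 1.86],
    "QML_jaxjit": [0.97, 1.07],
    "UCCSD": [0.81, 1.07],
}


def aggregate(scores):
    """Arithmetic mean rounded to two decimals, as the report displays scores."""
    return round(mean(scores), 2)


aggregated = {name: aggregate(scores) for name, scores in detailed_time_scores.items()}
print(aggregated)
# {'QPE': 1.57, 'XAS': 2.93, 'stateprep': 1.34, 'QML_jaxjit': 1.02, 'UCCSD': 0.94}
```

These match the aggregated values (1.57, 2.93, 1.34, 1.02, 0.94). Note that the mean can hide a mixed result: stateprep aggregates to 1.34 even though the MottonenStatePreparation configuration scored 0.82.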

System Information

Hardware and Software Specifications

| System | Current | Reference |
| --- | --- | --- |
| CPU Name | Intel(R) Xeon(R) E-2388G CPU @ 3.20GHz | Intel(R) Xeon(R) E-2388G CPU @ 3.20GHz |
| Architecture | X86_64 | X86_64 |
| Cache L2 | 4096.0 KiB | 4096.0 KiB |
| Cache L3 | 16384.0 KiB | 16384.0 KiB |
| Cache L1 | 256.0 KiB I + 384.0 KiB D | 256.0 KiB I + 384.0 KiB D |
| Max MHz | 0.0 | 0.0 |
| Min MHz | 0.0 | 0.0 |
| Nominal | 3287.5 | 3287.5 |
| Enabled | 8 cores, 16 threads | 8 cores, 16 threads |
| Memory | 62.64 GB | 62.64 GB |
| Storage | 435.52 GB on / | 435.52 GB on / |
| OS | Linux | Linux |
| OS Version | Linux-5.15.0-142-generic-x86_64-with-glibc2.35 | Linux-5.15.0-142-generic-x86_64-with-glibc2.35 |
| Kernel | 5.15.0-142-generic | 5.15.0-142-generic |
| Hostname | agassi | agassi |
| Python Version | 3.11.9 | 3.11.9 |
| Python Compiler | GCC 11.4.0 | GCC 11.4.0 |
| PennyLane Version | 0.44.0.dev7 | 0.44.0.dev7 |
| Lightning Version | 0.44.0.dev3 | 0.44.0.dev3 |
| Catalyst Version | Unknown | Unknown |
| NumPy Version | 2.3.4 | 2.3.4 |
| SciPy Version | 1.16.2 | 1.16.2 |
| Lightning Compiler | Unknown | Unknown |

Report generated on Thu, 16 Oct 2025 at 01:32:38 (UTC)

@maliasadi maliasadi marked this pull request as ready for review October 16, 2025 13:24
Contributor

@josephleekl josephleekl left a comment


seems to work well! Some suggestions for things to update:

  • The docs should mention that this is on by default and can be toggled on/off with OMP_NUM_THREADS; setting it to 1 might be a good idea when using, e.g., concurrency.
  • Update the changelog.
  • In `.github/workflows/tests_lqcpu_python.yml` we set it to on; with this change, we no longer need to do that explicitly?
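As a rough sketch of the first suggestion: `OMP_NUM_THREADS` is the standard OpenMP environment variable, and OpenMP runtimes generally read it when they initialize, so it needs to be set before the OpenMP-backed library is imported. The helper name `set_omp_threads` is hypothetical, and the assumption here is that the user's own code already runs several workers concurrently.

```python
import os


def set_omp_threads(num_threads):
    """Set OMP_NUM_THREADS for this process; call before importing the simulator."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    os.environ["OMP_NUM_THREADS"] = str(num_threads)


# When the application already parallelizes at a higher level (e.g. a process
# pool), one OpenMP thread per worker avoids competing for the same cores.
set_omp_threads(1)

# Import the OpenMP-backed package only after this point.
```

Exporting the variable in the shell before launching Python has the same effect and avoids any import-order concerns.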

@maliasadi
Member Author

@josephleekl It's now ready for review ⚡

@LuisAlfredoNu
Contributor

One question: won't turning OMP on by default interfere with the parallel adjoint differentiation functionality? We could end up running each gate with multiple threads and each differentiation with even more threads, producing a slowdown from flooding the thread pool 🤔
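To put a number on that concern, here is a back-of-the-envelope sketch. It assumes the worst case, where every outer thread (one per observable in the adjoint pass) spawns its own full team of inner threads for each gate kernel. Whether that actually happens depends on how the OpenMP runtime handles nested parallel regions, which this thread does not settle. The function names are hypothetical, and the 16 hardware threads come from the benchmark machine reported above.

```python
def worst_case_threads(outer_threads, inner_threads):
    """Threads requested if every outer thread spawns its own inner team."""
    return outer_threads * inner_threads


def oversubscription_factor(outer_threads, inner_threads, hardware_threads):
    """Requested threads per hardware thread; values above 1 mean contention."""
    return worst_case_threads(outer_threads, inner_threads) / hardware_threads


# Benchmark host: 8 cores, 16 hardware threads.
print(worst_case_threads(16, 16))           # 256
print(oversubscription_factor(16, 16, 16))  # 16.0
print(oversubscription_factor(16, 1, 16))   # 1.0 (gate kernels single-threaded)
```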

Contributor

@josephleekl josephleekl left a comment


Thanks @maliasadi , LGTM! 🚀

Contributor

@LuisAlfredoNu LuisAlfredoNu left a comment


Thanks @maliasadi
🐊

@PennyLaneAI PennyLaneAI deleted a comment from ringo-but-quantum Oct 16, 2025
@maliasadi maliasadi added the do not merge Do not merge PR until this label is removed label Oct 16, 2025
Member

@mlxd mlxd left a comment


Just adding the red to prevent this from accidentally merging.

We need to do a more intensive evaluation, including across the Python package ecosystem and OSs. Will do a proper review next week.

@maliasadi maliasadi changed the title Enable OpenMP pragmas for gate kernels Enable OpenMP pragmas for gate kernels on Linux Wheels Oct 24, 2025
@maliasadi maliasadi added the ci:build_wheels Activate wheel building. label Oct 24, 2025
Member

@mlxd mlxd left a comment


Nice one @maliasadi
Just a few queries/suggestions, but will be happy to approve pending your response.

@maliasadi maliasadi removed the do not merge Do not merge PR until this label is removed label Jan 5, 2026

Labels

ci:build_wheels Activate wheel building.

6 participants