Enable OpenMP pragmas for gate kernels on Linux Wheels #1133

maliasadi · 2025-04-17T04:52:13Z

For context, lightning.qubit by default runs each gate kernel single-threaded, while allowing multi-threaded execution over observables when using the adjoint differentiation pipeline. This design keeps the (SIMD) gate kernels at their best performance, but it can limit performance for deep circuits on large CPUs with many cores and large caches.

The flag LQ_ENABLE_KERNEL_OMP enables OpenMP support across all kernel types (LM, AVX2, and AVX512), which allows better performance tuning on HPC systems. Until now the flag has been opt-in, which preserved the default behavior for standard releases. This PR makes it the default on LQ wheels and from-source installations.

[latest/latest]

[sc-101572]
[sc-104235]

github-actions · 2025-04-17T04:52:27Z

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

A one-to-two sentence description of the change. You may include a small working example for new features.
A link back to this PR.
Your name (or GitHub username) in the contributors section.

codecov · 2025-04-17T04:55:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.26%. Comparing base (e9353ef) to head (99a1036).

❗ There is a different number of reports uploaded between BASE (e9353ef) and HEAD (99a1036).

HEAD has 24 fewer uploads than BASE

Flag	BASE (e9353ef)	HEAD (99a1036)
unit_tests	31	7

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1133      +/-   ##
==========================================
- Coverage   95.84%   89.26%   -6.59%     
==========================================
  Files         243      187      -56     
  Lines       40695    28938   -11757     
==========================================
- Hits        39005    25831   -13174     
- Misses       1690     3107    +1417

Flag	Coverage Δ
unit_tests	`89.26% <100.00%> (-6.59%)`	⬇️

maliasadi · 2025-04-17T05:03:08Z

/benchmark

LuisAlfredoNu · 2025-04-21T15:16:19Z

/benchmark

maliasadi · 2025-05-28T00:13:59Z

/benchmark

maliasadi · 2025-10-15T21:01:57Z

/benchmark

ringo-but-quantum · 2025-10-16T01:32:59Z

PennyLane Benchmarks

Benchmarks Report for lightning.qubit

Weighted Average Scores

Direct execution (without Catalyst)

Time: 1.53 🟢
Virtual Memory: 1.01

Aggregated Results

Time score

Workflow	Python
QPE	1.57	🟢
QSVT	0.96	🔴
XAS	2.93	🟢
shor	-
molecular_hamiltonian	1.01
sampling	1.10	🟢
stateprep	1.34	🟢
grover	2.68	🟢
QAOA_layers_scaling	1.33	🟢
QML	-
QML_jaxjit	1.02
UCCSD	0.94	🔴
VQE	1.05	🟢

Memory score

Workflow	Python virtual
QPE	1.00
QSVT	1.00
XAS	1.00
shor	-
molecular_hamiltonian	1.00
sampling	1.00
stateprep	1.07	🟢
grover	1.00
QAOA_layers_scaling	1.00
QML	-
QML_jaxjit	1.02
UCCSD	1.00
VQE	1.00

Complexity score

No complexity data available.

Detailed Per-Workflow Results

The assumed noise level for runtime improvements/regressions is 3.0%
⚠️ marks workflows with runtime fluctuations greater than 5.0% (std/mean)
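
A sketch of how the two thresholds above could be applied, assuming the time score is the reference runtime divided by the current runtime, so that values above 1 mean a speed-up. The report does not define the score, so this reading is an assumption. The names `classify_time_score` and `is_noisy` are hypothetical and are not part of the benchmark tooling.

```python
def classify_time_score(score: float, noise: float = 0.03) -> str:
    """Label a time score relative to an assumed noise band around 1.0.

    Assumption: score = reference runtime / current runtime.
    """
    if score > 1.0 + noise:
        return "improvement"
    if score < 1.0 - noise:
        return "regression"
    return "within noise"


def is_noisy(std_over_mean: float, limit: float = 0.05) -> bool:
    """Return True when the relative runtime fluctuation exceeds the limit."""
    return std_over_mean > limit


# Values taken from the table in this report.
qpe_label = classify_time_score(1.57)    # QPE[12-12]
uccsd_label = classify_time_score(0.81)  # UCCSD[H2O-STO-3G]
mh_label = classify_time_score(1.01)     # molecular_hamiltonian[H2O-STO-3G]
qsvt_noisy = is_noisy(0.056)             # QSVT[10], std/mean 5.6%
grover_noisy = is_noisy(0.039)           # grover[18], std/mean 3.9%
```

Under this reading, a red marker on a workflow that also carries ⚠️ (for example UCCSD[H2O-STO-3G] at 8.3% std/mean) is weaker evidence of a regression than a red marker on a stable workflow.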

Direct execution (without Catalyst)

Workflow	time [s]	std/mean		time score		virt mem [MB]	virt mem score
QPE[12-12]	0.971	0.4%		1.57	🟢	268.690	1.00
QPE[13-12]	1.978	0.1%		1.57	🟢	537.121	1.00
QSVT[10]	1.057	5.6%	⚠️	0.96	🔴	2.282	1.00
XAS[2-1-9]	1.386	5.2%	⚠️	3.04	🟢	15.057	1.00
XAS[2-2-9]	3.020	7.6%	⚠️	2.82	🟢	15.048	1.00
molecular_hamiltonian[H2O-STO-3G]	3.109	1.0%		1.01		136.276	1.00
molecular_hamiltonian[NH3-STO-3G]	9.016	0.3%		1.01		334.763	1.00
sampling[24-2]	1.204	0.2%		1.10	🟢	812.612	1.00
sampling[25-2]	2.469	1.1%		1.10	🟢	1,623.308	1.00
stateprep[14-MottonenStatePreparation]	2.935	2.8%		0.82	🔴	26.530	1.08	🟢
stateprep[15-ArbitraryStatePreparation]	2.940	5.2%	⚠️	1.86	🟢	18.416	1.06	🟢
grover[18]	0.610	3.9%		2.68	🟢	13.020	1.00
QAOA_layers_scaling[19-4]	0.866	0.6%		1.33	🟢	310.716	1.00
QML_jaxjit[IQPKernelClassifier-12-10]	4.761	18.2%	⚠️	0.97	🔴	26.091	1.04	🟢
QML_jaxjit[IQPVariationalClassifier-20-10]	3.890	8.2%	⚠️	1.07	🟢	34.584	1.00
UCCSD[H2O-STO-3G]	0.915	8.3%	⚠️	0.81	🔴	136.468	1.00
UCCSD[NH3-STO-3G]	2.834	3.5%		1.07	🟢	334.777	1.00
VQE[H2O-STO-3G]	2.475	2.6%		1.05	🟢	136.148	1.00
VQE[NH3-STO-3G]	14.167	0.3%		1.05	🟢	334.743	1.00

System Information

Hardware and Software Specifications

System	Current	Reference
CPU Name	Intel(R) Xeon(R) E-2388G CPU @ 3.20GHz	Intel(R) Xeon(R) E-2388G CPU @ 3.20GHz
Architecture	X86_64	X86_64
Cache L2	4096.0 KiB	4096.0 KiB
Cache L3	16384.0 KiB	16384.0 KiB
Cache L1	256.0 KiB I + 384.0 KiB D	256.0 KiB I + 384.0 KiB D
Max MHz	0.0	0.0
Min MHz	0.0	0.0
Nominal	3287.5	3287.5
Enabled	8 cores, 16 threads	8 cores, 16 threads
Memory	62.64 GB	62.64 GB
Storage	435.52 GB on /	435.52 GB on /
OS	Linux	Linux
OS Version	Linux-5.15.0-142-generic-x86_64-with-glibc2.35	Linux-5.15.0-142-generic-x86_64-with-glibc2.35
Kernel	5.15.0-142-generic	5.15.0-142-generic
Hostname	agassi	agassi
Python Version	3.11.9	3.11.9
Python Compiler	GCC 11.4.0	GCC 11.4.0
PennyLane Version	0.44.0.dev7	0.44.0.dev7
Lightning Version	0.44.0.dev3	0.44.0.dev3
Catalyst Version	Unknown	Unknown
NumPy Version	2.3.4	2.3.4
SciPy Version	1.16.2	1.16.2
Lightning Compiler	Unknown	Unknown

Report generated on Thu, 16 Oct 2025 at 01:32:38 (UTC)

josephleekl

Seems to work well! Some suggestions for things to update:

Docs: mention that this is on by default and can be toggled on/off with OMP_NUM_THREADS, and that it might be a good idea to set it to 1 when using e.g. concurrency.
Update the changelog.
In `.github/workflows/tests_lqcpu_python.yml` we set it to on; with this change, we don't need to do that explicitly anymore?
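
A minimal sketch of the OMP_NUM_THREADS suggestion above, from the Python side. It relies on the general OpenMP behavior that the variable is read when the runtime starts, so it has to be set before the library is imported. The helper names `set_kernel_threads` and `run_small_circuit` are hypothetical and only for illustration.

```python
import os


def set_kernel_threads(num_threads: int) -> str:
    """Set OMP_NUM_THREADS for this process and return the stored value.

    Call this before importing any library that loads OpenMP,
    because the runtime reads the variable at startup.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return os.environ["OMP_NUM_THREADS"]


# Pin gate kernels to one thread, e.g. when an outer layer
# (multiprocessing, a job scheduler) already runs circuits concurrently.
configured = set_kernel_threads(1)


def run_small_circuit() -> float:
    """Run a two-qubit circuit on lightning.qubit (requires pennylane)."""
    import pennylane as qml  # imported only after OMP_NUM_THREADS is set

    dev = qml.device("lightning.qubit", wires=2)

    @qml.qnode(dev)
    def circuit():
        qml.Hadamard(wires=0)
        qml.CNOT(wires=[0, 1])
        return qml.expval(qml.PauliZ(0))

    return float(circuit())
```

Exporting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.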

maliasadi · 2025-10-16T17:40:48Z

@josephleekl Now, it's ready to review ⚡

.github/CHANGELOG.md

doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst

doc/lightning_qubit/device.rst

doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst

LuisAlfredoNu · 2025-10-16T18:12:22Z

One question: won't turning OMP on by default interfere with the parallel adjoint differentiation functionality? We could end up running each gate with several threads and each differentiation with more threads on top of that, producing a slowdown because the thread pool gets flooded 🤔
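
The concern can be put in numbers: with nested parallelism the thread count is roughly the number of outer workers times the threads per gate kernel. The sketch below shows one common way to keep that product within the core count. It is an illustration only, not how Lightning schedules its threads, and `split_thread_budget` is a hypothetical helper.

```python
import os


def split_thread_budget(total_cores: int, outer_workers: int) -> int:
    """Return a per-worker thread count so that
    outer_workers * inner_threads does not exceed total_cores,
    unless there are more workers than cores, in which case
    each worker gets one thread.
    """
    if total_cores < 1 or outer_workers < 1:
        raise ValueError("both arguments must be positive integers")
    return max(1, total_cores // outer_workers)


# Example: 16 hardware threads shared by 4 observables processed in parallel.
inner_threads = split_thread_budget(16, 4)

# On the current machine, with an assumed 4 outer workers.
local_inner = split_thread_budget(os.cpu_count() or 1, 4)
```

With 16 hardware threads and 4 outer workers this gives 4 threads per kernel, whereas leaving both layers at 16 would request 256 threads.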

josephleekl

Thanks @maliasadi , LGTM! 🚀

LuisAlfredoNu

Thanks @maliasadi
🐊

Co-authored-by: Joseph Lee <[email protected]>

mlxd

Just adding the red to prevent this from accidentally merging.

We need to do a more intensive evaluation, including across the Python package ecosystem and OSs. Will do a proper review next week.

mlxd

Nice one @maliasadi
Just a few queries/suggestions, but will be happy to approve pending your response.

doc/lightning_qubit/device.rst

doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst

.github/CHANGELOG.md

doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst

.github/workflows/wheel_linux_aarch64.yml

Co-authored-by: Lee James O'Riordan <[email protected]>

LQ_ENABLE_KERNEL_OMP=ON

544a5c6

Auto update version from '0.42.0-dev1' to '0.42.0-dev2'

b2aef63

Merge with master

c806c06

Merge branch 'master' into tune_lq_kernel_perf

c80ec76

PennyLaneAI deleted a comment from ringo-but-quantum Oct 16, 2025

Merge branch 'master' into tune_lq_kernel_perf

53cc19e

maliasadi marked this pull request as ready for review October 16, 2025 13:24

Auto update version from '0.44.0-dev3' to '0.44.0-dev4'

1c1faf7

josephleekl reviewed Oct 16, 2025

maliasadi and others added 4 commits October 16, 2025 11:40

Update changelog

a081e86

Merge with master

f881805

Update docs

05c6f27

Auto update version from '0.44.0-dev4' to '0.44.0-dev5'

b43df85

maliasadi requested review from LuisAlfredoNu and josephleekl October 16, 2025 17:41

josephleekl reviewed Oct 16, 2025

.github/CHANGELOG.md (outdated, resolved)

josephleekl reviewed Oct 16, 2025

doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst (outdated, resolved)

josephleekl reviewed Oct 16, 2025

doc/lightning_qubit/device.rst (outdated, resolved)

josephleekl reviewed Oct 16, 2025

doc/lightning_qubit/device.rst (outdated, resolved)

josephleekl reviewed Oct 16, 2025

doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst (outdated, resolved)

josephleekl approved these changes Oct 16, 2025

LuisAlfredoNu approved these changes Oct 16, 2025

Apply suggestions from code review

f14b140

Co-authored-by: Joseph Lee <[email protected]>

PennyLaneAI deleted a comment from ringo-but-quantum Oct 16, 2025

maliasadi and others added 2 commits October 16, 2025 15:41

Merge branch 'master' into tune_lq_kernel_perf

03c5ef4

Auto update version from '0.44.0-dev5' to '0.44.0-dev6'

6b71d5c

maliasadi added the do not merge label (Do not merge PR until this label is removed) Oct 16, 2025

mlxd requested changes Oct 17, 2025

maliasadi added 2 commits October 24, 2025 10:21

Merge with master

b916e50

Update the scope

cf3a4d5

maliasadi changed the title ~~Enable OpenMP pragmas for gate kernels~~ Enable OpenMP pragmas for gate kernels on Linux Wheels Oct 24, 2025

maliasadi added the ci:build_wheels label (Activate wheel building) Oct 24, 2025

maliasadi added 2 commits October 24, 2025 10:48

trigger ci

c90f59e

Merge branch 'master' into tune_lq_kernel_perf

4285358

mlxd reviewed Dec 1, 2025

maliasadi and others added 8 commits December 4, 2025 01:06

Apply suggestions from code review

d371ffc

Co-authored-by: Lee James O'Riordan <[email protected]>

Auto update version from '0.44.0-dev14' to '0.44.0-dev17'

a2e6d48

Merge with master

41b0967

Enable Kernel OMP on Linux and MacOS wheels

63532ac

Update docs

fc1d17d

Auto update version from '0.44.0-dev26' to '0.44.0-dev28'

fe71c63

git mv kernel_tuning.rst

03cf8e1

trigger ci

808620c

maliasadi removed the do not merge label (Do not merge PR until this label is removed) Jan 5, 2026

Update changelog

99a1036

6 participants
