perf: VQE ablation tests #1416


Open · wants to merge 15 commits into main from ss/ablation-tests/vqe

Conversation

@mofeing (Collaborator) commented Jun 23, 2025

No description provided.

@mofeing mofeing force-pushed the ss/ablation-tests/vqe branch from 66efa2c to 216e3bf Compare June 24, 2025 11:48
@mofeing (Collaborator, Author) commented Jun 24, 2025

[Screenshot: benchmark results, 2025-06-24 22:42]

@wsmoses could there be a pass in :all mode that undoes the performance gains seen with the :before_enzyme and :after_enzyme pass sets?

This benchmark consists almost entirely of dot_general ops and the transposes around them.
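As a rough NumPy analogue (purely illustrative, not the actual StableHLO program): dot_general is a generalized tensor contraction, and the surrounding transposes only reorder axes before the contraction, so an optimizer can often fold them into the dimension numbers of the dot itself.

```python
import numpy as np

# Hypothetical small shapes, just to illustrate the transpose + dot_general pattern
# that dominates this benchmark.
a = np.arange(12.0).reshape(4, 3)
b = np.arange(15.0).reshape(3, 5)

at = a.T  # stablehlo.transpose analogue

# dot_general analogue: contract axis 0 of `at` with axis 0 of `b`.
out = np.einsum("ki,kj->ij", at, b)

# The transpose round-trip is equivalent to a plain matmul, which is why
# folding transposes into the contraction is a meaningful optimization.
assert np.allclose(out, a @ b)
```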

@wsmoses (Member) commented Jun 24, 2025

Yeah potentially, definitely merits investigation.

cc @avik-pal, especially since the FFT-related issue is similar.

@avik-pal (Collaborator) commented Jun 24, 2025 via email

@mofeing (Collaborator, Author) commented Jun 24, 2025

Here is the MLIR generated for each case: vqe-mlir.zip

I'm running into problems with xprof. I'm doing the following:

Reactant.with_profiler(@__DIR__; trace_host=true, trace_device=true) do
    # compile once (synchronous execution), then run 100 iterations under the profiler
    ∇f_xla = @compile compile_options = Reactant.DefaultXLACompileOptions(; sync=true) ∇expectation(
        params_re, observable_re, coef_re
    )
    for _ in 1:100
        ∇f_xla(params_re, observable_re, coef_re)
    end
end

and I get the following message and no data in my traces:

[Screenshot: error message, 2025-06-25 01:56]

@avik-pal (Collaborator) commented

The gradients look quite strange. Did some loop or vector op get scalarized?

@mofeing (Collaborator, Author) commented Jun 25, 2025

The gradients look quite strange. Did some loop or vector op get scalarized?

Ahh, that's because I was zero-initializing the parameters and using just one Hamiltonian term; the ket and the bra are then effectively perpendicular, so the primal value and the gradients are zero.

Random initialization shows better, though still small, gradients. I would probably need to add more Hamiltonian terms, but right now we just run the gradient function sequentially over all the Hamiltonian terms (with some parallelization using MPI) and then sum the results. Batching over the Hamiltonian terms requires more work, but this benchmark (which you can think of as running 1 epoch with 1 sample) is a good reflection of what we do.

tl;dr: the zero gradients are / were a numerical issue, not a bug introduced by a pass.
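The effect can be sketched with a toy single-qubit example (Python/NumPy, purely illustrative; the actual benchmark uses a multi-qubit EfficientSU2 ansatz): with an RZ-only rotation acting on |0⟩ and a single Pauli-X term, X|0⟩ is orthogonal to |0⟩, so the expectation value is identically zero and so is its gradient at the zero initialization.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)  # single Pauli-X "Hamiltonian term"

def rz(theta):
    # RZ rotation; acting on |0> it only contributes a global phase
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def expectation(theta):
    psi = rz(theta) @ np.array([1.0, 0.0], dtype=complex)
    return float(np.real(psi.conj() @ X @ psi))

# X|0> = |1> is orthogonal to |0>, so <psi|X|psi> = 0 for every theta,
# and the (finite-difference) gradient at the zero initialization vanishes too.
primal = expectation(0.0)
grad = (expectation(1e-6) - expectation(-1e-6)) / 2e-6
```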

@mofeing mofeing force-pushed the ss/ablation-tests/vqe branch from 2a1adc3 to aa11882 Compare July 22, 2025 10:58
@mofeing (Collaborator, Author) commented Jul 22, 2025

A bug introduced in EnzymeAD/Enzyme-JAX#1121 breaks the reverse-diff rule of stablehlo.dynamic_update_slice, so this benchmark is currently pinned to Reactant v0.2.138.
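For context, the expected reverse rule for dynamic_update_slice can be sketched in NumPy (an illustration of the mathematical VJP, not Enzyme-JAX's implementation): for y = dynamic_update_slice(x, u, start), the cotangent of u is the matching slice of the incoming gradient, and the cotangent of x is the incoming gradient with that window zeroed, since the forward op overwrote those entries.

```python
import numpy as np

def dynamic_update_slice(x, u, start):
    # forward: write update `u` into `x` at offset `start` (1-D for brevity)
    y = x.copy()
    y[start:start + u.shape[0]] = u
    return y

def dynamic_update_slice_vjp(g, u_shape, start):
    # reverse: the update receives the slice of the incoming gradient;
    # the operand receives the gradient with the overwritten window zeroed.
    du = g[start:start + u_shape[0]].copy()
    dx = g.copy()
    dx[start:start + u_shape[0]] = 0.0
    return dx, du

x, u = np.zeros(6), np.ones(2)
g = np.arange(6.0)  # incoming cotangent of y
dx, du = dynamic_update_slice_vjp(g, u.shape, start=2)
# du picks out g[2:4]; dx keeps g outside the window and zeros inside it.
```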


For 128 Hamiltonian terms, 40 qubits, and 6 layers of the EfficientSU2 ansatz, I'm getting the following results:

[Image: benchmark results, 40 qubits]

The same run for 50 qubits seems to have problems:

[Image: benchmark results, 50 qubits]

It could also be noise: I've tested on my laptop and on CPU, and since the benchmarks take longer, fewer samples are taken.

@mofeing mofeing marked this pull request as ready for review July 22, 2025 11:47
@mofeing mofeing requested a review from wsmoses July 22, 2025 11:47