Digital adiabatic/QA benchmark for three QUBO families:
- Random symmetric QUBO (`x^T Q x`)
- MaxCut QUBO on Erdos-Renyi graphs
- MIS QUBO on Erdos-Renyi graphs
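
To make the MIS family concrete, here is a minimal sketch that builds the weighted MIS QUBO in the form quoted later in this README (`-sum w_i x_i + lambda * sum x_i x_j`) for a tiny hand-written path graph and brute-forces it. The graph, the weights, and the helper names `mis_qubo`, `qubo_energy`, and `brute_force` are illustrative assumptions, not code from `qa_adiabatic_steps_bench.py`.

```python
from itertools import product


def mis_qubo(weights, edges, lam):
    """QUBO dict {(i, j): coeff}, i <= j, for -sum w_i x_i + lam * sum_(i,j) x_i x_j."""
    Q = {(i, i): -float(w) for i, w in enumerate(weights)}
    for i, j in edges:
        a, b = min(i, j), max(i, j)
        Q[(a, b)] = Q.get((a, b), 0.0) + lam
    return Q


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary assignment x (x_i * x_i == x_i)."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())


def brute_force(Q, n):
    """Exhaustive minimum over all 2**n assignments (small n only)."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))


# Hypothetical instance: path graph 0-1-2-3 with node weights.
weights = [1.0, 3.0, 1.0, 2.0]
edges = [(0, 1), (1, 2), (2, 3)]
lam = 5.0  # chosen larger than max(1, max node weight)

Q = mis_qubo(weights, edges, lam)
best = brute_force(Q, len(weights))
best_energy = qubo_energy(Q, best)
is_independent = all(not (best[i] and best[j]) for i, j in edges)
print(best, best_energy, is_independent)
```

With the penalty above the largest node weight, selecting both endpoints of an edge always costs more than it gains, so the minimizer here is an independent set.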
The simulator uses Qiskit Aer with:
- `statevector` for small `n`
- `matrix_product_state` (MPS) for larger `n` when entanglement is moderate
Use statevector runs as the reference curve for small-to-mid n, where exact
state evolution is still tractable and gives a clean baseline for algorithmic
behavior. MPS becomes viable as n grows when entanglement remains controlled,
especially on sparser instances and locality-improved graph orderings. The key
viability knobs are bond-dimension budget, truncation threshold, and topology
choices that reduce long-range couplings. For early-FT thinking, this artifact
is most useful as a workload characterization layer (instance hardness, schedule
sensitivity, convergence/runtime trends), not as a hardware fault-tolerant cost
model.
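
A back-of-envelope memory estimate shows why the exact reference stops being tractable. The sketch below assumes 16 bytes per amplitude (double-precision complex) and models the MPS as `n` site tensors of shape `(bond_dim, 2, bond_dim)`; both are simplifying assumptions, and the simulator's real footprint may differ. The function names are hypothetical.

```python
def statevector_bytes(n, bytes_per_amplitude=16):
    """Memory for a dense n-qubit statevector (2**n complex amplitudes)."""
    return bytes_per_amplitude * 2 ** n


def mps_bytes_upper_bound(n, bond_dim, bytes_per_amplitude=16):
    """Crude upper bound: n site tensors of shape (bond_dim, 2, bond_dim)."""
    return bytes_per_amplitude * n * 2 * bond_dim ** 2


GIB = 2 ** 30
MIB = 2 ** 20
for n in (20, 30, 42):
    print(f"n={n:2d}  statevector ~ {statevector_bytes(n) / GIB:,.3f} GiB")
print(f"n=42, bond 64  MPS <= {mps_bytes_upper_bound(42, 64) / MIB:.2f} MiB")
```

The MPS bound only holds while the bond-dimension cap does not force lossy truncation, which is why truncation drift is worth monitoring alongside memory.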
| Regime | Recommended backend | Primary knobs | Failure signal |
|---|---|---|---|
| Small n, exact-reference studies | `statevector` | `K`, `delta_t`, `shots` | Exponential walltime/memory growth |
| Mid/larger n, moderate entanglement | `matrix_product_state` | `--mps-max-bond-dimension`, `--mps-truncation-threshold`, graph relabeling/locality | Bond growth, truncation drift, runtime spikes |
| Dense/high-entanglement stress cases | Start with MPS, validate spot checks in statevector where feasible | Reduce density, increase bond dimension conservatively, tighten truncation checks | MPS loses advantage or quality degrades versus reference |
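
The graph relabeling/locality knob can be illustrated with a small breadth-first relabeling that starts from the highest-degree node, so neighbors tend to receive nearby indices on the MPS chain. This is a sketch under assumptions: the tie-breaking rules, the handling of disconnected components, and the names `bfs_relabel` and `total_edge_span` are hypothetical and may not match the benchmark's implementation.

```python
from collections import deque


def bfs_relabel(n, edges):
    """Return (old->new mapping, relabeled edges) using BFS from high-degree nodes."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    order, seen = [], set()
    # Visit start candidates by descending degree so every component is covered.
    for start in sorted(range(n), key=lambda v: (-len(adj[v]), v)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(adj[u]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)

    mapping = {old: new for new, old in enumerate(order)}
    return mapping, [(mapping[u], mapping[v]) for u, v in edges]


def total_edge_span(edges):
    """Sum of |i - j| over edges: a simple proxy for long-range couplings."""
    return sum(abs(u - v) for u, v in edges)


edges = [(0, 5), (5, 1), (5, 3), (3, 2), (2, 4)]
mapping, relabeled = bfs_relabel(6, edges)
print(total_edge_span(edges), "->", total_edge_span(relabeled))
```

BFS ordering is a heuristic, so it is worth comparing the span before and after relabeling rather than assuming an improvement.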
- Trotterized adiabatic evolution with `delta_t` and sweep over step count `K`
- Convergence metric `steps_to_opt` (first `K` where best-so-far reaches reference energy)
- Three benchmark families: Random / MaxCut / MIS
- Success-probability curves vs time (`success_prob.png`)
- Modern Qiskit Aer primitives path for sampling (`SamplerV2.from_backend`)
- Optional modern diagnostics via `EstimatorV2` expectation-energy curves
- Graph relabeling (BFS from highest-degree node) for MaxCut/MIS to improve linear edge locality for MPS
- Parameterized transpile-template reuse across instances with modes:
  - `--transpile-cache-mode support` (keyed by nonzero term pattern)
  - `--transpile-cache-mode full` (keyed by `n` + schedule, better reuse for heterogeneous instances)
  - optional auto-disable fallback when cache hit rate stays too low
- Statistical testing modes:
  - `--stats-method mw` (one-sided Mann-Whitney U, random < comparator)
  - `--stats-method perm` (one-sided permutation test on median difference)
- Multi-`n` scan mode via `--n-list` with aggregated `scan_summary.csv`
- Separate PRR-style analog prototype via `prr_local_detuning_opt.py`
- Cross-algorithm comparator via `compare_qa_sa_prr.py` (QA/SA/PRR on shared instances)
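
The convergence metric listed above can be sketched in a few lines. This is an illustrative reading of "first `K` where best-so-far reaches reference energy"; the tolerance, the `None` return for runs that never reach the reference, and the function signature are assumptions rather than the script's actual behavior.

```python
def steps_to_opt(energies_by_step, reference_energy, tol=1e-9):
    """Return the first step count K whose best-so-far energy reaches the reference.

    energies_by_step: iterable of (K, energy) pairs in increasing K.
    Returns None if the reference is never reached (assumed convention).
    """
    best = float("inf")
    for k, energy in energies_by_step:
        best = min(best, energy)
        if best <= reference_energy + tol:
            return k
    return None


# Hypothetical trace for one instance.
trace = [(1, -2.0), (2, -3.5), (3, -3.0), (4, -5.0), (5, -4.0)]
print(steps_to_opt(trace, reference_energy=-5.0))
print(steps_to_opt(trace, reference_energy=-6.0))
```

Because the metric uses the best-so-far energy, a later step that samples a worse energy does not undo an earlier hit.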

Install dependencies and run the quick baseline check:

    make install
    make smoke

- GitHub Actions lint workflows: `.github/workflows/lint-python.yml` and `.github/workflows/lint-markdown.yml`
- Benchmark validation workflow: `.github/workflows/validation.yml` runs compile, unit, and smoke checks on pull requests, `main`, and merge-queue checks
- Code scanning: GitHub CodeQL default setup (repository setting)
- Dependabot updates for GitHub Actions and `pip`: `.github/dependabot.yml`
- Dependabot PR auto-merge helper for Actions updates: `.github/workflows/dependabot-auto-merge.yml`
- Ownership policy: `.github/CODEOWNERS`
- Security reporting policy: `SECURITY.md`

Clone the repository, install, and run the smoke checks:

    git clone git@github.com:M-Gage-Plott42/QUBO_QA.git
    cd QUBO_QA
    make install
    make smoke
    make smoke-perm
    make scan-smoke

- Use `python3` / `.venv/bin/python` on Linux/WSL.
- Keep provenance runs under `artifacts/...` when you intend to commit outputs.
- Keep exploratory or failed runs under `diagnostics_local/...` (untracked by policy).

Small exact sanity check:

    .venv/bin/python qa_adiabatic_steps_bench.py -n 6 --instances 10 --t-max 5 --shots 64 --aer-method statevector --opt-ref exact

Weighted MaxCut/MIS sample:

    .venv/bin/python qa_adiabatic_steps_bench.py -n 8 --instances 20 --t-max 8 --shots 128 \
      --aer-method statevector --opt-ref exact \
      --maxcut-weight-low 1 --maxcut-weight-high 5 \
      --mis-node-weight-low 1 --mis-node-weight-high 4 --mis-lambda 5

Permutation-statistics run:

    .venv/bin/python qa_adiabatic_steps_bench.py -n 6 --instances 50 --t-max 10 --shots 128 \
      --aer-method statevector --opt-ref exact \
      --stats-method perm --perm-iterations 10000

Side-by-side SA baseline comparison:

    .venv/bin/python qa_adiabatic_steps_bench.py -n 8 --instances 20 --t-max 8 --shots 128 \
      --aer-method statevector --opt-ref exact \
      --classical-baseline sa --baseline-sa-reads 256 --baseline-sa-sweeps 2000

`n`-scan run:

    .venv/bin/python qa_adiabatic_steps_bench.py --n-list 4,5,6,7,8 \
      --instances 50 --t-max 10 --shots 128 \
      --aer-method statevector --opt-ref exact --outdir qa_scan

Push higher qubits with MPS:

    .venv/bin/python qa_adiabatic_steps_bench.py -n 30 --instances 50 --t-max 10 --shots 64 \
      --aer-method matrix_product_state \
      --mps-max-bond-dimension 128 \
      --mps-truncation-threshold 1e-10

Try `n=42` (sparse random QUBO recommended):

    .venv/bin/python qa_adiabatic_steps_bench.py -n 42 --instances 30 --t-max 8 --shots 32 \
      --aer-method matrix_product_state \
      --random-density 0.15 \
      --mps-max-bond-dimension 64 \
      --mps-truncation-threshold 1e-8

Enable estimator diagnostics (writes `expectation_energy.png`):

    .venv/bin/python qa_adiabatic_steps_bench.py -n 6 --instances 10 --t-max 5 --shots 64 \
      --aer-method statevector --opt-ref exact --estimator-diagnostics

PRR-style analog local-detuning prototype (BFGS/NM/BFGS):

    .venv/bin/python prr_local_detuning_opt.py --problem maxcut -n 8 --graph-p 0.3 \
      --maxcut-weight-low 1 --maxcut-weight-high 5 \
      --total-time 3.5 --segments 8 --omega-max 5.0 --g-min -8 --g-max 4 --g-start -3 \
      --maxiter-bfgs 120 --maxiter-nm 180 \
      --outdir diagnostics_local/2026-02-15/prr_maxcut_n8

QA/SA/PRR comparison run (all three families, shared instance bank):

    .venv/bin/python compare_qa_sa_prr.py -n 8 --instances 5 --t-max 6 --delta-t 0.25 --shots 64 \
      --aer-method statevector \
      --sa-reads 128 --sa-sweep-checkpoints 64,256,1024 \
      --prr-total-time 3.5 --prr-segments 8 \
      --outdir diagnostics_local/2026-02-16/compare_n8_i5

Default output directory: `qa_out/`

- `results.csv` (per-instance metrics)
- `summary.json` (aggregate metrics + test results, including `easy_case` rates and `run_timing`)
- `convergence_energy.png` (median best-so-far energy vs time)
- `success_prob.png` (cumulative success probability vs time, "reached by time `t`")
- `success_prob_instantaneous.png` (instantaneous success probability at time `t`)
- `expectation_energy.png` (optional, only with `--estimator-diagnostics`)
- `steps_boxplot.png` (steps-to-opt distribution)
- optional histogram outputs with `--plot-histograms`:
  - `hist_final_energy.png`
  - `hist_final_gap_to_opt.png`
  - `hist_approx_ratio.png`
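
The difference between the cumulative and instantaneous success plots is easiest to see on a toy example. The sketch below assumes each instance contributes one energy per time step and counts a step as a success when that energy reaches the instance's reference; how the benchmark reduces a batch of shots to a per-step energy is not specified here, so treat that, the tolerance, and the name `success_curves` as assumptions.

```python
def success_curves(traces, references, tol=1e-9):
    """Return (cumulative, instantaneous) success fractions per time step.

    traces: list of per-instance energy lists, all of equal length.
    references: reference (optimal) energy for each instance.
    """
    n_steps = len(traces[0])
    cumulative = [0] * n_steps
    instantaneous = [0] * n_steps
    for energies, ref in zip(traces, references):
        reached = False
        for t, energy in enumerate(energies):
            hit = energy <= ref + tol
            reached = reached or hit      # "reached by time t"
            instantaneous[t] += hit       # success exactly at time t
            cumulative[t] += reached
    m = len(traces)
    return [c / m for c in cumulative], [i / m for i in instantaneous]


# Hypothetical data: three instances, three time steps.
traces = [[-1.0, -3.0, -2.0], [-2.0, -2.0, -4.0], [-1.0, -1.0, -1.0]]
references = [-3.0, -4.0, -5.0]
cum, inst = success_curves(traces, references)
print(cum, inst)
```

The cumulative curve is non-decreasing by construction, while the instantaneous curve can drop when a later step samples a worse energy.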

`compare_qa_sa_prr.py` writes:

- `instance_bank.json` (shared deterministic instance manifest + digests)
- `comparison_results.csv` (per-instance final metrics for QA/SA/PRR)
- `comparison_curves.csv` (per-instance convergence traces)
- `comparison_summary.json` (aggregate metrics, including `R` and `1-R`)
- plots (unless `--no-plots`):
  - `convergence_ratio_compare.png`
  - `success_prob_compare.png`
  - `hist_final_ratio.png`
  - `hist_final_gap.png`
  - `hist_runtime_seconds.png`

If --outdir qa_scan and --n-list 4,5,6, outputs are:
- `qa_scan/n_4/`, `qa_scan/n_5/`, `qa_scan/n_6/` (each with per-`n` results and summary)
- `qa_scan/scan_summary.csv` (one row per `n`, with medians, easy-case rates, p-values, Holm-adjusted p-values, Cliff's deltas, and scan-stop audit fields)
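
For readers unfamiliar with the Holm-adjusted p-values in `scan_summary.csv`, here is a minimal sketch of the standard Holm step-down adjustment. Which set of p-values the scan adjusts over (for example, one per `n`) is an assumption, and `holm_adjust` is a hypothetical helper, not the repository's function.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

The running maximum is what distinguishes Holm's adjustment from simply scaling each p-value by its rank-dependent factor.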
- Scratch/regression outputs remain local-only: `qa_out*/`, `qa_scan*/`, `out/`, and `diagnostics_local/` are git-ignored.
- Provenance outputs should be written under `artifacts/` and committed when needed.
- For reproducible provenance, always set an explicit run directory:
  - single-`n`: `artifacts/runs/<YYYY-MM-DD>/<run_tag>/`
  - `--n-list`: `artifacts/scans/<YYYY-MM-DD>/<run_tag>/`
- Temporary diagnostic/audit runs (for failed, pre-patch, or exploratory runs) should go under:
  - `diagnostics_local/<YYYY-MM-DD>/<run_tag>/`
  - keep these untracked by default.
- Recommended provenance files to commit:
  - `summary.json`
  - `scan_summary.csv` (for scans)
  - plots (`convergence_energy.png`, `success_prob.png`, `steps_boxplot.png`, optional `expectation_energy.png`)
  - `results.csv` when file size is reasonable for the repo.

- This is digital adiabatic simulation (Trotterized circuit evolution), not analog hardware quantum annealing.
- A separate analog prototype (`prr_local_detuning_opt.py`) is available for PRR-style local-detuning pulse optimization with a BFGS/NM/BFGS sequence.
- `--opt-ref exact` uses brute-force exact solving (`dimod.ExactSolver`) and is intended for small `n`.
- `--opt-ref qa_best` uses the best observed final energy as the reference optimum.
- Weighted controls:
  - MaxCut edge weights: `--maxcut-weight-low`, `--maxcut-weight-high`
  - MIS node weights: `--mis-node-weight-low`, `--mis-node-weight-high`
- MIS penalty note: for the weighted MIS QUBO form (`-sum w_i x_i + lambda * sum x_i x_j`), use `--mis-lambda > max(1, --mis-node-weight-high)` to preserve encoding semantics.
- Input guardrails:
  - `--random-density` and `--graph-p` must be in `[0,1]`,
  - `--random-low <= --random-high`,
  - `--maxcut-weight-low <= --maxcut-weight-high`,
  - `--mis-node-weight-low <= --mis-node-weight-high`.
- Ising energy evaluation is aligned with `dimod` conversion conventions (`x = (s + 1)/2`, so measured bit `0 -> s=-1`, `1 -> s=+1`).
- Statevector scaling is exponential in `n`.
- MPS scaling depends on entanglement growth and bond-dimension/truncation settings.
- For dense random couplings, MPS bond growth can still make large `n` expensive.
- Transpile cache can be controlled with `--transpile-cache-mode {support,full}` and `--no-transpile-cache`.
- Low-yield cache runs can auto-disable caching via hit-rate fallback (`--cache-autodisable-*`, `--no-cache-autodisable`).
- `--estimator-diagnostics` enables EstimatorV2 expectation-value tracking and adds estimator keys to `summary.json`.
- `summary.json` includes run timing metadata under `run_timing`:
  - `run_started_utc`
  - `run_finished_utc`
  - `walltime_seconds_total`
  - `walltime_seconds_by_family`
- `summary.json` includes paper-aligned metrics:
  - `approximation_ratio` (per-family mean/median at `t_max`, minimization ratio in `[0,1]`)
  - optional `hardness_proxy` when `--hardness-proxy exact` is enabled
- `summary.json` includes optional classical baseline comparison metadata under `classical_baseline` when `--classical-baseline sa` is enabled.
- SA execution prefers `dwave-neal` when installed; baseline metadata records `classical_baseline.engine`.
- `compare_qa_sa_prr.py` exposes paper-style quality metrics directly:
  - approximation ratio `R`,
  - approximation-ratio error `1 - R`.
- Success curves are reported in two variants:
  - cumulative-by-time (`success_prob.png`, computed from best-so-far trajectories),
  - instantaneous-at-time (`success_prob_instantaneous.png`, computed from per-step sampled energies).
- Optional `--n-list` early-stop is available when both are provided:
  - `--scan-stop-p-holm-threshold` and `--scan-stop-easy-case-threshold`
  - optional tuning via `--scan-stop-easy-case-op {le,ge}` and `--scan-stop-min-n-evals`
  - when triggered, scanning stops early and `scan_summary.csv` records trigger metadata.
- Recommended starting presets (tuned on quick statevector scans, 2026-02-15):
  - Aggressive early-stop: `--scan-stop-p-holm-threshold 0.9 --scan-stop-easy-case-threshold 0.9 --scan-stop-easy-case-op ge --scan-stop-min-n-evals 2`
  - Conservative early-stop: `--scan-stop-p-holm-threshold 0.95 --scan-stop-easy-case-threshold 0.7 --scan-stop-easy-case-op ge --scan-stop-min-n-evals 3`
- Resource-estimation handoff: LANL-style downstream analysis can directly reuse `scan_summary.csv`, `run_timing`, and success/convergence curves as workload inputs; hardware/error-model FT resource estimation itself is intentionally out of scope for this repository.
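
The bit-to-spin convention noted above (`x = (s + 1)/2`) combines with Qiskit's MSB-first count keys in a way that is easy to get backwards. The following sketch evaluates an Ising energy from a counts dictionary under those conventions. It assumes a single classical register (no spaces in keys), and the coefficient containers `h` and `J` plus both function names are illustrative, not the script's data structures.

```python
def ising_energy_from_key(key, h, J):
    """Ising energy sum_i h_i s_i + sum_(i,j) J_ij s_i s_j for one counts key.

    key: bitstring as reported by Qiskit (rightmost character is qubit 0).
    h: list of linear coefficients; J: dict {(i, j): coupling}.
    """
    bits = [int(c) for c in reversed(key)]   # reorder to qubit-0-first
    spins = [2 * b - 1 for b in bits]        # x = (s + 1)/2  =>  s = 2x - 1
    linear = sum(h[i] * s for i, s in enumerate(spins))
    quadratic = sum(c * spins[i] * spins[j] for (i, j), c in J.items())
    return linear + quadratic


def mean_energy(counts, h, J):
    """Shot-weighted average energy over a counts dictionary."""
    shots = sum(counts.values())
    return sum(n * ising_energy_from_key(k, h, J) for k, n in counts.items()) / shots


# Hypothetical 3-qubit problem and measurement result.
h = [1.0, 0.0, -0.5]
J = {(0, 1): 2.0}
counts = {"100": 3, "001": 1}
print(mean_energy(counts, h, J))
```

Note that the key `"100"` sets qubit 2, not qubit 0; skipping the reversal silently evaluates the energy of a different assignment.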
After any code patch to benchmark logic, CLI, caching, stats, or outputs, run and verify:
- `.venv/bin/python -m py_compile qa_adiabatic_steps_bench.py`
- `.venv/bin/python -m unittest discover -s tests -p 'test_*.py' -v`
- `make smoke`
- `make smoke-perm`
- `make scan-smoke`
- `make compare-smoke`
- Targeted subsystem audit for the changed area (for example cache on/off artifact audit, estimator diagnostics smoke, or stats sanity check)

Only commit after these checks pass and audit findings are reviewed.
Validated against M-Gage-Plott42/qiskit-v2-guide (Qiskit 2.3.x patterns):
- Hamiltonian-to-gate mapping is consistent:
  - driver term `-sum X` is implemented as `rx(-2 t)`,
  - problem terms are implemented as `rz(2 t h)` and `rzz(2 t J)`.
- Measurement/energy path is consistent:
  - Qiskit counts are treated as MSB-first classical strings,
  - keys are reversed to qubit-0-first before Ising energy evaluation.
- Aer backend usage remains valid for Qiskit 2.x:
  - execution now uses `SamplerV2.from_backend(AerSimulator(...))` on transpiled circuits,
  - MPS options used by this script are recognized by Aer (`qiskit-aer 0.17.2`),
  - auto-switch to MPS at `n >= --mps-auto-threshold` is functioning.
- CLI and output contracts remain unchanged after this modernization.
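
The Hamiltonian-to-gate mapping above can be checked numerically without Qiskit. The sketch below compares a truncated Taylor series of `exp(-i t H)` against the standard rotation-gate matrices, using the conventions `RX(θ) = exp(-i θ X / 2)`, `RZ(θ) = exp(-i θ Z / 2)`, and `RZZ(θ) = exp(-i θ Z⊗Z / 2)`. The values of `t`, `h`, and `J` are arbitrary, any schedule prefactors are assumed to be folded into `t`, and all helper names are hypothetical.

```python
import cmath
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def scale(A, c):
    return [[c * v for v in row] for row in A]


def expm(A, terms=40):
    """Matrix exponential via truncated Taylor series (fine for small norms)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0j for j in range(n)] for i in range(n)]


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def rz(theta):
    return diag([cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)])


def rzz(theta):
    lo, hi = cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2)
    return diag([lo, hi, hi, lo])


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


X = [[0, 1], [1, 0]]
Z = diag([1, -1])
ZZ = diag([1, -1, -1, 1])
t, h, J = 0.25, 0.7, -1.3

driver_ok = close(expm(scale(X, 1j * t)), rx(-2 * t))            # H = -X
field_ok = close(expm(scale(Z, -1j * t * h)), rz(2 * t * h))      # H = h Z
coupling_ok = close(expm(scale(ZZ, -1j * t * J)), rzz(2 * t * J)) # H = J Z⊗Z
print(driver_ok, field_ok, coupling_ok)
```

The factor of 2 in each gate angle comes from the `θ / 2` in the rotation-gate definitions, and the sign flip on `rx` comes from the minus sign in the driver term.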

- `make smoke` for quick baseline check
- `make smoke-perm` for permutation-test quick check
- `make scan-smoke` for quick `--n-list` check
- `make compare-smoke` for quick QA/SA/PRR comparison check