QUBO QA Benchmark (Qiskit Aer, CPU)

Digital adiabatic/QA benchmark for three QUBO families:

Random symmetric QUBO (x^T Q x)
MaxCut QUBO on Erdos-Renyi graphs
MIS QUBO on Erdos-Renyi graphs
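
As a point of reference for the families above, the sketch below evaluates x^T Q x and builds a MaxCut QUBO for a tiny graph. It uses one common textbook formulation (each edge contributes -w when cut); the helper names `qubo_energy`, `maxcut_qubo`, and `brute_force_min` are illustrative assumptions and are not taken from the repository, whose exact sign and scaling conventions may differ.

```python
from itertools import product


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary vector x and a square matrix Q (list of lists)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def maxcut_qubo(n, weighted_edges):
    """Symmetric QUBO whose minimum equals minus the maximum cut weight.

    Each edge (i, j, w) contributes -w * (x_i + x_j - 2 x_i x_j),
    which is -w when the edge is cut and 0 otherwise.
    """
    Q = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        Q[i][i] -= w
        Q[j][j] -= w
        Q[i][j] += w  # the 2*w*x_i*x_j term is split symmetrically
        Q[j][i] += w
    return Q


def brute_force_min(Q):
    """Exhaustive minimum over all 2^n assignments (small n only)."""
    n = len(Q)
    return min((qubo_energy(Q, x), x) for x in product((0, 1), repeat=n))


# Unit-weight triangle: the best cut isolates one node and cuts two edges.
Q = maxcut_qubo(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
energy, assignment = brute_force_min(Q)
print(energy, assignment)
```

The brute-force search plays the same role here as an exact reference solver does for small n; it is only practical for a handful of variables.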

The simulator uses Qiskit Aer with:

statevector for small n
matrix_product_state (MPS) for larger n when entanglement is moderate
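
To make the backend split concrete, here is a back-of-the-envelope memory estimate. It assumes complex128 amplitudes (16 bytes each) and, for MPS, a loose upper bound of n site tensors of shape (chi, 2, chi); real Aer memory use includes overheads that this sketch ignores, and the function names are illustrative.

```python
def statevector_bytes(n, bytes_per_amplitude=16):
    """Memory for a dense complex128 statevector on n qubits."""
    return bytes_per_amplitude * 2 ** n


def mps_bytes_upper_bound(n, bond_dim, bytes_per_amplitude=16):
    """Loose upper bound: n site tensors of shape (bond_dim, 2, bond_dim)."""
    return n * 2 * bond_dim ** 2 * bytes_per_amplitude


for n in (20, 30, 42):
    sv_gib = statevector_bytes(n) / 2 ** 30
    mps_mib = mps_bytes_upper_bound(n, bond_dim=64) / 2 ** 20
    print(f"n={n}: statevector ~ {sv_gib:.3g} GiB, MPS(chi=64) <= {mps_mib:.3g} MiB")
```

The statevector cost doubles with every added qubit, while the MPS bound grows linearly in n and quadratically in the bond dimension, which is why the bound is only meaningful when entanglement stays within the chosen bond budget.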

Interpretation for Reviewers

Use statevector runs as the reference curve for small-to-mid n, where exact state evolution is still tractable and gives a clean baseline for algorithmic behavior. MPS becomes viable at larger n when entanglement stays bounded, especially on sparser instances and with locality-improving graph orderings. The main viability knobs are the bond-dimension budget, the truncation threshold, and topology choices that reduce long-range couplings. For early fault-tolerant (FT) planning, this artifact is most useful as a workload-characterization layer (instance hardness, schedule sensitivity, convergence/runtime trends), not as a hardware fault-tolerant cost model.

Backend Selection Guide

Regime	Recommended backend	Primary knobs	Failure signal
Small `n`, exact-reference studies	`statevector`	`K`, `delta_t`, `shots`	Exponential walltime/memory growth
Mid/Larger `n`, moderate entanglement	`matrix_product_state`	`--mps-max-bond-dimension`, `--mps-truncation-threshold`, graph relabeling/locality	Bond growth, truncation drift, runtime spikes
Dense/high-entanglement stress cases	Start with MPS, validate spot checks in statevector where feasible	Reduce density, increase bond dimension conservatively, tighten truncation checks	MPS loses advantage or quality degrades versus reference
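
The graph relabeling/locality knob in the table can be illustrated with a minimal sketch: renumber nodes in BFS order starting from the highest-degree node, then compare the total index distance spanned by the edges. The function names and the tie-breaking rules (lowest index first, remaining components visited by decreasing degree) are assumptions made for this example, not the repository's implementation.

```python
from collections import deque


def bfs_relabel(n, edges):
    """Renumber nodes in BFS order, starting from the highest-degree node."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Candidate roots: highest degree first, lowest index breaks ties.
    by_degree = sorted(range(n), key=lambda v: (-len(adj[v]), v))
    order, seen = [], set()
    for root in by_degree:
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    mapping = {old: new for new, old in enumerate(order)}
    relabeled = [(mapping[u], mapping[v]) for u, v in edges]
    return mapping, relabeled


def total_edge_span(edges):
    """Sum of |u - v| over edges: a rough proxy for long-range couplings."""
    return sum(abs(u - v) for u, v in edges)


edges = [(0, 5), (5, 2), (2, 4), (4, 1), (1, 3)]  # a path with scrambled labels
mapping, relabeled = bfs_relabel(6, edges)
print(total_edge_span(edges), "->", total_edge_span(relabeled))
```

A smaller total span means two-qubit terms act on qubits that sit closer together in the linear MPS ordering, which tends to limit bond growth; BFS ordering is a heuristic and does not guarantee the minimum span.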

Current Features

Trotterized adiabatic evolution with time step delta_t and a sweep over the step count K
Convergence metric steps_to_opt (first K where best-so-far reaches reference energy)
Three benchmark families: Random / MaxCut / MIS
Success-probability curves vs time (success_prob.png)
Modern Qiskit Aer primitives path for sampling (SamplerV2.from_backend)
Optional modern diagnostics via EstimatorV2 expectation-energy curves
Graph relabeling (BFS from highest-degree node) for MaxCut/MIS to improve linear edge locality for MPS
Parameterized transpile-template reuse across instances with modes:
- --transpile-cache-mode support (keyed by nonzero term pattern)
- --transpile-cache-mode full (keyed by n + schedule, better reuse for heterogeneous instances)
- optional auto-disable fallback when cache hit rate stays too low
Statistical testing modes:
- --stats-method mw (one-sided Mann-Whitney U, random < comparator)
- --stats-method perm (one-sided permutation test on median difference)
Multi-n scan mode via --n-list with aggregated scan_summary.csv
Separate PRR-style analog prototype via prr_local_detuning_opt.py
Cross-algorithm comparator via compare_qa_sa_prr.py (QA/SA/PRR on shared instances)
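
The steps_to_opt metric and the cumulative success curve listed above are closely related, and a minimal sketch may help. It assumes each instance yields a list of (K, sampled energy) pairs and that "reaches reference energy" means within a small tolerance; the function names and the `tol` parameter are hypothetical and not taken from the benchmark script.

```python
def steps_to_opt(energies_by_k, e_ref, tol=1e-9):
    """First step count K whose best-so-far energy reaches e_ref, else None.

    energies_by_k: list of (K, best energy sampled at that K), sorted by K.
    """
    best = float("inf")
    for k, energy in energies_by_k:
        best = min(best, energy)  # best-so-far trajectory
        if best <= e_ref + tol:
            return k
    return None


def cumulative_success(steps_list, k_values):
    """Fraction of instances that have reached the reference by each K."""
    total = len(steps_list)
    return [
        sum(1 for s in steps_list if s is not None and s <= k) / total
        for k in k_values
    ]


traces = [
    [(1, -1.0), (2, -3.0), (3, -2.0)],  # reaches -3.0 at K=2
    [(1, -3.0), (2, -3.0), (3, -3.0)],  # already optimal at K=1
    [(1, -1.0), (2, -2.0), (3, -2.5)],  # never reaches -3.0
]
steps = [steps_to_opt(t, e_ref=-3.0) for t in traces]
print(steps)
print(cumulative_success(steps, [1, 2, 3]))
```

Because the curve is built from best-so-far values it can never decrease, which is what distinguishes it from an instantaneous success probability computed per step.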

Quickstart

make install
make smoke

Repository Guardrails

GitHub Actions lint workflows: .github/workflows/lint-python.yml and .github/workflows/lint-markdown.yml
Benchmark validation workflow: .github/workflows/validation.yml runs compile, unit, and smoke checks on pull requests, on main, and in the merge queue
Code scanning: GitHub CodeQL default setup (repository setting)
Dependabot updates for GitHub Actions and pip: .github/dependabot.yml
Dependabot PR auto-merge helper for Actions updates: .github/workflows/dependabot-auto-merge.yml
Ownership policy: .github/CODEOWNERS
Security reporting policy: SECURITY.md

New Workstation Checklist

git clone git@github.com:M-Gage-Plott42/QUBO_QA.git
cd QUBO_QA
make install
make smoke
make smoke-perm
make scan-smoke

Use python3 or .venv/bin/python on Linux/WSL.
Keep provenance runs under artifacts/... when you intend to commit outputs.
Keep exploratory or failed runs under diagnostics_local/... (untracked by policy).

Run Examples

Small exact sanity check:

.venv/bin/python qa_adiabatic_steps_bench.py -n 6 --instances 10 --t-max 5 --shots 64 --aer-method statevector --opt-ref exact

Weighted MaxCut/MIS sample:

.venv/bin/python qa_adiabatic_steps_bench.py -n 8 --instances 20 --t-max 8 --shots 128 \
  --aer-method statevector --opt-ref exact \
  --maxcut-weight-low 1 --maxcut-weight-high 5 \
  --mis-node-weight-low 1 --mis-node-weight-high 4 --mis-lambda 5

Permutation-statistics run:

.venv/bin/python qa_adiabatic_steps_bench.py -n 6 --instances 50 --t-max 10 --shots 128 \
  --aer-method statevector --opt-ref exact \
  --stats-method perm --perm-iterations 10000
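
For readers unfamiliar with the test behind --stats-method perm, the following is a minimal sketch of a one-sided permutation test on the median difference. The function name, the add-one p-value correction, and the seeding are assumptions for illustration; the script's own implementation may differ in these details.

```python
import random
from statistics import median


def perm_test_median_less(a, b, iterations=10000, seed=0):
    """One-sided p-value for the alternative median(a) < median(b)."""
    rng = random.Random(seed)
    observed = median(a) - median(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)  # random relabeling of the pooled samples
        diff = median(pooled[:n_a]) - median(pooled[n_a:])
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (iterations + 1)


a = [3, 4, 4, 5, 5, 6, 6, 7]
b = [8, 9, 9, 10, 11, 12, 12, 13]
p = perm_test_median_less(a, b, iterations=2000)
print(p)
```

More iterations reduce the Monte Carlo noise in the p-value at a linear cost in runtime, which is the trade-off that --perm-iterations controls.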

Side-by-side SA baseline comparison:

.venv/bin/python qa_adiabatic_steps_bench.py -n 8 --instances 20 --t-max 8 --shots 128 \
  --aer-method statevector --opt-ref exact \
  --classical-baseline sa --baseline-sa-reads 256 --baseline-sa-sweeps 2000

n-scan run:

.venv/bin/python qa_adiabatic_steps_bench.py --n-list 4,5,6,7,8 \
  --instances 50 --t-max 10 --shots 128 \
  --aer-method statevector --opt-ref exact --outdir qa_scan

Push higher qubits with MPS:

.venv/bin/python qa_adiabatic_steps_bench.py -n 30 --instances 50 --t-max 10 --shots 64 \
  --aer-method matrix_product_state \
  --mps-max-bond-dimension 128 \
  --mps-truncation-threshold 1e-10

Try n=42 (sparse random QUBO recommended):

.venv/bin/python qa_adiabatic_steps_bench.py -n 42 --instances 30 --t-max 8 --shots 32 \
  --aer-method matrix_product_state \
  --random-density 0.15 \
  --mps-max-bond-dimension 64 \
  --mps-truncation-threshold 1e-8

Enable estimator diagnostics (writes expectation_energy.png):

.venv/bin/python qa_adiabatic_steps_bench.py -n 6 --instances 10 --t-max 5 --shots 64 \
  --aer-method statevector --opt-ref exact --estimator-diagnostics

PRR-style analog local-detuning prototype (BFGS/NM/BFGS):

.venv/bin/python prr_local_detuning_opt.py --problem maxcut -n 8 --graph-p 0.3 \
  --maxcut-weight-low 1 --maxcut-weight-high 5 \
  --total-time 3.5 --segments 8 --omega-max 5.0 --g-min -8 --g-max 4 --g-start -3 \
  --maxiter-bfgs 120 --maxiter-nm 180 \
  --outdir diagnostics_local/2026-02-15/prr_maxcut_n8

QA/SA/PRR comparison run (all three families, shared instance bank):

.venv/bin/python compare_qa_sa_prr.py -n 8 --instances 5 --t-max 6 --delta-t 0.25 --shots 64 \
  --aer-method statevector \
  --sa-reads 128 --sa-sweep-checkpoints 64,256,1024 \
  --prr-total-time 3.5 --prr-segments 8 \
  --outdir diagnostics_local/2026-02-16/compare_n8_i5

Outputs

Single-`n` mode

Default output directory: qa_out/

results.csv (per-instance metrics)
summary.json (aggregate metrics + test results, including easy_case rates and run_timing)
convergence_energy.png (median best-so-far energy vs time)
success_prob.png (cumulative success probability vs time, "reached by time t")
success_prob_instantaneous.png (instantaneous success probability at time t)
expectation_energy.png (optional, only with --estimator-diagnostics)
steps_boxplot.png (steps-to-opt distribution)
optional histogram outputs with --plot-histograms:
- hist_final_energy.png
- hist_final_gap_to_opt.png
- hist_approx_ratio.png

QA/SA/PRR comparison mode (`compare_qa_sa_prr.py`)

instance_bank.json (shared deterministic instance manifest + digests)
comparison_results.csv (per-instance final metrics for QA/SA/PRR)
comparison_curves.csv (per-instance convergence traces)
comparison_summary.json (aggregate metrics, including R and 1-R)
plots (unless --no-plots):
- convergence_ratio_compare.png
- success_prob_compare.png
- hist_final_ratio.png
- hist_final_gap.png
- hist_runtime_seconds.png

`--n-list` scan mode

With --outdir qa_scan and --n-list 4,5,6, the outputs are:

qa_scan/n_4/, qa_scan/n_5/, qa_scan/n_6/ (each with per-n results and summary)
qa_scan/scan_summary.csv (one row per n, medians, easy-case rates, p-values, Holm-adjusted p-values, Cliff's deltas, and scan-stop audit fields)
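
Two of the scan_summary.csv columns, Holm-adjusted p-values and Cliff's delta, follow standard definitions that can be sketched in a few lines. This is a generic illustration with hypothetical function names; which set of p-values the scan adjusts together is not specified here and is left as an assumption.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def cliffs_delta(a, b):
    """(#pairs with a > b minus #pairs with a < b) / (len(a) * len(b))."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


print(holm_adjust([0.01, 0.04, 0.03]))
print(cliffs_delta([1, 2, 3], [2, 3, 4]))
```

Cliff's delta ranges over [-1, 1]; a negative value means samples in the first group tend to be smaller than those in the second.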

Output Policy

Scratch/regression outputs remain local-only:
- qa_out*/, qa_scan*/, out/, and diagnostics_local/ are git-ignored.
Provenance outputs should be written under artifacts/ and committed when needed.
For reproducible provenance, always set an explicit run directory:
- single-n: artifacts/runs/<YYYY-MM-DD>/<run_tag>/
- --n-list: artifacts/scans/<YYYY-MM-DD>/<run_tag>/
Temporary diagnostic/audit runs (for failed, pre-patch, or exploratory runs) should go under:
- diagnostics_local/<YYYY-MM-DD>/<run_tag>/
- keep these untracked by default.
Recommended provenance files to commit:
- summary.json
- scan_summary.csv (for scans)
- plots (convergence_energy.png, success_prob.png, steps_boxplot.png, optional expectation_energy.png)
- results.csv when file size is reasonable for the repo.

Method Notes

This is digital adiabatic simulation (Trotterized circuit evolution), not analog hardware quantum annealing.
A separate analog prototype (prr_local_detuning_opt.py) is available for PRR-style local-detuning pulse optimization with a BFGS/NM/BFGS sequence.
--opt-ref exact uses brute-force exact solving (dimod.ExactSolver) and is intended for small n.
--opt-ref qa_best uses the best observed final energy as the reference optimum.
Weighted controls:
- MaxCut edge weights: --maxcut-weight-low, --maxcut-weight-high
- MIS node weights: --mis-node-weight-low, --mis-node-weight-high
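MIS penalty note: for the weighted MIS QUBO form (-sum_i w_i x_i + lambda * sum over edges (i,j) of x_i x_j), use --mis-lambda > max(1, --mis-node-weight-high) so that selecting both endpoints of an edge always costs more than the weight it gains, which preserves the encoding semantics.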
Input guardrails:
- --random-density and --graph-p must be in [0,1],
- --random-low <= --random-high,
- --maxcut-weight-low <= --maxcut-weight-high,
- --mis-node-weight-low <= --mis-node-weight-high.
Ising energy evaluation is aligned with dimod conversion conventions (x = (s + 1)/2, so measured bit 0 -> s=-1, 1 -> s=+1).
Statevector scaling is exponential in n.
MPS scaling depends on entanglement growth and bond-dimension/truncation settings.
For dense random couplings, MPS bond growth can still make large n expensive.
Transpile cache can be controlled with --transpile-cache-mode {support,full} and --no-transpile-cache.
Low-yield cache runs can auto-disable caching via hit-rate fallback (--cache-autodisable-*, --no-cache-autodisable).
--estimator-diagnostics enables EstimatorV2 expectation-value tracking and adds estimator keys to summary.json.
summary.json includes run timing metadata under run_timing:
- run_started_utc
- run_finished_utc
- walltime_seconds_total
- walltime_seconds_by_family
summary.json includes paper-aligned metrics:
- approximation_ratio (per-family mean/median at t_max, minimization ratio in [0,1])
- optional hardness_proxy when --hardness-proxy exact is enabled
summary.json includes optional classical baseline comparison metadata under classical_baseline when --classical-baseline sa is enabled.
SA execution prefers dwave-neal when installed; baseline metadata records classical_baseline.engine.
compare_qa_sa_prr.py exposes paper-style quality metrics directly:
- approximation ratio R,
- approximation-ratio error 1 - R.
Success curves are reported in two variants:
- cumulative-by-time (success_prob.png, computed from best-so-far trajectories),
- instantaneous-at-time (success_prob_instantaneous.png, computed from per-step sampled energies).
Optional --n-list early stop is enabled only when both of the following thresholds are provided:
- --scan-stop-p-holm-threshold and --scan-stop-easy-case-threshold
- optional tuning via --scan-stop-easy-case-op {le,ge} and --scan-stop-min-n-evals
- when triggered, scanning stops early and scan_summary.csv records trigger metadata.
Recommended starting presets (tuned on quick statevector scans, 2026-02-15):
- Aggressive early-stop:
  - --scan-stop-p-holm-threshold 0.9 --scan-stop-easy-case-threshold 0.9 --scan-stop-easy-case-op ge --scan-stop-min-n-evals 2
- Conservative early-stop:
  - --scan-stop-p-holm-threshold 0.95 --scan-stop-easy-case-threshold 0.7 --scan-stop-easy-case-op ge --scan-stop-min-n-evals 3
Resource-estimation handoff: LANL-style downstream analysis can directly reuse scan_summary.csv, run_timing, and success/convergence curves as workload inputs; hardware/error-model FT resource estimation itself is intentionally out of scope for this repository.
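
The MIS penalty note above can be checked on a three-node path with a brute-force sketch. The energy follows the weighted MIS form stated in that note; the function names are illustrative, and the weights and lambda values are chosen only to show one failing and one valid setting.

```python
from itertools import product


def mis_energy(x, weights, edges, lam):
    """-sum_i w_i x_i + lam * sum over edges (i, j) of x_i x_j."""
    reward = sum(w * xi for w, xi in zip(weights, x))
    penalty = sum(x[i] * x[j] for i, j in edges)
    return -reward + lam * penalty


def is_independent(x, edges):
    """True when no edge has both endpoints selected."""
    return all(not (x[i] and x[j]) for i, j in edges)


def minimizers(weights, edges, lam):
    """All assignments attaining the minimum energy (exhaustive search)."""
    n = len(weights)
    energies = {
        x: mis_energy(x, weights, edges, lam)
        for x in product((0, 1), repeat=n)
    }
    best = min(energies.values())
    return [x for x, e in energies.items() if e == best]


weights = [4, 3, 4]  # path 0 - 1 - 2 with heavy end nodes
edges = [(0, 1), (1, 2)]
for lam in (1, 5):
    mins = minimizers(weights, edges, lam)
    print(lam, mins, [is_independent(x, edges) for x in mins])
```

With lambda = 1 the minimizer selects all three nodes and violates both edges; with lambda = 5, which exceeds the largest node weight, the minimizer is the independent set made of the two end nodes.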

Patch Compliance Policy

After any code patch to benchmark logic, CLI, caching, stats, or outputs, run and verify:

.venv/bin/python -m py_compile qa_adiabatic_steps_bench.py
.venv/bin/python -m unittest discover -s tests -p 'test_*.py' -v
make smoke
make smoke-perm
make scan-smoke
make compare-smoke
Targeted subsystem audit for the changed area (for example cache on/off artifact audit, estimator diagnostics smoke, or stats sanity check)

Only commit after these checks pass and audit findings are reviewed.

Qiskit v2 Alignment Notes (2026-02-15)

Validated against M-Gage-Plott42/qiskit-v2-guide (Qiskit 2.3.x patterns):

Hamiltonian-to-gate mapping is consistent:
- driver term -sum X is implemented as rx(-2 t),
- problem terms are implemented as rz(2 t h) and rzz(2 t J).
Measurement/energy path is consistent:
- Qiskit counts are treated as MSB-first classical strings,
- keys are reversed to qubit-0-first before Ising energy evaluation.
Aer backend usage remains valid for Qiskit 2.x:
- execution now uses SamplerV2.from_backend(AerSimulator(...)) on transpiled circuits,
- MPS options used by this script are recognized by Aer (qiskit-aer 0.17.2),
- auto-switch to MPS at n >= --mps-auto-threshold is functioning.
CLI and output contracts remain unchanged after this modernization.
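
The measurement/energy path described above can be sketched without Qiskit: reverse a counts key so that position q refers to qubit q, map bits to spins with x = (s + 1)/2, and evaluate the Ising energy. The function names are hypothetical, and the sketch assumes classical bit q records qubit q (as with a measure-all style circuit).

```python
def counts_key_to_spins(key):
    """Convert a counts key into spins indexed by qubit.

    The key lists the highest-indexed classical bit first, so it is
    reversed before use. Bit 0 maps to s = -1 and bit 1 maps to s = +1.
    """
    bits = [int(c) for c in reversed(key)]
    return [2 * b - 1 for b in bits]


def qubo_to_ising(Q_upper, n):
    """Substitute x_i = (s_i + 1) / 2 into sum over i <= j of Q_ij x_i x_j."""
    h = [0.0] * n
    J = {}
    offset = 0.0
    for (i, j), q in Q_upper.items():
        if i == j:
            h[i] += q / 2
            offset += q / 2
        else:
            J[(i, j)] = J.get((i, j), 0.0) + q / 4
            h[i] += q / 4
            h[j] += q / 4
            offset += q / 4
    return h, J, offset


def ising_energy(spins, h, J, offset=0.0):
    """offset + sum_i h_i s_i + sum over (i, j) of J_ij s_i s_j."""
    energy = offset + sum(h[i] * s for i, s in enumerate(spins))
    energy += sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())
    return energy


Q_upper = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0}
h, J, offset = qubo_to_ising(Q_upper, 2)
spins = counts_key_to_spins("01")  # qubit 0 measured 1, qubit 1 measured 0
print(spins, ising_energy(spins, h, J, offset))
```

For the key "01" the binary assignment is x_0 = 1, x_1 = 0, whose QUBO value is -1; the Ising evaluation returns the same number once the constant offset is included, which is a quick consistency check on the bit ordering.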

Makefile Shortcuts

make smoke for quick baseline check
make smoke-perm for permutation-test quick check
make scan-smoke for quick --n-list check
make compare-smoke for quick QA/SA/PRR comparison check
