Conversation
Pull Request Overview
This PR adds error bar support to the BSEQ benchmark by computing Monte Carlo uncertainties for the largest connected component metric. The implementation introduces a reusable bootstrap helper and updates the result structure to use BenchmarkScore containers that carry both values and uncertainties.
Key Changes:
- Refactored `BSEQResult` to wrap the largest connected component metric in a `BenchmarkScore` with uncertainty
- Implemented `bootstrap_largest_component_stddev()` for Monte Carlo error estimation using edge-level CHSH statistics
- Extended BSEQ to compute and propagate per-edge variance through the bootstrap helper
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `metriq_gym/benchmarks/bseq.py` | Updated `chsh_subgraph` to return edge statistics, integrated bootstrap uncertainty computation, changed result structure to use `BenchmarkScore` |
| `metriq_gym/helpers/statistics.py` | Added union-find based largest component calculator and Monte Carlo bootstrap helper for uncertainty estimation |
| `tests/unit/benchmarks/test_bseq.py` | New test file covering bootstrap helper edge cases and `BSEQResult` score exposure |
| `docs/source/cli_workflows.rst` | Updated example output to show new `BenchmarkScore` format with uncertainty values |
```python
largest = 1
for idx in range(num_nodes):
    root = find(idx)
    if sizes[root] > largest:
        largest = sizes[root]
return largest
```
Initializing `largest` to 1 is subtle rather than strictly wrong. When `num_nodes <= 0`, the function already returns 0 at line 40-41; when `num_nodes > 0` but `edges` is empty, the largest component genuinely is 1 (a single isolated node), so the current behavior is correct for non-empty graphs. The issue is semantic clarity: returning `max(sizes)` (with 0 for the empty graph) states the intent directly instead of tracking the maximum during iteration.
```diff
-largest = 1
-for idx in range(num_nodes):
-    root = find(idx)
-    if sizes[root] > largest:
-        largest = sizes[root]
-return largest
+# The size of the largest component is the maximum in sizes
+return max(sizes) if num_nodes > 0 else 0
```
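For illustration, a self-contained union-find sketch that follows the `max(sizes)` suggestion. The function name and signature here are assumptions for the example, not the PR's actual helper:

```python
def largest_component_size(num_nodes: int, edges: list[tuple[int, int]]) -> int:
    """Size of the largest connected component, via union-find by size."""
    if num_nodes <= 0:
        return 0
    parent = list(range(num_nodes))
    sizes = [1] * num_nodes

    def find(x: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Union by size: attach the smaller root under the larger one.
            if sizes[ru] < sizes[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            sizes[ru] += sizes[rv]

    # sizes[x] is only authoritative at roots, but every non-root entry is
    # <= its root's size, so max(sizes) is still the largest component.
    return max(sizes)
```

With no edges this returns 1 (a single isolated node), so the edge case discussed above falls out naturally.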
```python
for sample_idx in range(num_samples):
    active_edges: list[tuple[int, int]] = []
    for (u, v), (mean, std) in edges:
        sigma = 0.0 if std is None or np.isnan(std) else float(std)
```
The check `std is None` is redundant, since the type annotation indicates `edge_stats` values are `tuple[float, float]`, meaning `std` cannot be `None`. If NaN values are expected to represent missing standard deviations, document this in the function docstring, or use `Optional[float]` in the `edge_stats` type annotation to make the contract explicit.
```python
# The benchmark checks whether the CHSH inequality is violated (i.e., the sum
# of correlations exceeds 2, indicating entanglement).
if exp_vals[idx] > 2:
    std = float(np.sqrt(variances[idx])) if not np.isnan(variances[idx]) else float("nan")
```
The variance-to-stddev conversion should validate that `variances[idx] >= 0` before taking the square root, since floating-point accumulation at line 144 can produce slightly negative values. Consider using `max(variances[idx], 0.0)` before `np.sqrt`.
```diff
-std = float(np.sqrt(variances[idx])) if not np.isnan(variances[idx]) else float("nan")
+std = float(np.sqrt(max(variances[idx], 0.0))) if not np.isnan(variances[idx]) else float("nan")
```
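A tiny standalone helper illustrating the suggested clamp (the name is hypothetical, not part of the PR):

```python
import math


def variance_to_std(variance: float) -> float:
    """Convert a variance to a standard deviation, clamping tiny negative
    values that can arise from floating-point accumulation to zero."""
    if math.isnan(variance):
        return float("nan")
    return math.sqrt(max(variance, 0.0))
```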
Do you have a friendly explanation of how the error is calculated? That is, how do we go from measurement errors to the errors on the final value? It's hard to understand from the code, and it's hard for me to review the code without understanding what the calculation is supposed to do.
That's a good point. To be quite honest, this MR was an attempt at something that I was trying to figure out myself, but I think (hope) the description (now contained in the
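For reviewers, here is a hedged sketch of how such a parametric (Monte Carlo) bootstrap typically propagates per-edge measurement errors to an error bar on the largest-component size. All names, defaults, and the threshold are illustrative assumptions, not the PR's actual implementation:

```python
import math
import random
import statistics


def _largest_component(num_nodes: int, edges: list[tuple[int, int]]) -> int:
    """Largest connected component size via union-find (illustrative helper)."""
    if num_nodes <= 0:
        return 0
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    return max(size)


def bootstrap_largest_component_stddev(
    num_nodes: int,
    edge_stats: dict[tuple[int, int], tuple[float, float]],
    num_samples: int = 1000,
    threshold: float = 2.0,
    seed: int = 0,
) -> float:
    """Monte Carlo error propagation (a parametric bootstrap).

    For each of num_samples trials, every edge's CHSH value is redrawn from
    Normal(mean, std) using its measured statistics; an edge stays "active"
    only if the redrawn value still exceeds the classical bound (> 2). The
    largest component is recomputed per trial, and the spread (stddev) of
    those sizes is the reported uncertainty on the final metric.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(num_samples):
        active = []
        for (u, v), (mean, std) in edge_stats.items():
            sigma = 0.0 if math.isnan(std) else std
            if rng.gauss(mean, sigma) > threshold:
                active.append((u, v))
        sizes.append(_largest_component(num_nodes, active))
    return statistics.pstdev(sizes)
```

Intuitively: edges whose CHSH value sits far above 2 relative to their error survive nearly every resample, so the component size barely fluctuates and the error bar is small; edges hovering near the bound flip in and out, inflating the spread.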
Closes: #588
- Wraps the largest connected component metric in a `BenchmarkScore` and sources the uncertainty from the shared bootstrap helper so values and error bars travel together.
- Updates the documented `BenchmarkScore` output format for BSEQ runs.