feat(ci): require running benchmarks when PRs update nexus packages

# Benchmark Guard: Block PRs When Direct Dependencies Change

## Summary

When a nexus package (e.g. `terratorch`, `tokamind`, `bmfm-targets`) is updated in
`pyproject.toml`, benchmark results **may change** — the models may behave differently under
the new package version. PRs that introduce such changes must be blocked from merging until
benchmarks have been re-run and their results reviewed.

The mechanism works in three parts:

1. **Detection** — the PR CI pipeline detects whether the PR changes any direct nexus package
   dependency across the three variants (`ecosystem`, `candidate`, `product`).
2. **Blocking** — when a change is detected, the CI adds a `pending-benchmarks` label to the PR
   and fails with a clear error message. Every subsequent pipeline run also checks for the label,
   keeping the PR blocked until the label is removed.
3. **Clearance** — the benchmark status checker removes the `pending-benchmarks` label
   automatically when **all** benchmark runs for that PR have completed successfully. If any run
   failed, the label stays.

---

## Sub-Tasks

### 1 — Script: detect nexus package dependency changes

Detect whether the PR changes any direct dependency in `pyproject.toml` compared to `main`.

The script compares resolved requirements per variant using `uv export`. If the resolved
packages for a variant differ between `main` and the PR branch, that variant is flagged.

**Expected outcomes:**
- Exits `0` when no nexus-related dependency changes are detected.
- Exits `1` and prints a clear summary (which packages changed, in which variants) when changes
  are found.
- Self-contained and reusable from the PR pipeline.

**Scope note:** Detection is based on resolved requirements (via `uv export`), not raw
`pyproject.toml` text. A transitive update that reaches a nexus package also triggers the guard —
this is intentional and conservative.
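
The comparison step could look roughly like the Python sketch below. It assumes the two `uv export` outputs (one for `main`, one for the PR branch) have already been captured as requirements-style text for a given variant. The names `NEXUS_PACKAGES`, `parse_pins`, and `changed_packages` are hypothetical, and the watch list is only illustrative.

```python
from __future__ import annotations

import re

# Hypothetical watch list; the real set of nexus packages lives in the repository.
NEXUS_PACKAGES = {"terratorch", "tokamind", "bmfm-targets"}


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase; runs of '-', '_', '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Map each pinned package in requirements-style text to its version or URL."""
    pins: dict[str, str] = {}
    for raw in requirements_text.splitlines():
        line = raw.strip().rstrip("\\").strip()
        # Skip blanks, comments, and option lines such as "--hash=...".
        if not line or line.startswith(("#", "-")):
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        for sep in ("==", " @ "):
            if sep in spec:
                name, version = spec.split(sep, 1)
                name = name.split("[", 1)[0]  # drop extras like pkg[foo]
                pins[normalize(name)] = version.strip()
                break
    return pins


def changed_packages(
    main_text: str, pr_text: str, watched: set[str] = NEXUS_PACKAGES
) -> dict[str, tuple[str | None, str | None]]:
    """Return {package: (version_on_main, version_on_pr)} for watched packages that differ."""
    before, after = parse_pins(main_text), parse_pins(pr_text)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(normalize(n) for n in watched)
        if before.get(name) != after.get(name)
    }
```

A caller would run this once per variant and exit `1` if any variant returns a non-empty result. A package that is added or removed shows up with `None` on one side.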

---

### 2 — Script: manage the `pending-benchmarks` label

Encapsulate GitHub label management in a dedicated script with three modes:

- **`apply`** — adds the `pending-benchmarks` label to the open PR.
- **`check`** — checks whether the label is present; exits `1` (blocks the pipeline) if it is.
- **`remove`** — removes the label once benchmarks have passed.

The script receives the PR number and repository URL via environment variables so it is portable
across both the PR pipeline and the benchmark status checker contexts.

**Expected outcomes:**
- `apply`: label added, exits `0`.
- `check`: exits `1` with a clear message if the label is present; exits `0` otherwise.
- `remove`: label removed, exits `0`.
- Invalid argument: prints usage and exits `1`.
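
One possible shape for this script is sketched below in Python on top of the GitHub CLI. It is an illustration, not the required implementation. The environment variable names `PR_NUMBER` and `REPO_URL` are assumptions, as is the expectation that the repository value is in a form `gh --repo` accepts.

```python
import os
import subprocess

LABEL = "pending-benchmarks"
MODES = ("apply", "check", "remove")


def build_command(mode: str, pr_number: str, repo: str) -> list[str]:
    """Build the gh invocation for one mode."""
    if mode == "apply":
        return ["gh", "pr", "edit", pr_number, "--repo", repo, "--add-label", LABEL]
    if mode == "remove":
        return ["gh", "pr", "edit", pr_number, "--repo", repo, "--remove-label", LABEL]
    if mode == "check":
        # Prints one label name per line.
        return ["gh", "pr", "view", pr_number, "--repo", repo,
                "--json", "labels", "--jq", ".labels[].name"]
    raise ValueError(f"unknown mode: {mode}")


def main(args: list[str]) -> int:
    """Run one mode; args excludes the program name. Returns the exit code."""
    if len(args) != 1 or args[0] not in MODES:
        print("usage: manage_label (apply|check|remove)")
        return 1
    mode = args[0]
    # Assumed variable names; the issue only says "environment variables".
    pr_number = os.environ["PR_NUMBER"]
    repo = os.environ["REPO_URL"]
    result = subprocess.run(
        build_command(mode, pr_number, repo),
        capture_output=True, text=True, check=True,
    )
    if mode == "check" and LABEL in result.stdout.splitlines():
        print(f"PR #{pr_number} is blocked: '{LABEL}' label is present.")
        return 1
    return 0
```

An entry point would call `sys.exit(main(sys.argv[1:]))`. Keeping `build_command` separate from the subprocess call makes the mode handling testable without network access.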

---

### 3 — Wire detection + labelling into the PR pipeline

Extend the PR pipeline's unit-test step to:

1. Run the dependency-change detection script. If changes are found, apply the
   `pending-benchmarks` label and fail.
2. Run the label-check script to block the PR if the label is already present (e.g. from a prior
   run that detected changes).

A PR cannot merge as long as the label is present, regardless of how many times the pipeline
reruns.
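
The combined effect of the two steps can be summarised as a small decision function. This is a sketch only, and `guard_decision` is a hypothetical name rather than part of the pipeline.

```python
def guard_decision(changes_detected: bool, label_present: bool) -> tuple[bool, int]:
    """Return (apply_label, exit_code) for one pipeline run.

    The label is applied only when changes are detected and it is not already
    there; the run fails whenever changes are detected or the label is present.
    """
    apply_label = changes_detected and not label_present
    blocked = changes_detected or label_present
    return apply_label, 1 if blocked else 0
```

The only passing case is a run with no detected changes and no label, which is why rerunning the pipeline cannot clear a blocked PR.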

**Expected outcomes:**
- A PR that changes a nexus package dependency has the label applied and the pipeline fails with
  a clear, actionable message.
- A PR that already carries the label also fails, even if no new changes are detected in the
  current run.
- Existing CI checks continue to run after the guard.

---

### 4 — Remove the label automatically when benchmarks pass

Extend the benchmark status checker to remove the `pending-benchmarks` label from the PR after
all Ray benchmark jobs complete **successfully**. If any job failed, the label is left in place.

**Expected outcomes:**
- All jobs successful → `pending-benchmarks` label removed → PR can merge.
- Any job failed → label remains → PR stays blocked, author is notified.
- The existing PR comment update and notification steps are unchanged.
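
The clearance rule can be sketched as a single predicate. The sketch assumes the checker collects one status string per benchmark job and that success is reported as `SUCCEEDED`, as in Ray's job statuses. `should_remove_label` is a hypothetical name.

```python
def should_remove_label(job_statuses: list[str]) -> bool:
    """True only when at least one job was seen and every job succeeded."""
    # An empty list must not clear the label: all() of nothing is True.
    return bool(job_statuses) and all(s == "SUCCEEDED" for s in job_statuses)
```

Treating an empty status list as "not cleared" is a deliberately conservative choice here, so a PR whose benchmarks never started stays blocked.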

---

### 5 — Document the `pending-benchmarks` workflow

Add documentation to the repository explaining:

- What the `pending-benchmarks` label means and what triggers it.
- Why it exists (benchmark results may change when a package version changes).
- The end-to-end flow: dependency change detected → label applied → PR blocked → benchmarks
  triggered → if all pass, label removed → PR can merge.
- What happens if benchmarks fail: label remains, PR stays blocked.

**Expected outcomes:**
- A new document in `docs/contributing/` covers the benchmark guard workflow end-to-end.
- The document is linked from `docs/contributing/add_new_nexus_package.md`.
- The new page is added to the documentation site navigation.

Related: #148

---

## Prerequisites

- The `pending-benchmarks` label must exist in the `algorithm-nexus` GitHub repository before
  the scripts can use it. This is a one-time manual step (e.g. `gh label create
  pending-benchmarks --color …`).
- **Autoupdate PRs**: the autoupdate workflow opens PRs automatically. The pipeline checks
  those PRs too, so an autoupdate that bumps a nexus package gets the `pending-benchmarks`
  label like any other PR.

