fix(baseline): file_pattern glob matching + jq scoping bugs #180

Merged: hyperpolymath merged 1 commit into main from feat/hypatia-ignore-prefix-match on May 26, 2026

@hyperpolymath (Owner)

Summary

Two latent jq scoping bugs in baseline file_pattern matching:

  1. `f.file` re-evaluates the identity inside `map(select(...))`: `.` rebinds to each baseline entry, so `f.file` returned the baseline entry's `.file` field instead of the finding's. Fix: `f as $finding | ...` before entering the map.
  2. `.file_pattern` inside `test(arg)`: `test`'s argument is evaluated in the input's scope (the file string), so `.file_pattern` errored with "Cannot index string", silently swallowed by `select()` and treated as truthy. This made `file_pattern` entries always match. Fix: capture `(.file_pattern? // null) as $pat | ...` before `test()`.

Both scripts/apply-baseline.sh and the inline in_baseline() helper in governance-reusable.yml had these bugs. Both now mirror the same corrected glob → regex translation: `**` → `.*` (cross-directory) and `*` → `[^/]*` (single segment).

Adds scripts/tests/apply-baseline-test.sh — six regression cases that pin both bugs so they can't recur (exact match, glob match, over-match guard, single-* segment, slash-crossing, empty baseline).
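
For readers who want to see the translation rule in isolation, here is a minimal Python sketch of the glob → regex mapping described above. It is not the jq used in `apply-baseline.sh`. The names `glob_to_regex` and `matches` are hypothetical, and whole-path matching (via `re.fullmatch`) is an assumption; the PR does not state how the real regex is anchored.

```python
import re


def glob_to_regex(pattern):
    """Translate a baseline glob into a regex source string.

    `**` becomes `.*` (may cross `/`); a lone `*` becomes `[^/]*`
    (stays inside one path segment); all other text is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # cross-directory wildcard
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # single-segment wildcard
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return "".join(parts)


def matches(pattern, path):
    # Assumption: the pattern must match the whole path, not a substring.
    return re.fullmatch(glob_to_regex(pattern), path) is not None


# Checks loosely modelled on the regression cases named in the PR.
print(matches("examples/**", "examples/java/Hello.java"))  # True
print(matches("examples/**", "src/RealCode.java"))         # False
print(matches("examples/*", "examples/Hello.java"))        # True
print(matches("examples/*", "examples/java/Hello.java"))   # False
```

Checking for `**` before `*` matters: if the single-star rule ran first, `**` would become `[^/]*[^/]*` and could no longer cross a directory boundary.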

Why this matters

The absolute-zero language-demo repo carries ~30 legitimate banned-language example files under examples/ (Java, Kotlin, Swift, Dart, Cobol, Erlang, Fortran, Lisp, …). Its main branch currently fails governance on every push because .hypatia-baseline.json file_pattern doesn't actually exempt anything. Same blocker: maa-framework #69 (Dependabot rust-toolchain bump) — vendored absolute-zero/examples/{java,kotlin}/*.{java,kt} files trip the language-policy step.

After this lands, both can use a single file_pattern: "examples/**" entry instead of per-file .hypatia-ignore enumeration.
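
As an illustration of what a single `file_pattern: "examples/**"` entry buys, the following Python sketch filters findings against a baseline. It makes several assumptions: the baseline is modelled as a plain list of entries carrying either `file` or `file_pattern` (the real `.hypatia-baseline.json` layout may differ), paths are repo-relative, and this `in_baseline` only borrows the name of the workflow helper; it is not that helper's code.

```python
import re


def _glob_regex(glob):
    # `**` -> `.*`, `*` -> `[^/]*`, everything else literal.
    out = []
    for token in re.split(r"(\*\*|\*)", glob):
        if token == "**":
            out.append(".*")
        elif token == "*":
            out.append("[^/]*")
        else:
            out.append(re.escape(token))
    return re.compile("".join(out))


def in_baseline(finding, baseline):
    """Return True if the finding is exempted by any baseline entry."""
    path = finding["file"]  # bind the finding's path once, up front
    for entry in baseline:
        if entry.get("file") == path:
            return True
        pattern = entry.get("file_pattern")  # capture before matching
        if pattern is not None and _glob_regex(pattern).fullmatch(path):
            return True
    return False


# Hypothetical baseline and findings, for illustration only.
baseline = [{"file_pattern": "examples/**"}]
findings = [
    {"file": "examples/swift/App.swift", "type": "banned_language_file"},
    {"file": "src/RealCode.java", "type": "banned_language_file"},
]

remaining = [f for f in findings if not in_baseline(f, baseline)]
print([f["file"] for f in remaining])  # ['src/RealCode.java']
```

Binding the finding's path and the entry's pattern to local names before matching is the same move as the two jq fixes: each value is captured while the right object is still in scope.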

Test plan

  • scripts/tests/apply-baseline-test.sh — 6/6 pass locally
  • End-to-end simulation of language-policy step against a fixture with examples/ baseline-exempted (Java enforce fails on src/RealCode.java, Swift enforce passes on examples/swift/*)
  • CI on this PR
  • Downstream: file follow-up PRs to absolute-zero (.hypatia-baseline.json with file_pattern) and maa-framework

🤖 Generated with Claude Code

…ine + apply-baseline.sh)

The Hypatia baseline schema documents `file_pattern` as the canonical
mechanism for exempting subtrees from governance gates (an alternative
to per-file `.hypatia-ignore` enumeration), and `apply-baseline.sh`
ostensibly implemented it. But both implementations had latent jq
scoping bugs that caused file_pattern to silently always-match instead
of glob-match.

Two bugs fixed:

1. `f.file` inside `select(...)` — `f` is a jq function-parameter filter,
   and inside the map(select(...)) over the baseline, `.` rebinds to
   each baseline entry, so `f.file` re-evaluated the identity against
   the baseline entry rather than capturing the finding. Fix: bind
   `f as $finding` before entering the map.

2. `.file_pattern` inside `test(arg)` — `test`'s argument is evaluated
   in the input's scope, which by then is the file string (the test
   input), not the baseline entry. `.file_pattern` errored with "Cannot
   index string", silently swallowed inside `select()` and treated as
   truthy. Fix: capture `(.file_pattern? // null) as $pat` before
   entering test().

The `in_baseline()` helper inline in `governance-reusable.yml` was
flagged in a comment as "intentionally NOT implementing file_pattern
glob" — but the right place to add it is in_baseline itself (the
caller of the language-policy step), since enforcement happens via the
inline `find` + per-file lookup, not via apply-baseline.sh. Both code
paths now mirror the same corrected glob → regex translation:
`**` → `.*` (cross-directory), `*` → `[^/]*` (single segment).

Adds `scripts/tests/apply-baseline-test.sh` with six regression cases
(exact match, glob match, over-match guard, single-* segment, slash
crossing, empty baseline), pinning both bugs so they can't recur.

Foundational fix for the absolute-zero language-demo repo (~30+ legitimate
banned-language example files across `examples/`) and any repo that
vendors such subtrees (currently maa-framework's #69 Dependabot PR is
blocked by 4 such files). Downstream consumers can replace per-file
`.hypatia-ignore` enumeration with one `file_pattern: "examples/**"`
entry in `.hypatia-baseline.json` once this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🔍 Hypatia Security Scan

Findings: 118 issues detected

| Severity | Count |
|---|---|
| 🔴 Critical | 64 |
| 🟠 High | 43 |
| 🟡 Medium | 11 |

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action hyperpolymath/standards/.github/workflows/deno-ci-reusable.yml@main needs attention",
    "type": "unpinned_action",
    "file": "deno-ci-reusable.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
    "type": "unpinned_action",
    "file": "governance-reusable.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
    "type": "unpinned_action",
    "file": "governance.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Python file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/standards/standards/a2ml-templates/state-scm-to-v2.py",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/standards/standards/a2ml/bindings/deno/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/standards/standards/lol/test/vitest.config.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/standards/standards/k9-svc/bindings/deno/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "Agda postulate assumes without proof -- potential soundness hole (4 occurrences, CWE-704)",
    "type": "agda_postulate",
    "file": "/home/runner/work/standards/standards/lol/proofs/theories/information_theory.agda",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "critical"
  },
  {
    "reason": "believe_me undermines formal verification (1 occurrences, CWE-704)",
    "type": "believe_me",
    "file": "/home/runner/work/standards/standards/lol/src/abi/Locale.idr",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "critical"
  },
  {
    "reason": "Wildcard CORS -- restrict to specific origins or use env var (1 occurrences, CWE-942)",
    "type": "js_wildcard_cors",
    "file": "/home/runner/work/standards/standards/consent-aware-http/examples/reference-implementations/deno/aibdp_middleware.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath enabled auto-merge (squash) May 26, 2026 08:56
hyperpolymath added a commit that referenced this pull request May 26, 2026
…or.yml drift (#187)

## Summary

Extends the reusable-workflow pattern from #168 (governance-reusable +
deno-ci-reusable) and #174 (rust-ci-reusable + elixir-ci-reusable) to
the **mirror.yml** template.

Estate audit picked this as the highest-leverage next foundational
reusable across 5 candidates (codeql, secret-scanner, hypatia-scan,
mirror, scorecard).

### Drift survey

`gh api /search/code` paginated against `org:hyperpolymath`, then
blob-SHA grouped:

| Template | Deployments | Sampled | Unique SHAs | Top-SHA share |
|---|---|---|---|---|
| **mirror.yml** | **289** | **100** | **76 (76%)** | **16% (long tail)** |
| codeql.yml | 263 | 100 | 70 (70%) | 32% |
| secret-scanner.yml | 281 | 100 | 55 (55%) | 47% |
| scorecard.yml | 258 | 258 (full) | 46 (18%) | 39% |
| hypatia-scan.yml | 255 | 200 | 31 (15.5%) | 50% |

(scorecard + hypatia-scan are already mostly converged → low leverage
now.)

mirror.yml ranks first on **drift × deployments** (76% × 289 ≈ 220) and
was verified to have **low feature variance**: all 4 top-SHA variants
sampled (covering 29/100 sampled repos: bgp-backbone-lab, ipfs-overlay,
kaldor-iiot, vcs-ircd) carried the **same 7 forge jobs** (gitlab,
bitbucket, codeberg, sourcehut, disroot, gitea, radicle). Drift is
action-SHA / whitespace churn — not feature variance — exactly the shape
that consolidates cleanly behind one workflow_call reusable.
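
The ranking arithmetic can be reproduced with a short Python sketch. The figures are copied from the survey table above; the `leverage` name and the choice of unique-SHA share as the drift fraction are assumptions made for illustration, based on the "76% × 289 ≈ 220" calculation.

```python
# (template, deployments, drift fraction = unique SHAs / sampled repos)
survey = [
    ("mirror.yml", 289, 0.76),
    ("codeql.yml", 263, 0.70),
    ("secret-scanner.yml", 281, 0.55),
    ("scorecard.yml", 258, 0.18),
    ("hypatia-scan.yml", 255, 0.155),
]


def leverage(deployments, drift):
    # Drift x deployments: roughly how many deployed copies have diverged.
    return deployments * drift


ranked = sorted(survey, key=lambda row: leverage(row[1], row[2]), reverse=True)
for name, deployments, drift in ranked:
    print(f"{name:20s} {leverage(deployments, drift):6.1f}")
```

On these inputs mirror.yml comes out first at about 220, with scorecard.yml and hypatia-scan.yml well behind, which matches the "already mostly converged" remark.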

### Design

- **No per-call inputs other than `runs-on`**: per-repo forge selection is
already externalised to Actions vars (`vars.<FORGE>_MIRROR_ENABLED == 'true'`),
so the reusable mirrors the gating pattern verbatim.
- **`secrets: inherit` required at the call site**: the per-forge SSH
keys (`GITLAB_SSH_KEY`, `BITBUCKET_SSH_KEY`, `CODEBERG_SSH_KEY`,
`SOURCEHUT_SSH_KEY`, `DISROOT_SSH_KEY`, `GITEA_SSH_KEY`) and `RADICLE_KEY` flow
through implicitly. Without `secrets: inherit` the inner `secrets.X`
references evaluate to empty (silent push failure on each enabled
forge).
- **`vars.GITEA_HOST`** consumed verbatim from the caller repo's Actions
vars, same as the canonical `mirror.yml`.
- All actions SHA-pinned; SPDX header present; top-level
`permissions: contents: read`; passes the `workflow-lint` job in
`governance-reusable.yml`.

No filtering logic, so no regression-test file (cf.
scripts/tests/apply-baseline-test.sh for the governance/baseline path
that needs one).

### Caller wrapper shape (post-merge)

```yaml
# SPDX-License-Identifier: PMPL-1.0-or-later
name: Mirror to Git Forges
on:
  push:
    branches: [main]
  workflow_dispatch:
permissions:
  contents: read
jobs:
  mirror:
    uses: hyperpolymath/standards/.github/workflows/mirror-reusable.yml@<sha>
    secrets: inherit
```

~10 lines per repo, replacing ~145 lines.

### Rollout plan (downstream wrapper sweep)

**NOT started in this PR — owner-gated, same as #174's rust-ci sweep
(which capped at 82 PRs).**

Numbers (from the 100-repo SHA-sample, extrapolated to 289):
- **289 repos** total deployments to convert
- **~85% trivially convertible** (forge set matches canonical 7-forge
list; SHA-pinned actions only differ in pin SHA / whitespace). One
mechanical wrapper PR per repo, same shape as the #168 wrappers
(absolute-zero#41, tma-mark2#41).
- **~10-15% need careful review** — long-tail SHAs may include
legitimate custom forges or local additions. Surface a per-repo diff
during sweep; defer non-canonical variants to a follow-up.
- Sweep order: pin wrappers to **this PR's HEAD SHA** while owner-gated;
rebase to merged-main SHA in the wave's final batch (same protocol as
the rust-ci sweep).

### Pattern hardening (no per-PR action required)

- Same workflow_call shape as #168 / #174 — no new infrastructure.
- Independent of #174 (rust-ci-reusable.yml) and #180 (apply-baseline.sh
glob fix) — no conflicts; can land in any order.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
hyperpolymath added a commit that referenced this pull request May 26, 2026
…ecrets to 281 repos (#190)

## Summary

Extends the reusable-workflow pattern from #168 / #174 / #187 to
**secret-scanner.yml**. Same shape as #187 (no per-call inputs except
`runs-on`; caller uses `secrets: inherit`).

### Why secret-scanner is the next foundational reusable

Estate drift survey (`gh api /search/code` paginated against
`org:hyperpolymath`, blob-SHA grouped over **all 281 deployments**):

| Metric | Value |
|---|---|
| Total deployments | **281** |
| Unique blob SHAs | **54** |
| Structural drift | **19%** (top 4 SHAs cover 69%, top 6 cover 79%) |
| Feature variance | **near-zero**: all sampled variants carry the same 3 jobs (trufflehog + gitleaks + rust-secrets) at 75-81 lines |
| True drift source | action-SHA pin churn + whitespace |

The 100-sample drift estimate (55%) initially ranked secret-scanner
third behind mirror; the full pagination reveals the actual figure is
19%. The variance was a sampling artefact.

### Security debt this PR force-fixes

The `shell-secrets` job was added to the canonical `secret-scanner.yml` on 2026-05-21 (commit
`080c394`) in direct response to the **live Cloudflare API token leak**
via `avow-protocol/deploy-repos.sh` (commit `5f2f8b2`) — a leak that
both `trufflehog --only-verified` and default `gitleaks` missed.

Of 16 estate `secret-scanner.yml` blobs sampled across the top +
long-tail SHAs, **0 carry the `shell-secrets` job**.

The post-incident guardrail intended to catch the *next* such leak has
propagated to nothing. Consolidating the workflow behind this reusable
means the wrapper sweep that follows this PR force-promotes
`shell-secrets` to all 281 repos in one batch.

### Design

- **No per-call inputs other than `runs-on`** — each job self-conditions
internally:
  - `rust-secrets` exits early on no `Cargo.toml` (safe on every repo)
  - `shell-secrets` no-ops without `.sh`/`.bash` files
  - `trufflehog` + `gitleaks` always-on (intended)
- **`secrets: inherit` required at the call site** — so the inner
`secrets.GITHUB_TOKEN` reference in the `gitleaks-action` step resolves.
Without `inherit` it falls back to anonymous mode (rate-limited; misses
some PRs).
- **Caller keeps `on:` + `concurrency:`** — so the read-only
cancel-superseded guardrail stays in the wrapper.
- SPDX header, top-level `permissions: contents: read`, all actions
SHA-pinned — passes the `workflow-lint` job in
`governance-reusable.yml`.

### Caller wrapper shape (post-merge)

```yaml
# SPDX-License-Identifier: PMPL-1.0-or-later
name: Secret Scanner
on:
  pull_request:
  push:
    branches: [main]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
jobs:
  scan:
    uses: hyperpolymath/standards/.github/workflows/secret-scanner-reusable.yml@<sha>
    secrets: inherit
```

~12 lines per repo, replacing ~75-116 lines.

### Rollout plan

**NOT started in this PR — owner-gated, same as #187 / #174 sweeps.**

| Wave | Repos | Action |
|---|---|---|
| 1: bulk-mechanical | ~275 | Canonical 3-job match. Fan-out single-commit wrapper PR per repo, pinned to this PR HEAD; rebase to merged-main SHA before batch firing. |
| 2: slim variants | ~6 | Repos with 2-job (missing `rust-secrets`) or 1-job (`trufflehog` only) older copies. Standardize-up safely since the missing job self-skips on non-applicable repos. |

Total expected sweep: ~281 PRs (well above the 82-PR rust-ci precedent —
recommend batching by wave; user gates each wave start).

### Pattern hardening

- Same `workflow_call` shape as #168 / #174 / #187 — no new
infrastructure.
- Independent of #174 (`rust-ci-reusable.yml`), #180
(`apply-baseline.sh` glob fix), and #187 (`mirror-reusable.yml`) — no
file conflicts; lands in any order.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@hyperpolymath hyperpolymath merged commit 3285ac1 into main May 26, 2026
18 checks passed
@hyperpolymath hyperpolymath deleted the feat/hypatia-ignore-prefix-match branch May 26, 2026 14:41