fix(baseline): file_pattern glob matching + jq scoping bugs #180
Merged
…ine + apply-baseline.sh)

The Hypatia baseline schema documents `file_pattern` as the canonical mechanism for exempting subtrees from governance gates (an alternative to per-file `.hypatia-ignore` enumeration), and `apply-baseline.sh` ostensibly implemented it. But both implementations had latent jq scoping bugs that caused `file_pattern` to silently always-match instead of glob-match.

Two bugs fixed:

1. `f.file` inside `select(...)`: `f` is a jq function-parameter filter, and inside the `map(select(...))` over the baseline, `.` rebinds to each baseline entry, so `f.file` re-evaluated the identity against the baseline entry rather than capturing the finding. Fix: bind `f as $finding` before entering the map.
2. `.file_pattern` inside `test(arg)`: `test`'s argument is evaluated in the input's scope, which by then is the file string (the test input), not the baseline entry. `.file_pattern` errored with "Cannot index string", silently swallowed inside `select()` and treated as truthy. Fix: capture `(.file_pattern? // null) as $pat` before entering `test()`.

The `in_baseline()` helper inline in `governance-reusable.yml` was flagged in a comment as "intentionally NOT implementing file_pattern glob", but the right place to add it is `in_baseline` itself (the caller of the language-policy step), since enforcement happens via the inline `find` + per-file lookup, not via `apply-baseline.sh`.

Both code paths now mirror the same corrected glob → regex translation: `**` → `.*` (cross-directory), `*` → `[^/]*` (single segment).

Adds `scripts/tests/apply-baseline-test.sh` with six regression cases (exact match, glob match, over-match guard, single-* slash crossing, empty baseline), pinning both bugs so they can't recur.

Foundational fix for the absolute-zero language-demo repo (~30+ legitimate banned-language example files across `examples/`) and any repo that vendors such subtrees (currently maa-framework's #69 Dependabot PR is blocked by 4 such files).

Downstream consumers can replace per-file `.hypatia-ignore` enumeration with one `file_pattern: "examples/**"` entry in `.hypatia-baseline.json` once this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
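The two scoping fixes described in the commit message can be sketched as follows. This is a minimal illustration, not the code from `apply-baseline.sh`: the names `JQ_IN_BASELINE`, `in_baseline` (as a shell function), and `glob_to_regex` are hypothetical, the baseline entry shape (`file` or `file_pattern`) is assumed from the text above, and the glob translation here does not escape other regex metacharacters such as `.`.

```shell
#!/usr/bin/env bash
# Sketch only: names and entry shape are assumptions, not the PR's actual code.
JQ_IN_BASELINE='
# "**" -> ".*" and "*" -> "[^/]*", anchored (anchoring is an assumption).
def glob_to_regex:
  "^" + (split("**") | map(split("*") | join("[^/]*")) | join(".*")) + "$";

def in_baseline(f; $baseline):
  f as $finding                          # fix 1: capture the finding before "." rebinds
  | $baseline
  | map(select(
      (.file_pattern? // null) as $pat   # fix 2: capture the pattern before test()
      | if $pat == null
        then .file == $finding.file      # exact-match entry
        else ($finding.file | test($pat | glob_to_regex))
        end
    ))
  | length > 0;

in_baseline(.; $baseline)
'

# usage: in_baseline FINDING_JSON BASELINE_JSON  -> prints true or false
in_baseline() {
  printf '%s' "$1" | jq --argjson baseline "$2" "$JQ_IN_BASELINE"
}

# Small demo, skipped when jq is not installed.
if command -v jq >/dev/null 2>&1; then
  baseline='[{"file_pattern":"examples/**"}]'
  in_baseline '{"file":"examples/swift/Demo.swift"}' "$baseline"   # true
  in_baseline '{"file":"src/RealCode.java"}' "$baseline"           # false
fi
```

Binding `$finding` and `$pat` as variables is what makes the filter robust: jq variables keep their value regardless of what `.` currently refers to, whereas `f.file` and `.file_pattern` are re-evaluated against whatever the input is at that point.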
🔍 Hypatia Security Scan
Findings: 118 issues detected
[
{
"reason": "Action hyperpolymath/standards/.github/workflows/deno-ci-reusable.yml@main needs attention",
"type": "unpinned_action",
"file": "deno-ci-reusable.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
"type": "unpinned_action",
"file": "governance-reusable.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
"type": "unpinned_action",
"file": "governance.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Python file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/standards/standards/a2ml-templates/state-scm-to-v2.py",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/standards/standards/a2ml/bindings/deno/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/standards/standards/lol/test/vitest.config.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/standards/standards/k9-svc/bindings/deno/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "Agda postulate assumes without proof -- potential soundness hole (4 occurrences, CWE-704)",
"type": "agda_postulate",
"file": "/home/runner/work/standards/standards/lol/proofs/theories/information_theory.agda",
"action": "flag",
"rule_module": "code_safety",
"severity": "critical"
},
{
"reason": "believe_me undermines formal verification (1 occurrences, CWE-704)",
"type": "believe_me",
"file": "/home/runner/work/standards/standards/lol/src/abi/Locale.idr",
"action": "flag",
"rule_module": "code_safety",
"severity": "critical"
},
{
"reason": "Wildcard CORS -- restrict to specific origins or use env var (1 occurrences, CWE-942)",
"type": "js_wildcard_cors",
"file": "/home/runner/work/standards/standards/consent-aware-http/examples/reference-implementations/deno/aibdp_middleware.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
This was referenced May 26, 2026
hyperpolymath added a commit that referenced this pull request May 26, 2026

…or.yml drift (#187)

## Summary

Extends the reusable-workflow pattern from #168 (governance-reusable + deno-ci-reusable) and #174 (rust-ci-reusable + elixir-ci-reusable) to the **mirror.yml** template. Estate audit picked this as the highest-leverage next foundational reusable across 5 candidates (codeql, secret-scanner, hypatia-scan, mirror, scorecard).

### Drift survey

`gh api /search/code` paginated against `org:hyperpolymath`, then blob-SHA grouped:

| Template | Deployments | Sampled | Unique SHAs | Top-SHA share |
|---|---|---|---|---|
| **mirror.yml** | **289** | **100** | **76 (76%)** | **16% (long tail)** |
| codeql.yml | 263 | 100 | 70 (70%) | 32% |
| secret-scanner.yml | 281 | 100 | 55 (55%) | 47% |
| scorecard.yml | 258 | 258 (full) | 46 (18%) | 39% |
| hypatia-scan.yml | 255 | 200 | 31 (15.5%) | 50% |

(scorecard + hypatia-scan are already mostly converged → low leverage now.)

mirror.yml ranks first on **drift × deployments** (76% × 289 ≈ 220) and was verified to have **low feature variance**: all 4 top-SHA variants sampled (covering 29/100 sampled repos: bgp-backbone-lab, ipfs-overlay, kaldor-iiot, vcs-ircd) carried the **same 7 forge jobs** (gitlab, bitbucket, codeberg, sourcehut, disroot, gitea, radicle). Drift is action-SHA / whitespace churn, not feature variance, which is exactly the shape that consolidates cleanly behind one `workflow_call` reusable.

### Design

- **No per-call inputs other than runs-on**: per-repo forge selection is already externalised to Actions vars (`vars.<FORGE>_MIRROR_ENABLED == 'true'`), so the reusable mirrors the gating pattern verbatim.
- **`secrets: inherit` required at the call site**: the per-forge SSH keys (GITLAB_SSH_KEY, BITBUCKET_SSH_KEY, CODEBERG_SSH_KEY, SOURCEHUT_SSH_KEY, DISROOT_SSH_KEY, GITEA_SSH_KEY) and RADICLE_KEY flow through implicitly. Without `secrets: inherit` the inner `secrets.X` references evaluate to empty (silent push failure on each enabled forge).
- **vars.GITEA_HOST** consumed verbatim from the caller repo's Actions vars, same as the canonical mirror.yml.
- All actions SHA-pinned; SPDX header present; top-level `permissions: contents: read`; passes the workflow-lint job in governance-reusable.yml.

No filtering logic, so no regression-test file (cf. scripts/tests/apply-baseline-test.sh for the governance/baseline path that needs one).

### Caller wrapper shape (post-merge)

```yaml
# SPDX-License-Identifier: PMPL-1.0-or-later
name: Mirror to Git Forges
on:
  push:
    branches: [main]
  workflow_dispatch:
permissions:
  contents: read
jobs:
  mirror:
    uses: hyperpolymath/standards/.github/workflows/mirror-reusable.yml@<sha>
    secrets: inherit
```

~10 lines per repo, replacing ~145 lines.

### Rollout plan (downstream wrapper sweep)

**NOT started in this PR: owner-gated, same as #174's rust-ci sweep (which capped at 82 PRs).**

Numbers (from the 100-repo SHA-sample, extrapolated to 289):

- **289 repos** total deployments to convert
- **~85% trivially convertible** (forge set matches canonical 7-forge list; SHA-pinned actions only differ in pin SHA / whitespace). One mechanical wrapper PR per repo, same shape as the #168 wrappers (absolute-zero#41, tma-mark2#41).
- **~10-15% need careful review**: long-tail SHAs may include legitimate custom forges or local additions. Surface a per-repo diff during sweep; defer non-canonical variants to a follow-up.
- Sweep order: pin wrappers to **this PR's HEAD SHA** while owner-gated; rebase to merged-main SHA in the wave's final batch (same protocol as the rust-ci sweep).

### Pattern hardening (no per-PR action required)

- Same `workflow_call` shape as #168 / #174, so no new infrastructure.
- Independent of #174 (rust-ci-reusable.yml) and #180 (apply-baseline.sh glob fix): no conflicts; can land in any order.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
hyperpolymath added a commit that referenced this pull request May 26, 2026

…ecrets to 281 repos (#190)

## Summary

Extends the reusable-workflow pattern from #168 / #174 / #187 to **secret-scanner.yml**. Same shape as #187 (no per-call inputs except `runs-on`; caller uses `secrets: inherit`).

### Why secret-scanner is the next foundational reusable

Estate drift survey (`gh api /search/code` paginated against `org:hyperpolymath`, blob-SHA grouped over **all 281 deployments**):

| Metric | Value |
|---|---|
| Total deployments | **281** |
| Unique blob SHAs | **54** |
| Structural drift | **19%** (top 4 SHAs cover 69%, top 6 cover 79%) |
| Feature variance | **near-zero**: all sampled variants carry the same 3 jobs (trufflehog + gitleaks + rust-secrets) at 75-81 lines |
| True drift source | action-SHA pin churn + whitespace |

The 100-sample drift estimate (55%) initially ranked secret-scanner third behind mirror; the full pagination reveals the actual figure is 19%. The variance was a sampling artefact.

### Security debt this PR force-fixes

The `shell-secrets` job was added to the canonical 2026-05-21 (commit `080c394`) in direct response to the **live Cloudflare API token leak** via `avow-protocol/deploy-repos.sh` (commit `5f2f8b2`), a leak that both `trufflehog --only-verified` and default `gitleaks` missed.

Of 16 estate `secret-scanner.yml` blobs sampled across the top + long-tail SHAs, **0 carry the `shell-secrets` job**. The post-incident guardrail intended to catch the *next* such leak has propagated to nothing. Consolidating the workflow behind this reusable means the wrapper sweep that follows this PR force-promotes `shell-secrets` to all 281 repos in one batch.

### Design

- **No per-call inputs other than `runs-on`**: each job self-conditions internally:
  - `rust-secrets` exits early on no `Cargo.toml` (safe on every repo)
  - `shell-secrets` no-ops without `.sh`/`.bash` files
  - `trufflehog` + `gitleaks` always-on (intended)
- **`secrets: inherit` required at the call site**, so the inner `secrets.GITHUB_TOKEN` reference in the `gitleaks-action` step resolves. Without `inherit` it falls back to anonymous mode (rate-limited; misses some PRs).
- **Caller keeps `on:` + `concurrency:`**, so the read-only cancel-superseded guardrail stays in the wrapper.
- SPDX header, top-level `permissions: contents: read`, all actions SHA-pinned; passes the `workflow-lint` job in `governance-reusable.yml`.

### Caller wrapper shape (post-merge)

```yaml
# SPDX-License-Identifier: PMPL-1.0-or-later
name: Secret Scanner
on:
  pull_request:
  push:
    branches: [main]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
permissions:
  contents: read
jobs:
  scan:
    uses: hyperpolymath/standards/.github/workflows/secret-scanner-reusable.yml@<sha>
    secrets: inherit
```

~12 lines per repo, replacing ~75-116 lines.

### Rollout plan

**NOT started in this PR: owner-gated, same as #187 / #174 sweeps.**

| Wave | Repos | Action |
|---|---|---|
| 1: bulk-mechanical | ~275 | Canonical 3-job match. Fan-out single-commit wrapper PR per repo, pinned to this PR HEAD; rebase to merged-main SHA before batch firing. |
| 2: slim variants | ~6 | Repos with 2-job (missing `rust-secrets`) or 1-job (`trufflehog` only) older copies. Standardize-up safely since the missing job self-skips on non-applicable repos. |

Total expected sweep: ~281 PRs (well above the 82-PR rust-ci precedent; recommend batching by wave; user gates each wave start).

### Pattern hardening

- Same `workflow_call` shape as #168 / #174 / #187, so no new infrastructure.
- Independent of #174 (`rust-ci-reusable.yml`), #180 (`apply-baseline.sh` glob fix), and #187 (`mirror-reusable.yml`): no file conflicts; lands in any order.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary

Two latent jq scoping bugs in baseline `file_pattern` matching:

1. `f.file` re-evaluates the identity inside `map(select(...))`: `.` rebinds to each baseline entry, so `f.file` returned the baseline entry's `.file` field instead of the finding's. Fix: `f as $finding | ...` before entering the map.
2. `.file_pattern` inside `test(arg)`: `test`'s argument is evaluated in the input's scope (the file string), so `.file_pattern` errored with "Cannot index string", silently swallowed by `select()` and treated truthy. This made `file_pattern` entries always-match. Fix: `(.file_pattern? // null) as $pat | ...` capture before `test()`.

Both `scripts/apply-baseline.sh` and the inline `in_baseline()` helper in `governance-reusable.yml` had these bugs. Both now mirror the same corrected `**` → `.*` / `*` → `[^/]*` glob → regex translation.

Adds `scripts/tests/apply-baseline-test.sh`: six regression cases that pin both bugs so they can't recur (exact match, glob match, over-match guard, single-* segment, slash-crossing, empty baseline).

## Why this matters

The absolute-zero language-demo repo carries ~30 legitimate banned-language example files under `examples/` (Java, Kotlin, Swift, Dart, Cobol, Erlang, Fortran, Lisp, …). Its `main` branch currently fails governance on every push because `.hypatia-baseline.json` `file_pattern` doesn't actually exempt anything. Same blocker: maa-framework #69 (Dependabot rust-toolchain bump), where vendored `absolute-zero/examples/{java,kotlin}/*.{java,kt}` files trip the language-policy step.

After this lands, both can use a single `file_pattern: "examples/**"` entry instead of per-file `.hypatia-ignore` enumeration.

## Test plan

- `scripts/tests/apply-baseline-test.sh`: 6/6 pass locally
- … `examples/` baseline-exempted (Java enforce fails on `src/RealCode.java`, Swift enforce passes on `examples/swift/*`)
- … `.hypatia-baseline.json` with `file_pattern`) and maa-framework

🤖 Generated with Claude Code
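The glob → regex translation named in the Summary (`**` → `.*`, `*` → `[^/]*`) can be illustrated in plain bash. This is a hedged sketch rather than the code in `apply-baseline.sh`: `glob_to_regex`, `glob_match`, and `check` are hypothetical helpers, and both the `^…$` anchoring and the escaping of other regex metacharacters are assumptions that the PR text does not spell out. The sample paths loosely follow the case names listed above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's code) of the glob -> regex rules:
#   "**" -> ".*"     (may cross directory separators)
#   "*"  -> "[^/]*"  (stays within one path segment)
# Anchoring with ^...$ and escaping other metacharacters are assumptions.

glob_to_regex() {
  local glob=$1 re='' c
  local meta='.^$+?(){}[]|\'
  local i=0
  while (( i < ${#glob} )); do
    c=${glob:i:1}
    if [[ $c == '*' ]]; then
      if [[ ${glob:i+1:1} == '*' ]]; then
        re+='.*'; i=$(( i + 2 ))      # "**": cross-directory wildcard
      else
        re+='[^/]*'; i=$(( i + 1 ))   # "*": single-segment wildcard
      fi
      continue
    fi
    # Escape regex metacharacters so that "." in "*.java" stays literal.
    if [[ $meta == *"$c"* ]]; then re+="\\$c"; else re+=$c; fi
    i=$(( i + 1 ))
  done
  printf '^%s$' "$re"
}

# usage: glob_match GLOB PATH  -> exit status 0 when PATH matches GLOB
glob_match() {
  local re
  re=$(glob_to_regex "$1")
  [[ $2 =~ $re ]]
}

check() {
  if glob_match "$1" "$2"; then echo "match:    $1 vs $2"
  else echo "no match: $1 vs $2"; fi
}

check 'examples/**'     'examples/java/Foo.java'   # match (glob match)
check 'examples/**'     'src/RealCode.java'        # no match
check 'examples/**'     'not-examples/Foo.java'    # no match (over-match guard)
check 'examples/*.java' 'examples/Foo.java'        # match (single segment)
check 'examples/*.java' 'examples/java/Foo.java'   # no match (slash crossing)
```

Handling `**` before `*` in the same pass avoids the pitfall of a naive two-step substitution, where the `*` inside an already-emitted `.*` would be rewritten a second time.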