Skip to content

fix(workflows): minimise permissions + clarify ci-status first-run log#5

Merged
lml2468 merged 3 commits into
mainfrom
fix/security-permissions
May 14, 2026
Merged

fix(workflows): minimise permissions + clarify ci-status first-run log#5
lml2468 merged 3 commits into
mainfrom
fix/security-permissions

Conversation

@lml2468
Copy link
Copy Markdown
Contributor

@lml2468 lml2468 commented May 13, 2026

Batch A — Security: GITHUB_TOKEN permission hardening

Changes

Commit 1 — permissions: {} on 3 reusable workflows

  • octo-issue-feed.yml: only uses OCTO_BOT_TOKEN; GITHUB_TOKEN needs zero permissions
  • octo-pr-feed.yml: same
  • pr-merged-done.yml: only uses PROJECT_TOKEN PAT; same

Commit 2 — octo-ci-status.yml two fixes

  • permissions was already actions: read (correct); no change needed there
  • Added explicit prev_conclusion is None branch: prints "First run detected (no previous history), skipping notification" instead of falling into the confusing "Unhandled transition None → success" message

Why

All four workflows are org-level reusable infrastructure called from 8+ repos. Leaving GITHUB_TOKEN permissions at default (or undeclared) violates least-privilege and creates unnecessary write surface.

Part of

Batch A of the org-level workflow hardening series. Batch B (urllib timeout + retry) follows next.

Copy link
Copy Markdown

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #5 (.github)

Summary

The intent of this PR is good — minimising GITHUB_TOKEN permissions on four reusable workflows, and clarifying the confusing "Unhandled transition Nonesuccess" log on the first run of octo-ci-status. The four permissions: declarations land correctly. However, the octo-ci-status.yml Python change is structurally broken and silently disables every CI notification this workflow is supposed to send. As-is this PR cannot ship.


P0 — Blocker

octo-ci-status.yml: new if prev_conclusion is None: block breaks all notifications

.github/workflows/octo-ci-status.yml lines ~109–127 (head ec5510c):

# Determine message
if curr_conclusion == 'failure':
    msg = (
        f'❌ [{repo}] main CI 挂了\n\n'
        f'工作流:{wf_name}\n'
        f'🔗 {run_url}'
    )
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
    msg = (
        f'✅ [{repo}] main CI 已恢复\n\n'
        f'工作流:{wf_name}\n'
        f'🔗 {run_url}'
    )
if prev_conclusion is None:                                       # ← NEW
    print(f'First run detected (no previous history), skipping notification.')
    sys.exit(0)
else:                                                              # ← now binds to the NEW if
    print(f'Unhandled transition {prev_conclusion!r}{curr_conclusion!r}, skipping.')
    sys.exit(0)

# Send to Octo IM
...
send('4ade985d984e432eb7fbdd0ad4f8118a', msg)
send(proj_gid, msg)

The new if prev_conclusion is None: was inserted at the same indentation as the leading if curr_conclusion == 'failure':, so Python parses it as a separate top‑level statement rather than an elif on the existing chain. The trailing else: is therefore the else of the new if, not the else of the original if/elif/else chain. Every path now reaches a sys.exit(0) before send() is ever called.

Concretely:

Scenario prev_conclusion curr_conclusion Old behaviour New behaviour
Failure alert 'success' 'failure' Sends ❌ alert Logs "Unhandled transition 'success' → 'failure'", exits, NO alert
Recovery alert 'failure' 'success' Sends ✅ recovery Logs "Unhandled transition 'failure' → 'success'", exits, NO recovery
First run (any) None anything Either alerts on first failure or hits old "Unhandled transition" log Always exits silently with "First run detected"
Success → success 'success' 'success' Already returned earlier on curr == prev check Same

The two notification branches that this workflow exists to serve — failure alerts and recovery notifications — are both dead code now. This is an org-level reusable workflow consumed by 8+ repos; merging this branch silences the CI failure alerting pipeline everywhere it is wired up.

Suggested fix — make the first‑run check part of the same dispatch chain, e.g. as an early guard:

if prev_conclusion is None:
    print('First run detected (no previous history), skipping notification.')
    sys.exit(0)

if curr_conclusion == 'failure':
    msg = (...)
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
    msg = (...)
else:
    print(f'Unhandled transition {prev_conclusion!r}{curr_conclusion!r}, skipping.')
    sys.exit(0)

# Send to Octo IM
...

or as a sibling elif inside the existing chain:

if curr_conclusion == 'failure':
    msg = (...)
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
    msg = (...)
elif prev_conclusion is None:
    print('First run detected (no previous history), skipping notification.')
    sys.exit(0)
else:
    print(f'Unhandled transition {prev_conclusion!r}{curr_conclusion!r}, skipping.')
    sys.exit(0)

Either form preserves the original semantics for the two alert paths and only re-routes the previously confusing None → success (and None → failure, if that is also intended) case. Please add a quick unit test or a temporary print(curr_conclusion, prev_conclusion) smoke run on a test caller before re-pushing — the failure mode here is silent.

A brief test plan that would have caught this:

  1. Trigger the caller workflow on a repo where there is no recent history → expect First run detected.
  2. Force a failing run on main after a green run → expect ❌ alert in both groups.
  3. Force a green run after a failure → expect ✅ recovery in both groups.

P2 — Nits

  1. octo-ci-status.yml, new line: print(f'First run detected (no previous history), skipping notification.')f'' has no interpolation; plain string is fine. Minor.

Permissions hardening — verified

The permissions: {} and permissions: { actions: read } declarations all match the actual token usage in the four workflows; this part is correct.

File Declared GITHUB_TOKEN usage in body Verdict
octo-issue-feed.yml permissions: {} None — only OCTO_BOT_TOKEN is used.
octo-pr-feed.yml permissions: {} None — only OCTO_BOT_TOKEN is used.
pr-merged-done.yml permissions: {} actions/github-script@v7 is invoked with github-token: ${{ secrets.PROJECT_TOKEN }}; no implicit GITHUB_TOKEN calls.
octo-ci-status.yml permissions: { actions: read } Calls GET /repos/{owner}/{repo}/actions/runs with secrets.GITHUB_TOKEN; actions: read is the minimum scope.

For org-level reusable workflows callable from many repos, declaring permissions: explicitly (instead of relying on the org/repo default) is the right move.


Verdict

Changes requested. The permissions hardening is good and ready to merge once split out, but the octo-ci-status.yml first-run-log change must be reworked: as written, the new if prev_conclusion is None block silently disables both CI failure alerts and recovery notifications across every consumer of this reusable workflow. Suggest hoisting the first‑run guard above the if/elif dispatch (or making it an elif) and adding a smoke test before re-pushing.

@lml2468
Copy link
Copy Markdown
Contributor Author

lml2468 commented May 13, 2026

Addressed P0 blocker (commit 3fb40fb):

Root cause confirmed: the if prev_conclusion is None: block was inserted as a separate top-level statement after the if/elif message-build chain, causing the else: to bind to the new if instead of the original chain. Every run — including failure alerts and recovery notifications — hit a sys.exit(0) before send() was ever reached.

Fix: hoisted the first-run guard to before the if/elif dispatch as an early-exit guard:

# Guard: first-ever run has no previous history — skip silently
if prev_conclusion is None:
    print('First run detected (no previous history), skipping notification.')
    sys.exit(0)

# Determine message
if curr_conclusion == 'failure':
    ...
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
    ...
else:
    print(f'Unhandled transition ...')
    sys.exit(0)

Also addressed the P2 nit: dropped the redundant f-prefix from the plain string.

Ready for re-review.

Copy link
Copy Markdown

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #5 (.github)

Summary

Small, focused, low-risk batch:

  1. permissions: {} added to three reusable workflows that never use GITHUB_TOKEN.
  2. permissions: actions: read added to octo-ci-status.yml (the only one that does call the GitHub Actions API).
  3. A new explicit prev_conclusion is None guard in octo-ci-status.yml that replaces the confusing Unhandled transition None → success log line.

The security tightening is correct and the log fix is reasonable. Approving with two small notes the author should be aware of.

Verification

Item Result Evidence
octo-issue-feed.yml only consumes OCTO_BOT_TOKEN (PAT), not GITHUB_TOKEN env block at lines ~44 only wires OCTO_BOT_TOKEN; Python uses os.environ['OCTO_BOT_TOKEN']
octo-pr-feed.yml same same shape as issue-feed
pr-merged-done.yml only consumes PROJECT_TOKEN (PAT) step uses github-token: ${{ secrets.PROJECT_TOKEN }}
octo-ci-status.yml actually needs actions: read calls GET /repos/{owner}/{repo}/actions/runs with the default GITHUB_TOKEN; that endpoint requires actions: read
First-run guard correctness for the "skip silently" intent placed after the curr == prev short-circuit and before message construction; covers both first-success and first-failure cases

For a reusable workflow, the workflow-level permissions: block is the upper bound on the GITHUB_TOKEN granted inside it — the caller cannot grant more. Since these three workflows do not touch GITHUB_TOKEN at all, permissions: {} is the correct minimum.

Notes (non-blocking)

N1. PR description vs. actual diff for octo-ci-status.yml permissions

The PR body says:

permissions was already actions: read (correct); no change needed there

But the diff in fact adds the permissions: actions: read block — main had no top-level permissions declaration at all (verified by reading the file at main). The change is correct and necessary; only the description is misleading. Worth fixing in the commit message / PR body for the audit trail, but no code change needed.

N2. "First run detected" message can be inaccurate in busy repos

The previous-run lookup pulls the 10 most recent completed runs on main across all workflows, then filters by workflow name:

api_url = (
    f'https://api.github.com/repos/Mininglamp-OSS/{repo}/actions/runs'
    f'?branch=main&status=completed&per_page=10'
)
...
same_wf = [r for r in runs if r['name'] == wf_name]

(octo-ci-status.yml lines ~62-78, unchanged by this PR)

In a repo with several workflows running on every push, a low-frequency workflow can easily have no prior same-name run in the recent 10-run window even after running many times historically. In that case previous is None and this PR's new message will print "First run detected (no previous history)" — which is not actually true, and which now also silently swallows what would previously have been a failure alert (the old code would have hit the if curr_conclusion == 'failure': branch and notified).

So this is a small behavior change beyond what the PR body advertises:

  • Old behavior on prev=None, curr='failure': sends a failure alert (arguably noisy on a genuine first run, but also catches the "lost in window" case).
  • New behavior on prev=None, curr='failure': silent.

For the genuinely-first-ever-run case this is an improvement. For the "fell out of the 10-run window" case it's a regression in alerting fidelity.

Suggested follow-up (not required for this PR):

  • Bump per_page to 100, or
  • Pre-filter on the GitHub side with workflow_id (cheaper and exact), e.g.
    GET /repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs?branch=main&status=completed&per_page=10,
    which removes the cross-workflow-window confusion entirely.

This is pre-existing behavior; calling it out only because the PR is touching exactly this code path and it would be a clean follow-up.

Verdict

Approving. Both notes are non-blocking — N1 is a description-only fix, N2 is a follow-up worth tracking but not a regression introduced by this PR's intent.

@lml2468 lml2468 merged commit fd26bd7 into main May 14, 2026
1 check passed
lml2468 added a commit that referenced this pull request May 16, 2026
…group_id validation

- octo-issue-feed.yml: sanitize each label via sanitize_text(l, max_len=64)
  to prevent newline injection in label names (Obs #2)
- issue-welcome.yml: move bot-type check before paginate() to avoid
  unnecessary API round-trips for bot-opened issues (Obs #3)
- octo-ci-status.yml: add require_group_id() + re import, replace
  require_env('PROJECT_GROUP_ID') with require_group_id() for
  consistency with issue-feed and pr-feed (Obs #5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants