fix(workflows): minimise permissions + clarify ci-status first-run log#5
Conversation
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #5 (.github)
Summary
The intent of this PR is good — minimising GITHUB_TOKEN permissions on four reusable workflows, and clarifying the confusing "Unhandled transition None → success" log on the first run of octo-ci-status. The four permissions: declarations land correctly. However, the octo-ci-status.yml Python change is structurally broken and silently disables every CI notification this workflow is supposed to send. As-is this PR cannot ship.
P0 — Blocker
octo-ci-status.yml: new if prev_conclusion is None: block breaks all notifications
.github/workflows/octo-ci-status.yml lines ~109–127 (head ec5510c):
# Determine message
if curr_conclusion == 'failure':
msg = (
f'❌ [{repo}] main CI 挂了\n\n'
f'工作流:{wf_name}\n'
f'🔗 {run_url}'
)
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
msg = (
f'✅ [{repo}] main CI 已恢复\n\n'
f'工作流:{wf_name}\n'
f'🔗 {run_url}'
)
if prev_conclusion is None: # ← NEW
print(f'First run detected (no previous history), skipping notification.')
sys.exit(0)
else: # ← now binds to the NEW if
print(f'Unhandled transition {prev_conclusion!r} → {curr_conclusion!r}, skipping.')
sys.exit(0)
# Send to Octo IM
...
send('4ade985d984e432eb7fbdd0ad4f8118a', msg)
send(proj_gid, msg)The new if prev_conclusion is None: was inserted at the same indentation as the leading if curr_conclusion == 'failure':, so Python parses it as a separate top‑level statement rather than an elif on the existing chain. The trailing else: is therefore the else of the new if, not the else of the original if/elif/else chain. Every path now reaches a sys.exit(0) before send() is ever called.
Concretely:
| Scenario | prev_conclusion |
curr_conclusion |
Old behaviour | New behaviour |
|---|---|---|---|---|
| Failure alert | 'success' |
'failure' |
Sends ❌ alert | Logs "Unhandled transition 'success' → 'failure'", exits, NO alert |
| Recovery alert | 'failure' |
'success' |
Sends ✅ recovery | Logs "Unhandled transition 'failure' → 'success'", exits, NO recovery |
| First run (any) | None |
anything | Either alerts on first failure or hits old "Unhandled transition" log | Always exits silently with "First run detected" |
| Success → success | 'success' |
'success' |
Already returned earlier on curr == prev check |
Same |
The two notification branches that this workflow exists to serve — failure alerts and recovery notifications — are both dead code now. This is an org-level reusable workflow consumed by 8+ repos; merging this branch silences the CI failure alerting pipeline everywhere it is wired up.
Suggested fix — make the first‑run check part of the same dispatch chain, e.g. as an early guard:
if prev_conclusion is None:
print('First run detected (no previous history), skipping notification.')
sys.exit(0)
if curr_conclusion == 'failure':
msg = (...)
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
msg = (...)
else:
print(f'Unhandled transition {prev_conclusion!r} → {curr_conclusion!r}, skipping.')
sys.exit(0)
# Send to Octo IM
...or as a sibling elif inside the existing chain:
if curr_conclusion == 'failure':
msg = (...)
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
msg = (...)
elif prev_conclusion is None:
print('First run detected (no previous history), skipping notification.')
sys.exit(0)
else:
print(f'Unhandled transition {prev_conclusion!r} → {curr_conclusion!r}, skipping.')
sys.exit(0)Either form preserves the original semantics for the two alert paths and only re-routes the previously confusing None → success (and None → failure, if that is also intended) case. Please add a quick unit test or a temporary print(curr_conclusion, prev_conclusion) smoke run on a test caller before re-pushing — the failure mode here is silent.
A brief test plan that would have caught this:
- Trigger the caller workflow on a repo where there is no recent history → expect
First run detected. - Force a failing run on
mainafter a green run → expect ❌ alert in both groups. - Force a green run after a failure → expect ✅ recovery in both groups.
P2 — Nits
octo-ci-status.yml, new line:print(f'First run detected (no previous history), skipping notification.')—f''has no interpolation; plain string is fine. Minor.
Permissions hardening — verified
The permissions: {} and permissions: { actions: read } declarations all match the actual token usage in the four workflows; this part is correct.
| File | Declared | GITHUB_TOKEN usage in body | Verdict |
|---|---|---|---|
octo-issue-feed.yml |
permissions: {} |
None — only OCTO_BOT_TOKEN is used. |
✅ |
octo-pr-feed.yml |
permissions: {} |
None — only OCTO_BOT_TOKEN is used. |
✅ |
pr-merged-done.yml |
permissions: {} |
actions/github-script@v7 is invoked with github-token: ${{ secrets.PROJECT_TOKEN }}; no implicit GITHUB_TOKEN calls. |
✅ |
octo-ci-status.yml |
permissions: { actions: read } |
Calls GET /repos/{owner}/{repo}/actions/runs with secrets.GITHUB_TOKEN; actions: read is the minimum scope. |
✅ |
For org-level reusable workflows callable from many repos, declaring permissions: explicitly (instead of relying on the org/repo default) is the right move.
Verdict
Changes requested. The permissions hardening is good and ready to merge once split out, but the octo-ci-status.yml first-run-log change must be reworked: as written, the new if prev_conclusion is None block silently disables both CI failure alerts and recovery notifications across every consumer of this reusable workflow. Suggest hoisting the first‑run guard above the if/elif dispatch (or making it an elif) and adding a smoke test before re-pushing.
|
Addressed P0 blocker (commit Root cause confirmed: the Fix: hoisted the first-run guard to before the # Guard: first-ever run has no previous history — skip silently
if prev_conclusion is None:
print('First run detected (no previous history), skipping notification.')
sys.exit(0)
# Determine message
if curr_conclusion == 'failure':
...
elif curr_conclusion == 'success' and prev_conclusion == 'failure':
...
else:
print(f'Unhandled transition ...')
sys.exit(0)Also addressed the P2 nit: dropped the redundant Ready for re-review. |
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #5 (.github)
Summary
Small, focused, low-risk batch:
permissions: {}added to three reusable workflows that never useGITHUB_TOKEN.permissions: actions: readadded toocto-ci-status.yml(the only one that does call the GitHub Actions API).- A new explicit
prev_conclusion is Noneguard inocto-ci-status.ymlthat replaces the confusingUnhandled transition None → successlog line.
The security tightening is correct and the log fix is reasonable. Approving with two small notes the author should be aware of.
Verification
| Item | Result | Evidence |
|---|---|---|
octo-issue-feed.yml only consumes OCTO_BOT_TOKEN (PAT), not GITHUB_TOKEN |
✅ | env block at lines ~44 only wires OCTO_BOT_TOKEN; Python uses os.environ['OCTO_BOT_TOKEN'] |
octo-pr-feed.yml same |
✅ | same shape as issue-feed |
pr-merged-done.yml only consumes PROJECT_TOKEN (PAT) |
✅ | step uses github-token: ${{ secrets.PROJECT_TOKEN }} |
octo-ci-status.yml actually needs actions: read |
✅ | calls GET /repos/{owner}/{repo}/actions/runs with the default GITHUB_TOKEN; that endpoint requires actions: read |
| First-run guard correctness for the "skip silently" intent | ✅ | placed after the curr == prev short-circuit and before message construction; covers both first-success and first-failure cases |
For a reusable workflow, the workflow-level permissions: block is the upper bound on the GITHUB_TOKEN granted inside it — the caller cannot grant more. Since these three workflows do not touch GITHUB_TOKEN at all, permissions: {} is the correct minimum.
Notes (non-blocking)
N1. PR description vs. actual diff for octo-ci-status.yml permissions
The PR body says:
permissionswas alreadyactions: read(correct); no change needed there
But the diff in fact adds the permissions: actions: read block — main had no top-level permissions declaration at all (verified by reading the file at main). The change is correct and necessary; only the description is misleading. Worth fixing in the commit message / PR body for the audit trail, but no code change needed.
N2. "First run detected" message can be inaccurate in busy repos
The previous-run lookup pulls the 10 most recent completed runs on main across all workflows, then filters by workflow name:
api_url = (
f'https://api.github.com/repos/Mininglamp-OSS/{repo}/actions/runs'
f'?branch=main&status=completed&per_page=10'
)
...
same_wf = [r for r in runs if r['name'] == wf_name](octo-ci-status.yml lines ~62-78, unchanged by this PR)
In a repo with several workflows running on every push, a low-frequency workflow can easily have no prior same-name run in the recent 10-run window even after running many times historically. In that case previous is None and this PR's new message will print "First run detected (no previous history)" — which is not actually true, and which now also silently swallows what would previously have been a failure alert (the old code would have hit the if curr_conclusion == 'failure': branch and notified).
So this is a small behavior change beyond what the PR body advertises:
- Old behavior on
prev=None, curr='failure': sends a failure alert (arguably noisy on a genuine first run, but also catches the "lost in window" case). - New behavior on
prev=None, curr='failure': silent.
For the genuinely-first-ever-run case this is an improvement. For the "fell out of the 10-run window" case it's a regression in alerting fidelity.
Suggested follow-up (not required for this PR):
- Bump
per_pageto 100, or - Pre-filter on the GitHub side with
workflow_id(cheaper and exact), e.g.
GET /repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs?branch=main&status=completed&per_page=10,
which removes the cross-workflow-window confusion entirely.
This is pre-existing behavior; calling it out only because the PR is touching exactly this code path and it would be a clean follow-up.
Verdict
Approving. Both notes are non-blocking — N1 is a description-only fix, N2 is a follow-up worth tracking but not a regression introduced by this PR's intent.
…group_id validation - octo-issue-feed.yml: sanitize each label via sanitize_text(l, max_len=64) to prevent newline injection in label names (Obs #2) - issue-welcome.yml: move bot-type check before paginate() to avoid unnecessary API round-trips for bot-opened issues (Obs #3) - octo-ci-status.yml: add require_group_id() + re import, replace require_env('PROJECT_GROUP_ID') with require_group_id() for consistency with issue-feed and pr-feed (Obs #5)
Batch A — Security: GITHUB_TOKEN permission hardening
Changes
Commit 1 —
permissions: {}on 3 reusable workflowsocto-issue-feed.yml: only usesOCTO_BOT_TOKEN; GITHUB_TOKEN needs zero permissionsocto-pr-feed.yml: samepr-merged-done.yml: only usesPROJECT_TOKENPAT; sameCommit 2 —
octo-ci-status.ymltwo fixespermissionswas alreadyactions: read(correct); no change needed thereprev_conclusion is Nonebranch: prints "First run detected (no previous history), skipping notification" instead of falling into the confusing "Unhandled transition None → success" messageWhy
All four workflows are org-level reusable infrastructure called from 8+ repos. Leaving GITHUB_TOKEN permissions at default (or undeclared) violates least-privilege and creates unnecessary write surface.
Part of
Batch A of the org-level workflow hardening series. Batch B (urllib timeout + retry) follows next.