What happened?
Summary
When a town's git_auth config has no GitHub token (none of github_token, github_cli_pat, or platform_integration_id is set), the PR status poller in the merge_request bead lifecycle fails with a misleading error message that suggests the GitHub API itself is the problem.
What the user sees
After 10 consecutive null poll attempts, the merge_request bead is marked failed with this metadata:
{
  "failureReason": "pr_poll_failed",
  "failureMessage": "Cannot poll PR status — GitHub API returned null 10 consecutive times. Check that a valid GitHub token is configured in town settings and that the GitHub API is reachable."
}
This is confusing because:
- The polecat that produced the PR successfully created the PR using a working credential, so users naturally assume "of course GitHub auth works, the agent just used it."
- The error implies the GitHub API was contacted and returned null, when in reality no API call was made: resolveGitHubToken returned null and checkPRStatus short-circuited before any fetch.
Root cause
services/gastown/src/dos/town/town-scm.ts:checkPRStatus returns null for at least four distinct conditions, which the caller in actions.ts (the poll_pr action handler around line 1030) collapses into a single poll_null_count metric:
- resolveGitHubToken(ctx) returned null (no token in town config and no platform integration available); this only logs a console.warn at town-scm.ts:70
- fetch to api.github.com/repos/.../pulls/N returned a non-OK status (could be 404, 401, 403, 5xx, secondary rate limit); this only logs at town-scm.ts:86
- response.json() failed
- GitHubPRStatusSchema.safeParse(json) failed (Zod parse mismatch)
All four return null indistinguishably. The warn logs are present, but they don't surface to the bead's metadata or to the user.
The polecat side uses an entirely different code path (its own container-injected GitHub credential for git push / PR creation), so a working polecat does not imply the town worker can resolve a token. They are independent.
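To make the collapse concrete, the sketch below is a simplified stand-in for the four null exits described in this section. It is not the actual town-scm.ts source: the function name checkPRStatusCollapsed, the RawResponse and PRStatus shapes, the injected send callback, and the hand-written shape check (standing in for GitHubPRStatusSchema.safeParse) are all assumptions made for illustration.

```typescript
// Illustrative stand-in only; NOT the real checkPRStatus from town-scm.ts.
// RawResponse, PRStatus, and `send` are hypothetical names for this sketch.
interface RawResponse {
  ok: boolean;
  status: number;
  body: string;
}

interface PRStatus {
  state: string;
  merged: boolean;
}

function checkPRStatusCollapsed(
  token: string | null,
  send: (token: string) => RawResponse,
): PRStatus | null {
  // Exit 1: no token resolved. Note that no request is ever sent.
  if (token === null) return null;

  // Exit 2: the API answered with a non-OK status (401, 403, 404, 5xx, ...).
  const res = send(token);
  if (!res.ok) return null;

  // Exit 3: the body is not valid JSON.
  let json: unknown;
  try {
    json = JSON.parse(res.body);
  } catch {
    return null;
  }

  // Exit 4: the JSON does not match the expected shape
  // (a hand-written check standing in for a Zod safeParse).
  if (typeof json !== 'object' || json === null) return null;
  const candidate = json as Record<string, unknown>;
  if (typeof candidate.state !== 'string' || typeof candidate.merged !== 'boolean') {
    return null;
  }

  return { state: candidate.state, merged: candidate.merged };
}
```

From the caller's side, every one of these exits looks the same, which is why a missing token ends up reported as "GitHub API returned null".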
Why this is hard to diagnose
The user has no signal that points to the actual cause. They see "GitHub API returned null" and a hint to check the token, but:
- They likely already verified the polecat created the PR successfully → "auth works"
- The town settings UI may show a token field configured at the org/rig level but not at the town level, and the user can't easily tell which level resolveGitHubToken actually consults
- There's no way to tell from the bead whether the failure is "no token" vs. "bad token" vs. "GitHub 5xx" vs. "schema drift"
Reproduction
- Create a town without setting github_token / github_cli_pat / platform_integration_id in its config (or have the platform integration fail). Let polecats use their own container creds.
- Sling a bead that opens a PR.
- The merge_request bead created by gt_done will poll checkPRStatus, get null 10x, and fail with the misleading message.
Real-world example: bead c262038b-f24e-4e21-a89c-7fb3a5f9864f in town 98172328-9bd1-4b59-ba3e-0ae627058e6b, rig b6cf4b32-4e1b-4558-a864-a2a8df7bb1de, against PR #3148 — which is OPEN/MERGEABLE/CLEAN and trivially fetchable via a manually authenticated GraphQL query, ruling out actual GitHub API trouble.
Suggested fixes
1. Distinguish the null causes in checkPRStatus (high impact, low effort)
Return a discriminated union from checkPRStatus instead of PRStatusResult | null:
type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };
type PRStatusOutcome =
  | { ok: true; result: PRStatusResult }
  | { ok: false; error: PRStatusError };
Then the poll_pr handler in actions.ts can write the specific error kind into the bead's metadata.failureReason / failureMessage. Examples:
- no_token → "Cannot poll PR status: no GitHub token configured for this town. Add a GitHub PAT or platform integration in town settings."
- http_error 404 → "PR not found. Was the branch deleted before the PR could be polled?"
- http_error 401 → "Town's GitHub token is invalid or expired."
- http_error 403 → "Town's GitHub token lacks pull-requests: read permission for this repo, or hit a secondary rate limit."
- http_error 5xx → keep retrying with backoff rather than counting toward the null threshold.
- schema_mismatch → "GitHub API response shape changed; please file a bug." (and include a few keys for the bug report).
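The mapping above could be sketched roughly as follows. The PRStatusError union is the one proposed in this issue; FailureMetadata, describePRStatusError, describeHttpStatus, and the failureReason strings are hypothetical names, and the wording for the json_parse and unrecognized_url cases is a guess, since the issue does not specify it.

```typescript
// Sketch only. The union mirrors the one proposed in this issue; everything
// else (function names, failureReason values) is hypothetical.
type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };

interface FailureMetadata {
  failureReason: string;
  failureMessage: string;
}

// Turn a specific error kind into the metadata written onto the bead.
function describePRStatusError(error: PRStatusError): FailureMetadata {
  switch (error.kind) {
    case 'no_token':
      return {
        failureReason: 'pr_poll_no_token',
        failureMessage:
          'Cannot poll PR status: no GitHub token configured for this town. ' +
          'Add a GitHub PAT or platform integration in town settings.',
      };
    case 'http_error':
      return {
        failureReason: `pr_poll_http_${error.status}`,
        failureMessage: describeHttpStatus(error.status, error.statusText),
      };
    case 'invalid_response':
      return {
        failureReason: `pr_poll_${error.reason}`,
        failureMessage:
          error.reason === 'schema_mismatch'
            ? 'GitHub API response shape changed; please file a bug.'
            : 'GitHub API response body was not valid JSON.', // assumed wording
      };
    case 'unrecognized_url':
      return {
        failureReason: 'pr_poll_unrecognized_url',
        // Assumed meaning: the PR URL could not be parsed as a GitHub PR URL.
        failureMessage: 'PR URL was not recognized as a GitHub pull request URL.',
      };
  }
}

// Status-specific messages taken from the list above, with a generic fallback.
function describeHttpStatus(status: number, statusText: string): string {
  if (status === 401) return "Town's GitHub token is invalid or expired.";
  if (status === 403) {
    return "Town's GitHub token lacks pull-requests: read permission for this repo, or hit a secondary rate limit.";
  }
  if (status === 404) return 'PR not found. Was the branch deleted before the PR could be polled?';
  return `GitHub API returned HTTP ${status} ${statusText}.`;
}
```

Because the switch covers every member of the union, adding a new error kind later would surface as a compile error in this function rather than as another silent null.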
2. Don't fail-fast on legitimate transient errors
Currently, any null counts toward the 10-strike threshold, including transient 5xx and rate limits. After the discriminated union split, only no_token / 4xx auth errors / repeated schema_mismatch should fail-fast; transient issues should reset or use a longer threshold.
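One possible shape for that policy is sketched below. Only the 10-strike limit comes from the current behavior described in this issue; Disposition, classify, PollState, and applyPollError are hypothetical names. Treating 429 as transient and counting 403 as a strike (rather than failing immediately, since it may be a secondary rate limit) are judgment calls, not settled decisions.

```typescript
// Sketch of one possible retry policy; names are assumptions, not existing code.
type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };

type Disposition = 'fail_now' | 'count_strike' | 'retry_later';

const STRIKE_LIMIT = 10; // matches the existing 10-strike threshold

function classify(error: PRStatusError): Disposition {
  switch (error.kind) {
    case 'no_token':
    case 'unrecognized_url':
      return 'fail_now'; // retrying cannot fix a configuration problem
    case 'invalid_response':
      return 'count_strike'; // fail only if it keeps happening
    case 'http_error':
      if (error.status === 401) return 'fail_now';
      if (error.status === 429 || error.status >= 500) return 'retry_later';
      return 'count_strike'; // 403, 404, and other 4xx responses
  }
}

interface PollState {
  strikes: number;
  failed: boolean;
}

// Fold one failed poll into the running state without mutating the input.
function applyPollError(state: PollState, error: PRStatusError): PollState {
  const disposition = classify(error);
  if (disposition === 'fail_now') return { strikes: state.strikes, failed: true };
  if (disposition === 'retry_later') return state; // transient: no strike added
  const strikes = state.strikes + 1;
  return { strikes, failed: strikes >= STRIKE_LIMIT };
}
```

With this split, a GitHub outage that produces a long run of 5xx responses never fails the bead on its own, while a missing token fails it on the first poll.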
3. Surface the actionable hint about which token level is consulted
In the failure message, name the resolution chain:
"No GitHub token resolved. Tried (in order): town git_auth.github_token, town github_cli_pat, town platform integration, rig platform integration. Configure one of these in town or rig settings."
This would have saved the user in the repro above ~30 minutes of confusion since they were looking at polecat container auth, not town config auth.
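A resolver that records what it tried makes such a message cheap to produce. The following is a minimal sketch under stated assumptions: TokenSource, TokenResolution, resolveWithTrace, and noTokenMessage are hypothetical names, the source labels are copied from the message quoted above, and the real resolveGitHubToken is probably asynchronous and shaped differently.

```typescript
// Hypothetical sketch; not the real resolveGitHubToken.
interface TokenSource {
  label: string; // human-readable name used in the failure message
  resolve: () => string | null;
}

type TokenResolution =
  | { ok: true; token: string; source: string }
  | { ok: false; tried: string[] };

// Try each source in order, remembering every label that was consulted.
function resolveWithTrace(sources: TokenSource[]): TokenResolution {
  const tried: string[] = [];
  for (const source of sources) {
    tried.push(source.label);
    const token = source.resolve();
    if (token) return { ok: true, token, source: source.label };
  }
  return { ok: false, tried };
}

// Build the user-facing message from the recorded resolution chain.
function noTokenMessage(tried: string[]): string {
  return (
    `No GitHub token resolved. Tried (in order): ${tried.join(', ')}. ` +
    'Configure one of these in town or rig settings.'
  );
}
```

Deriving the message from the recorded labels, instead of hard-coding it, keeps the text accurate if the resolution order ever changes.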
4. UI: show a "Town GitHub token" health indicator in town settings
When the town has no resolvable GitHub token, show a yellow warning banner: "Polecats can still create PRs (they use their own credentials), but the town cannot poll PR status to land merged work. Configure a token below." This decouples the two concerns visually so users stop assuming the polecat's success implies a healthy town config.
Area
Merge Queue / Refinery
Context
- Town ID: 98172328-9bd1-4b59-ba3e-0ae627058e6b
- Agent: Mayor (0c952401-2aaa-4335-bee2-35036e90483c)
- Rig ID: b6cf4b32-4e1b-4558-a864-a2a8df7bb1de
Filed automatically by the Mayor via gt_report_bug.