[Gastown] Misleading "GitHub API returned null" error when town has no GitHub token #3149

@jrf0110

What happened?

Summary

When a town's git_auth config has no GitHub token (none of github_token, github_cli_pat, or platform_integration_id are set), the PR status poller in the merge_request bead lifecycle fails with a misleading error message that suggests the GitHub API itself is the problem.

What the user sees

After 10 consecutive null poll attempts, the merge_request bead is marked failed with this metadata:

{
  "failureReason": "pr_poll_failed",
  "failureMessage": "Cannot poll PR status — GitHub API returned null 10 consecutive times. Check that a valid GitHub token is configured in town settings and that the GitHub API is reachable."
}

This is confusing because:

  • The polecat that produced the PR successfully created the PR using a working credential, so users naturally assume "of course GitHub auth works, the agent just used it."
  • The error implies the GitHub API was contacted and returned null, when in reality no API call was made: resolveGitHubToken returned null and checkPRStatus short-circuited before any fetch.

Root cause

services/gastown/src/dos/town/town-scm.ts:checkPRStatus returns null for at least four distinct conditions, which the caller in actions.ts (the poll_pr action handler around line 1030) collapses into a single poll_null_count metric:

  1. resolveGitHubToken(ctx) returned null (no token in town config and no platform integration available) — only logs a console.warn at town-scm.ts:70
  2. fetch to api.github.com/repos/.../pulls/N returned a non-OK status (could be 404, 401, 403, 5xx, secondary rate limit) — only logs at town-scm.ts:86
  3. response.json() failed
  4. GitHubPRStatusSchema.safeParse(json) failed (Zod parse mismatch)

All four return null indistinguishably. The warn logs are present, but they don't surface to the bead's metadata or to the user.

The polecat side uses an entirely different code path (its own container-injected GitHub credential for git push / PR creation), so a working polecat does not imply the town worker can resolve a token. They are independent.

Why this is hard to diagnose

The user has no signal that points to the actual cause. They see "GitHub API returned null" and a hint to check the token, but:

  • They likely already verified the polecat created the PR successfully → "auth works"
  • The token check in town settings UI may show a token field configured at the org/rig level but not at the town level, and the user can't easily tell which level resolveGitHubToken actually consults
  • There's no way to tell from the bead whether the failure is "no token" vs. "bad token" vs. "GitHub 5xx" vs. "schema drift"

Reproduction

  1. Create a town without setting a github_token / github_cli_pat / platform_integration_id in its config (or have the platform integration fail). Let polecats use their own container creds.
  2. Sling a bead that opens a PR.
  3. The merge_request bead created by gt_done will poll checkPRStatus, get null 10x, and fail with the misleading message.

Real-world example: bead c262038b-f24e-4e21-a89c-7fb3a5f9864f in town 98172328-9bd1-4b59-ba3e-0ae627058e6b, rig b6cf4b32-4e1b-4558-a864-a2a8df7bb1de, against PR #3148 — which is OPEN/MERGEABLE/CLEAN and trivially fetchable via a manually authenticated GraphQL query, ruling out actual GitHub API trouble.

Suggested fixes

1. Distinguish the null causes in checkPRStatus (high impact, low effort)

Return a discriminated union from checkPRStatus instead of PRStatusResult | null:

type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };

type PRStatusOutcome =
  | { ok: true; result: PRStatusResult }
  | { ok: false; error: PRStatusError };
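A minimal sketch of how checkPRStatus could produce this union, so each failure stays distinct. Everything beyond the two types above is an assumption made for illustration: the PRStatusResult shape, the isPRStatus guard (a stand-in for GitHubPRStatusSchema.safeParse), the HttpGet type, the request headers, and the precheck / classifyResponse / checkPRStatusSketch names are hypothetical and not taken from town-scm.ts.

```typescript
// Hypothetical minimal result shape; the real PRStatusResult is not shown in this issue.
type PRStatusResult = { state: string; merged: boolean };

type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };

type PRStatusOutcome =
  | { ok: true; result: PRStatusResult }
  | { ok: false; error: PRStatusError };

// Small subset of a fetch-style response, so the sketch does not depend on runtime typings.
type HttpResponseLike = { ok: boolean; status: number; statusText: string; text(): Promise<string> };
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponseLike>;

type Precheck =
  | { ready: true; token: string; url: string }
  | { ready: false; error: PRStatusError };

// Hand-written stand-in for GitHubPRStatusSchema.safeParse (assumption).
function isPRStatus(value: unknown): value is PRStatusResult {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.state === 'string' && typeof record.merged === 'boolean';
}

// Failures that happen before any network call is made.
function precheck(token: string | null, prApiUrl: string | null): Precheck {
  if (prApiUrl === null) return { ready: false, error: { kind: 'unrecognized_url' } };
  if (token === null) return { ready: false, error: { kind: 'no_token' } };
  return { ready: true, token, url: prApiUrl };
}

// Turns an already-received response into an outcome without collapsing the causes.
function classifyResponse(ok: boolean, status: number, statusText: string, body: string): PRStatusOutcome {
  if (!ok) return { ok: false, error: { kind: 'http_error', status, statusText } };
  let json: unknown;
  try {
    json = JSON.parse(body);
  } catch {
    return { ok: false, error: { kind: 'invalid_response', reason: 'json_parse' } };
  }
  if (!isPRStatus(json)) {
    return { ok: false, error: { kind: 'invalid_response', reason: 'schema_mismatch' } };
  }
  return { ok: true, result: { state: json.state, merged: json.merged } };
}

// Composes the two steps; httpGet is injected so callers can supply their own HTTP client.
async function checkPRStatusSketch(
  token: string | null,
  prApiUrl: string | null,
  httpGet: HttpGet,
): Promise<PRStatusOutcome> {
  const pre = precheck(token, prApiUrl);
  if (!pre.ready) return { ok: false, error: pre.error };
  const response = await httpGet(pre.url, {
    Authorization: `Bearer ${pre.token}`,
    Accept: 'application/vnd.github+json',
  });
  return classifyResponse(response.ok, response.status, response.statusText, await response.text());
}
```

Splitting the pre-network checks from response classification keeps the no_token case visibly separate from anything the GitHub API actually returned.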

Then the poll_pr handler in actions.ts can write the specific error kind into the bead's metadata.failureReason / failureMessage. Examples:

  • no_token → "Cannot poll PR status — no GitHub token configured for this town. Add a GitHub PAT or platform integration in town settings."
  • http_error 404 → "PR not found. Was the branch deleted before the PR could be polled?"
  • http_error 401 → "Town's GitHub token is invalid or expired."
  • http_error 403 → "Town's GitHub token lacks pull-requests: read permission for this repo, or hit a secondary rate limit."
  • http_error 5xx → keep retrying with backoff rather than counting toward the null threshold.
  • schema_mismatch → "GitHub API response shape changed; please file a bug." (and include a few keys from the response for the bug report).
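The mapping in the list above could be sketched as a single function that the poll_pr handler calls. The PollDecision type, the failureReason strings, and the messages for unrecognized_url, json_parse, and other HTTP statuses are assumptions made for illustration; only the no_token, 401, 403, 404, 5xx, and schema_mismatch wording follows the examples in this issue.

```typescript
type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };

// Hypothetical decision type: either keep polling or fail the bead with specific metadata.
type PollDecision =
  | { action: 'retry' }
  | { action: 'fail'; failureReason: string; failureMessage: string };

const fail = (failureReason: string, failureMessage: string): PollDecision => ({
  action: 'fail',
  failureReason,
  failureMessage,
});

// failureReason values below are made-up names, not existing constants.
function decideOnHttpStatus(status: number, statusText: string): PollDecision {
  if (status >= 500) return { action: 'retry' }; // transient: retry instead of failing
  if (status === 401) return fail('pr_poll_unauthorized', "Town's GitHub token is invalid or expired.");
  if (status === 403) {
    return fail(
      'pr_poll_forbidden',
      "Town's GitHub token lacks pull-requests: read permission for this repo, or hit a secondary rate limit.",
    );
  }
  if (status === 404) {
    return fail('pr_poll_not_found', 'PR not found. Was the branch deleted before the PR could be polled?');
  }
  return fail('pr_poll_http_error', `GitHub API returned HTTP ${status} ${statusText}.`);
}

function decideOnPollError(error: PRStatusError): PollDecision {
  switch (error.kind) {
    case 'no_token':
      return fail(
        'pr_poll_no_token',
        'Cannot poll PR status: no GitHub token configured for this town. Add a GitHub PAT or platform integration in town settings.',
      );
    case 'unrecognized_url':
      return fail('pr_poll_unrecognized_url', 'Cannot poll PR status: the PR URL was not recognized.');
    case 'invalid_response':
      return error.reason === 'schema_mismatch'
        ? fail('pr_poll_schema_mismatch', 'GitHub API response shape changed; please file a bug.')
        : fail('pr_poll_invalid_json', 'GitHub API returned a body that could not be parsed as JSON.');
    case 'http_error':
      return decideOnHttpStatus(error.status, error.statusText);
  }
}
```

Because the switch is exhaustive over PRStatusError, adding a new error kind later becomes a compile error here until a message is written for it.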

2. Don't fail-fast on legitimate transient errors

Currently, any null counts toward the 10-strike threshold, including transient 5xx responses and rate limits. After the discriminated-union split, only no_token, 4xx auth errors, and repeated schema_mismatch should fail fast; transient issues should either reset the counter or count against a longer threshold.
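One possible shape for that policy is a per-tier strike counter, sketched below. The tier names, the limits (1, 3, 30), the treatment of 429 as transient, and the tierFor / step helpers are all assumptions chosen for illustration, not existing code or agreed values. Note that this sketch treats every 403 as an auth error; a real implementation might inspect the response further to tell a secondary rate limit apart.

```typescript
type PRStatusError =
  | { kind: 'no_token' }
  | { kind: 'http_error'; status: number; statusText: string }
  | { kind: 'invalid_response'; reason: 'json_parse' | 'schema_mismatch' }
  | { kind: 'unrecognized_url' };

type PollOutcome = { ok: true } | { ok: false; error: PRStatusError };

type Tier = 'fail_fast' | 'repeated' | 'transient';

// Hypothetical limits: strikes (since the last success) allowed per tier before the bead fails.
const LIMITS: Record<Tier, number> = { fail_fast: 1, repeated: 3, transient: 30 };

function tierFor(error: PRStatusError): Tier {
  switch (error.kind) {
    case 'no_token':
    case 'unrecognized_url':
      return 'fail_fast'; // retrying cannot help until configuration changes
    case 'invalid_response':
      return error.reason === 'schema_mismatch' ? 'repeated' : 'transient';
    case 'http_error':
      if (error.status === 401 || error.status === 403) return 'fail_fast';
      if (error.status >= 500 || error.status === 429) return 'transient';
      return 'repeated';
  }
}

// Strike counts per tier since the last successful poll.
type PollState = Record<Tier, number>;
const freshState = (): PollState => ({ fail_fast: 0, repeated: 0, transient: 0 });

function step(state: PollState, outcome: PollOutcome): { state: PollState; failed: boolean } {
  // Any successful poll clears all strikes.
  if (outcome.ok) return { state: freshState(), failed: false };
  const tier = tierFor(outcome.error);
  const next: PollState = { ...state };
  next[tier] += 1;
  return { state: next, failed: next[tier] >= LIMITS[tier] };
}
```

With these example limits a missing token fails the bead on the first poll, while ten consecutive 503 responses leave it pending.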

3. Surface the actionable hint about which token level is consulted

In the failure message, name the resolution chain:

"No GitHub token resolved. Tried (in order): town git_auth.github_token, town github_cli_pat, town platform integration, rig platform integration. Configure one of these in town or rig settings."

This would have saved the user in the repro above ~30 minutes of confusion since they were looking at polecat container auth, not town config auth.
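A minimal sketch of building that message from the resolution chain itself, so the text cannot drift from the order actually consulted. The TokenSource type and the resolveWithTrace / noTokenMessage helpers are hypothetical, the lookups are stubs, and treating an empty string as "not configured" is an assumption; the labels follow the order named in the message above.

```typescript
// Each source pairs a human-readable label with a lookup; lookups are illustrative stubs.
type TokenSource = { label: string; lookup: () => string | null };

type TokenResolution =
  | { ok: true; token: string; source: string }
  | { ok: false; tried: string[] };

// Walks the sources in order and records every label consulted along the way.
function resolveWithTrace(sources: TokenSource[]): TokenResolution {
  const tried: string[] = [];
  for (const source of sources) {
    tried.push(source.label);
    const token = source.lookup();
    // Assumption: an empty string counts as "not configured".
    if (token !== null && token !== '') {
      return { ok: true, token, source: source.label };
    }
  }
  return { ok: false, tried };
}

function noTokenMessage(tried: string[]): string {
  return `No GitHub token resolved. Tried (in order): ${tried.join(', ')}. Configure one of these in town or rig settings.`;
}

// Example chain modelling a town with nothing configured.
const exampleSources: TokenSource[] = [
  { label: 'town git_auth.github_token', lookup: () => null },
  { label: 'town github_cli_pat', lookup: () => null },
  { label: 'town platform integration', lookup: () => null },
  { label: 'rig platform integration', lookup: () => null },
];

const resolution = resolveWithTrace(exampleSources);
const message = resolution.ok ? `Resolved from ${resolution.source}` : noTokenMessage(resolution.tried);
```

On success the same trace identifies which level supplied the token, which could also feed the settings UI.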

4. UI: show a "Town GitHub token" health indicator in town settings

When the town has no resolvable GitHub token, show a yellow warning banner: "Polecats can still create PRs (they use their own credentials), but the town cannot poll PR status to land merged work. Configure a token below." This decouples the two concerns visually so users stop assuming the polecat's success implies a healthy town config.

Area

Merge Queue / Refinery

Context

  • Town ID: 98172328-9bd1-4b59-ba3e-0ae627058e6b
  • Agent: Mayor (0c952401-2aaa-4335-bee2-35036e90483c)
  • Rig ID: b6cf4b32-4e1b-4558-a864-a2a8df7bb1de

Filed automatically by the Mayor via gt_report_bug.

Labels: bug (Something isn't working), gt:mayor (Mayor agent, chat interface, delegation tools)
