@github-actions
Contributor

Automated Fix by Amber Agent

This PR addresses issue #380 using the Amber background agent.

Changes Summary

  • Action Type: auto-fix
  • Commit: a4f9813
  • Triggered by: Issue label/command

Pre-merge Checklist

  • All linters pass
  • All tests pass
  • Changes follow project conventions (CLAUDE.md)
  • No scope creep beyond issue description

Reviewer Notes

This PR was automatically generated. Please review:

  1. Code quality and adherence to standards
  2. Test coverage for changes
  3. No unintended side effects

🤖 Generated with Amber Background Agent

Closes #380

This commit addresses multiple issues that caused Amber sessions to time
out prematurely and prevented them from resuming:

1. Increased Job timeout from 4 hours to 24 hours for long-running sessions
   - Changed ActiveDeadlineSeconds from 14400 to 86400 seconds
   - Prevents premature termination of long-running Amber workflows

2. Enhanced SDK session ID annotation update with retry logic
   - Added exponential backoff retry (3 attempts with 1s, 2s, 4s delays)
   - Better error logging to identify annotation persistence failures
   - Critical for session resume functionality

3. Improved cleanup of parent session resources before resume
   - Backend now deletes both temp-content pod AND parent job
   - Prevents PVC multi-attach errors when new session mounts workspace
   - Added 2-second wait for pod deletion to propagate

4. Ensured interactive flag is set on all failure paths
   - Added ensureSessionIsInteractive() calls on timeout/eviction paths
   - Enables session resumption after failures (pod evicted, container error)
   - Previously only called on successful completion
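
The numbers in items 1 and 2 can be checked with a little arithmetic. The sketch below is illustrative only and is not code from this PR: `hours_to_seconds` and `backoff_delays` are hypothetical helper names, and it assumes the retry delay doubles from a 1-second base.

```python
SECONDS_PER_HOUR = 3600


def hours_to_seconds(hours: int) -> int:
    """Convert a timeout in hours to the seconds value a Job deadline expects."""
    return hours * SECONDS_PER_HOUR


def backoff_delays(retries: int, base_seconds: int = 1) -> list[int]:
    """Return the wait before each retry, doubling from base_seconds (assumed schedule)."""
    return [base_seconds * 2 ** attempt for attempt in range(retries)]


old_deadline = hours_to_seconds(4)    # previous ActiveDeadlineSeconds value
new_deadline = hours_to_seconds(24)   # new ActiveDeadlineSeconds value
delays = backoff_delays(3)            # waits before the three retries

print(old_deadline, new_deadline, delays, sum(delays))
# prints: 14400 86400 [1, 2, 4] 7
```

Under these assumptions, a fully exhausted retry sequence adds about 7 seconds of waiting on top of the time the attempts themselves take.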

These changes fix issue #380 by ensuring:
- Sessions have sufficient time to complete
- Session resume metadata is reliably captured
- PVC conflicts are avoided during continuation
- Failed sessions can be resumed properly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@jeremyeder
Collaborator

@claude provide critical feedback only.

@jeremyeder
Collaborator

dupe

@jeremyeder jeremyeder closed this Dec 4, 2025
jeremyeder added a commit that referenced this pull request Dec 4, 2025
…ber workflow (#388)

## Summary
Add exponential backoff retry pattern to harden PR label addition
against transient GitHub API failures.

## Problem
Amber workflow occasionally fails with `RequestError [HttpError]:
Unexpected end of JSON input, status: 500` when adding labels to PRs.
This is a transient GitHub API error that should be retried.

## Solution
- Add `retryWithBackoff()` helper function with exponential backoff
- 3 retry attempts with 1s, 2s, 4s delays between attempts
- Only retries on retriable errors (HTTP 5xx, JSON parse errors)
- Fails fast on client errors (4xx) since retrying won't help
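
The retry policy described above could look roughly like the following. This is a Python sketch of the idea, not the workflow's actual `retryWithBackoff()` helper (which lives in the workflow YAML). `ApiError` is a hypothetical stand-in for an HTTP client error, and the sketch assumes "3 retry attempts" means one initial call plus up to three retries.

```python
import json
import time
from typing import Any, Callable


class ApiError(Exception):
    """Hypothetical stand-in for an HTTP client error that carries a status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"HTTP {status}")
        self.status = status


def is_retriable(error: Exception) -> bool:
    """Retry on HTTP 5xx and JSON parse errors; everything else fails fast."""
    if isinstance(error, json.JSONDecodeError):
        return True
    if isinstance(error, ApiError):
        return 500 <= error.status <= 599
    return False


def retry_with_backoff(
    operation: Callable[[], Any],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Call operation, retrying retriable failures after 1s, 2s, 4s (by default)."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == retries or not is_retriable(error):
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable: loop always returns or raises")
```

The `sleep` parameter is injectable so the schedule can be exercised in tests without real waiting. A 4xx `ApiError` propagates immediately on the first call, while a 5xx error is retried until the attempts are exhausted and then re-raised.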

## Test Plan
- [x] Workflow logic reviewed
- [ ] Test with intentional API failure (manual testing required)
- [ ] Monitor next Amber automation run for successful label addition

## Changes
- Modified `.github/workflows/amber-issue-handler.yml`:
  - Added retry helper function (lines 324-342)
  - Wrapped label addition with retry logic (lines 379-387)

## References
- Related to successful PR #387 (labels failed due to transient API
error)
- Addresses issue #380 follow-up (workflow hardening)

🤖 Generated with Claude Code (https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Amber] sessions timeout and cant resume
