Agent workflow recovery based on server-side state by hhamalai · Pull Request #6077 · woodpecker-ci/woodpecker

hhamalai · 2026-02-06T15:39:11Z

This PR introduces a workflow recovery mechanism for agents. It allows workflows to resume from their last known state after an agent restart by persisting workflow progress in the server database.

why

In larger deployments, agents must occasionally be updated or scaled. Today this interrupts all running CI jobs, because agents keep the workflow execution config in memory and lose it on restart. This is especially painful for long-running or critical workflows.

how

This PR introduces a bookkeeping mechanism that records each workflow's progress. The record is kept in the server database, and agents query step statuses from the server, which lets an agent identify which steps are pending, running, or completed. If the executing agent is lost, the workflow becomes available again in the server queue and a new agent can continue its execution. This enables the pipeline to resume correctly after an agent restart or failure.
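The decision a recovering agent makes from the server-side record can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `StepState`, `Action`, `PlannedStep`, and `PlanRecovery` names are hypothetical, and the sketch assumes that a step without a recorded state is treated as pending.

```go
package main

import "fmt"

// StepState is a hypothetical per-step status as recorded by the server.
type StepState string

const (
	StatePending   StepState = "pending"
	StateRunning   StepState = "running"
	StateCompleted StepState = "completed"
)

// Action is what a recovering agent would do with a step (hypothetical).
type Action string

const (
	ActionStart    Action = "start"    // step never started: run it normally
	ActionReattach Action = "reattach" // step was running: pick it up again
	ActionSkip     Action = "skip"     // step already finished: never re-run
)

// PlannedStep pairs a step name with the chosen action.
type PlannedStep struct {
	Name   string
	Action Action
}

// PlanRecovery walks the steps in workflow order and picks an action for
// each one based on the recorded state. Steps with no record are assumed
// to be pending.
func PlanRecovery(order []string, recorded map[string]StepState) []PlannedStep {
	plan := make([]PlannedStep, 0, len(order))
	for _, name := range order {
		action := ActionStart
		switch recorded[name] {
		case StateCompleted:
			action = ActionSkip
		case StateRunning:
			action = ActionReattach
		}
		plan = append(plan, PlannedStep{Name: name, Action: action})
	}
	return plan
}

func main() {
	// State as a new agent might receive it after the original agent was lost.
	recorded := map[string]StepState{
		"clone": StateCompleted,
		"build": StateRunning,
	}
	for _, p := range PlanRecovery([]string{"clone", "build", "test"}, recorded) {
		fmt.Printf("%s: %s\n", p.Name, p.Action)
	}
}
```

Keeping the plan a pure function of the recorded state means the same server record always yields the same decisions, regardless of which agent picks up the workflow.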

what else

Originally proposed in "feat: workflow recovery for Kubernetes backend agents" #5930, targeting only the Kubernetes backend. As discussed there, the recovery state bookkeeping should be persisted in the server database and not be bound to backend implementations. With this, backends can implement an interface if they support recovery.
Recovered workflows might produce duplicate logs visible in the UI (the original agent streams logs until it is deleted, and the new agent taking over the workflow will stream the same logs from the beginning). Under no circumstances should the same step be executed twice.
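The "backends can implement an interface if they support recovery" idea maps onto Go's optional-interface pattern. The sketch below is an assumption about the general shape only: `Backend`, `Recoverable`, `Reattach`, and both example backend types are hypothetical names, not the interface added in this PR.

```go
package main

import "fmt"

// Backend is a stand-in for the base interface every backend implements.
type Backend interface {
	Name() string
}

// Recoverable is a hypothetical optional interface. A backend opts in to
// recovery simply by implementing it.
type Recoverable interface {
	Reattach(stepID string) error
}

// reattachingBackend is an example backend that supports recovery.
type reattachingBackend struct{}

func (reattachingBackend) Name() string                 { return "reattaching" }
func (reattachingBackend) Reattach(stepID string) error { return nil }

// plainBackend is an example backend without recovery support.
type plainBackend struct{}

func (plainBackend) Name() string { return "plain" }

// SupportsRecovery reports whether the backend opted in, using a type
// assertion so the base interface does not have to change.
func SupportsRecovery(b Backend) bool {
	_, ok := b.(Recoverable)
	return ok
}

func main() {
	for _, b := range []Backend{reattachingBackend{}, plainBackend{}} {
		fmt.Printf("%s supports recovery: %t\n", b.Name(), SupportsRecovery(b))
	}
}
```

With this pattern, backends that cannot reattach to a running step need no changes, and the caller can fall back to the existing non-recovering behavior for them.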

server/store/datastore/migration/027_add_recovery_state.go

pipeline/runtime/recovery.go

cmd/server/flags.go

pipeline/backend/types/backend.go

pipeline/recovery.go

agent/runner.go


hhamalai · 2026-02-10T11:44:27Z

rebased commits from main

pipeline/backend/local/local.go

cmd/server/server.go

agent/runner.go

woodpecker-bot · 2026-02-10T23:28:15Z

Surge PR preview deployment succeeded. View it at https://woodpecker-ci-woodpecker-pr-6077.surge.sh

qwerty287 · 2026-02-12T09:08:16Z

@hhamalai could you check out linting and openapi: https://ci.woodpecker-ci.org/repos/3780/pipeline/31537/35

Otherwise this looks quite good to me, besides the mentioned style discussion.

fixes depguard related linter errors

codecov · 2026-02-12T14:27:21Z

Codecov Report

❌ Patch coverage is 0.55556% with 716 lines in your changes missing coverage. Please review.
✅ Project coverage is 30.93%. Comparing base (2ca6f58) to head (e1e434f).
⚠️ Report is 12 commits behind head on main.

Files with missing lines	Patch %	Lines
rpc/proto/woodpecker.pb.go	0.00%	148 Missing ⚠️
pipeline/pipeline.go	0.00%	115 Missing ⚠️
pipeline/recovery.go	0.00%	72 Missing ⚠️
agent/rpc/client_grpc.go	0.00%	67 Missing ⚠️
rpc/proto/woodpecker_grpc.pb.go	0.00%	52 Missing ⚠️
server/rpc/rpc.go	0.00%	41 Missing ⚠️
pipeline/backend/docker/docker.go	0.00%	39 Missing ⚠️
server/store/datastore/recovery_state.go	0.00%	37 Missing ⚠️
pipeline/backend/kubernetes/kubernetes.go	0.00%	36 Missing ⚠️
server/rpc/server.go	0.00%	26 Missing ⚠️
... and 11 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6077      +/-   ##
==========================================
- Coverage   31.66%   30.93%   -0.73%     
==========================================
  Files         420      423       +3     
  Lines       28413    29089     +676     
==========================================
+ Hits         8996     9000       +4     
- Misses      18596    19265     +669     
- Partials      821      824       +3


pipeline/backend/kubernetes/volume.go

agent/runner.go

pipeline/backend/types/backend.go

pipeline/runtime/recovery.go

qwerty287 · 2026-02-24T08:54:04Z

Thanks. As this changes the RPC, the version needs to be increased: https://github.com/woodpecker-ci/woodpecker/blob/main/rpc/proto/version.go#L19

qwerty287 · 2026-02-24T10:15:20Z

Thanks, looks good to me now. @6543 you want to check again?

6543 · 2026-02-25T00:18:56Z

I'm not fully convinced by the newly added proto type. Let me rethink and check before I can give a well-founded review.

qwerty287 reviewed Feb 9, 2026


hhamalai added 5 commits February 10, 2026 13:04

feat: agent workflow recovery based on server-side state

2322a74

refactor: use InitWorkflowRecovery also to get the workflow recovery statuses

36c377d

refactor: remove unnecessary migration

51b3aa5

docs: add recovery option

8153cf3

refactor: add reconnect as a backend field, always create RecoveryManager

d1aa97e

hhamalai force-pushed the recovery-feature branch from bb6faaa to d1aa97e Compare February 10, 2026 11:43

qwerty287 reviewed Feb 10, 2026

pipeline/backend/local/local.go (outdated, resolved)

qwerty287 reviewed Feb 10, 2026

cmd/server/server.go (outdated, resolved)

agent/runner.go (outdated, resolved)

refactor: add error types and remove unnecessary config field

bd3b3ff

hhamalai and others added 3 commits February 12, 2026 15:09

refactor: move recovery related types & consts to a dedicated package

864c4b5

fixes depguard related linter errors

Merge branch 'main' into recovery-feature

3f676b9

refactor: fix openapi checks

e1e434f

hhamalai commented Feb 13, 2026

pipeline/backend/kubernetes/volume.go (resolved)

qwerty287 requested a review from a team February 13, 2026 09:30

refactor: add workflow recovery related unittests

3e88cd8

hhamalai force-pushed the recovery-feature branch 2 times, most recently from 0d845b6 to 3e88cd8 Compare February 16, 2026 11:15

Merge remote-tracking branch 'upstream/main' into recovery-feature

fd0b239

6543 added the feature add new functionality label Feb 17, 2026

6543 reviewed Feb 17, 2026

agent/runner.go (outdated, resolved)

6543 reviewed Feb 17, 2026

pipeline/backend/types/backend.go (outdated, resolved)

6543 reviewed Feb 17, 2026

pipeline/runtime/recovery.go (resolved)

hhamalai and others added 3 commits February 20, 2026 11:07

refactor: move recovery related features to package/runtime subpackage

4922a30

Merge branch 'main' into recovery-feature

c552267

Merge branch 'main' into recovery-feature

bf6881e

xoxys changed the title from "feat: agent workflow recovery based on server-side state" to "Agent workflow recovery based on server-side state" Feb 22, 2026

xoxys added server backend new backend agent and removed backend new backend labels Feb 22, 2026

refactor: get recovery-enabled status on agent registration

41f94da

hhamalai and others added 2 commits February 24, 2026 11:05

refactor: update rpc proto version, fix linting issues

958ef46

Merge branch 'main' into recovery-feature

1bdc01f


5 participants
