
Add restart retry limits to on-failure restart policy #42

Merged: jacderida merged 2 commits into chipsenkbeil:main from jacderida:feat-avoid_infinite_restart_on_windows on Feb 18, 2026

@jacderida
Collaborator

@jacderida jacderida commented Feb 12, 2026

Summary

  • Extends RestartPolicy::OnFailure, part of the generic cross-platform API, with max_retries and
    reset_after_secs fields to prevent infinite restart loops
  • Implements the feature for WinSW: generates multiple <onfailure> XML elements based on
    max_retries, with a final action="none" to stop after exhausting retries
  • Other backends (systemd, launchd, sc, OpenRC, rc.d) have TODO placeholders for future
    implementation

Breaking change: RestartPolicy::OnFailure now requires two additional fields. Use None for
both to preserve previous behavior.
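To illustrate the breaking change, here is a minimal sketch of constructing and destructuring the extended variant. The enum below is a hypothetical stand-in, not the crate's actual definition: the delay_secs field, the Never variant, the Option<u32> field types, and the describe helper are all assumptions made for the example.

```rust
// Hypothetical stand-in for the crate's RestartPolicy. Only max_retries and
// reset_after_secs come from this PR; everything else is assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RestartPolicy {
    Never,
    OnFailure {
        delay_secs: Option<u32>,       // assumed pre-existing field
        max_retries: Option<u32>,      // new: give up after this many restarts
        reset_after_secs: Option<u32>, // new: failure counter resets after this long
    },
}

/// Illustrative helper showing how code that destructures the variant
/// has to account for the new fields.
fn describe(policy: &RestartPolicy) -> String {
    match policy {
        RestartPolicy::OnFailure {
            max_retries: Some(n),
            reset_after_secs,
            ..
        } => match reset_after_secs {
            Some(secs) => format!("restart up to {n} times, counter resets after {secs}s"),
            None => format!("restart up to {n} times"),
        },
        RestartPolicy::OnFailure { max_retries: None, .. } => {
            "restart on failure without a limit".to_string()
        }
        _ => "not an on-failure policy".to_string(),
    }
}

fn main() {
    // Previous behavior: pass None for both new fields.
    let legacy = RestartPolicy::OnFailure {
        delay_secs: Some(10),
        max_retries: None,
        reset_after_secs: None,
    };
    // New behavior: stop after three restarts, reset the counter after an hour.
    let limited = RestartPolicy::OnFailure {
        delay_secs: Some(10),
        max_retries: Some(3),
        reset_after_secs: Some(3600),
    };
    println!("{}", describe(&legacy));
    println!("{}", describe(&limited));
}
```

Using `..` in patterns, as above, keeps match arms compiling if further fields are added later; struct-literal construction still has to name every field.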

Test plan

  • All 15 existing unit tests pass
  • 6 new WinSW unit tests covering max_retries, reset_after_secs, combined usage, backwards
    compatibility, and precedence of WinSW-specific config
  • 8 doc-tests pass
  • System test binary extended with fail subcommand for simulating crashing services
  • New Windows system test (should_stop_winsw_service_after_max_retries), which still needs a
    CI run to verify on Windows
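The fail subcommand mentioned in the test plan could look roughly like the sketch below. This is not the test binary from the PR: the exit_code_for helper and the choice of exit code 1 are assumptions, and a real Windows service binary would likely also need to register with the service manager before exiting.

```rust
use std::process::ExitCode;

/// Maps the command-line arguments (without the program name) to an exit code.
/// `fail` simulates a crashing service by returning a non-zero code.
fn exit_code_for(args: &[String]) -> u8 {
    match args.first().map(String::as_str) {
        Some("fail") => 1, // assumed failure code; any non-zero value would do
        _ => 0,
    }
}

fn main() -> ExitCode {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let code = exit_code_for(&args);
    if code != 0 {
        eprintln!("simulating a crash: exiting with code {code}");
    }
    ExitCode::from(code)
}
```

A service manager configured to restart on failure should treat each such non-zero exit as a failure, which is what lets a system test count restarts.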

🤖 Generated with Claude Code

jacderida and others added 2 commits February 12, 2026 00:00
Extend the on-failure restart policy with two new fields to prevent
infinite restart loops:

- A maximum number of restart attempts before the service gives up
- A duration after which the failure counter resets, allowing retries
  to start fresh if the service has been running successfully

This is implemented for the WinSW backend, which maps the retry limit
to multiple <onfailure> XML elements (with a final action="none" to
stop) and the reset duration to the <resetfailure> element. Other
service manager backends (systemd, launchd, sc, OpenRC, rc.d) have
TODO placeholders for these fields and could be extended in the future.
For example, systemd could map to StartLimitBurst and
StartLimitIntervalSec.
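The WinSW mapping described above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the function name winsw_failure_xml, the delay_secs parameter, and the "N sec" duration formatting are chosen for the example. It relies on WinSW applying <onfailure> elements in order, one per consecutive failure, with the last one repeating.

```rust
/// Builds the failure-handling fragment of a WinSW XML config (sketch).
/// One restart element is emitted per allowed retry, followed by a final
/// action="none" so the service stays stopped once retries are exhausted.
fn winsw_failure_xml(
    delay_secs: Option<u32>,
    max_retries: Option<u32>,
    reset_after_secs: Option<u32>,
) -> String {
    let restart = match delay_secs {
        Some(d) => format!("<onfailure action=\"restart\" delay=\"{d} sec\"/>"),
        None => "<onfailure action=\"restart\"/>".to_string(),
    };

    let mut lines: Vec<String> = Vec::new();
    match max_retries {
        Some(n) => {
            for _ in 0..n {
                lines.push(restart.clone());
            }
            lines.push("<onfailure action=\"none\"/>".to_string());
        }
        // No limit: a single restart action repeats on every failure.
        None => lines.push(restart),
    }

    if let Some(secs) = reset_after_secs {
        lines.push(format!("<resetfailure>{secs} sec</resetfailure>"));
    }
    lines.join("\n")
}

fn main() {
    // Three retries with a 10 second delay, counter reset after one hour.
    println!("{}", winsw_failure_xml(Some(10), Some(3), Some(3600)));
}
```

With max_retries set to None the fragment contains a single restart action, which corresponds to the previous unlimited behavior.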

A system test is included that verifies a failing Windows service stops
after exhausting its retry limit, using a new `fail` subcommand in the
test binary that simulates a crashing service.

BREAKING CHANGE: The on-failure restart policy now has additional
fields. Code that constructs or destructures this variant must be
updated to include the new fields (use None for previous behavior).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jacderida
Collaborator Author

jacderida commented Feb 12, 2026

Hey @chipsenkbeil

I'm marking this PR as a work in progress until I ensure the new test passes.

I was able to add quite an elaborate system test with the help of Claude, but it may take a few iterations to get it to pass.

Right now I'm only providing this feature for Windows because that is the platform I'm concerned about, but we could extend to support other platforms.

@chipsenkbeil
Owner

@jacderida sounds good. I'm just now trying out claude and opencode and am pretty blown away by the effectiveness and speed. So looking forward to what you produce here!

@jacderida
Collaborator Author

> @jacderida sounds good. I'm just now trying out claude and opencode and am pretty blown away by the effectiveness and speed. So looking forward to what you produce here!

Cool.

Honestly, if you spend a bit of time getting a good Claude Code setup, the productivity gains are unbelievable, and it's a lot of fun to work with. Set yourself up with a custom skill that interviews you to plan out every piece of work. With adequate planning and control, the results are extremely impressive and you can just churn through features. Have fun! :)

@jacderida jacderida changed the title Add restart retry limits to on-failure restart policy [WIP] Add restart retry limits to on-failure restart policy Feb 17, 2026
@jacderida
Collaborator Author

Hey @chipsenkbeil

This appears to be working OK, so I took the "WIP" out of the title.

The automated test passes and I also tested it in the context of my own application.

If you could approve the PR, I will get it merged and released.

Thanks.

@jacderida
Collaborator Author

Hey @chipsenkbeil

I hope you don't mind, but I'm going to merge this.

I need it for a release we are doing very soon.

@jacderida jacderida merged commit d6bca3f into chipsenkbeil:main Feb 18, 2026
12 of 14 checks passed
@chipsenkbeil
Owner

@jacderida yep, all good. :) I'm on vacation this week and not checking often. You've got access to merge and publish so that I'm not a blocker! I trust you :D

@jacderida
Collaborator Author

> @jacderida yep, all good. :) I'm on vacation this week and not checking often. You've got access to merge and publish so that I'm not a blocker! I trust you :D

Awesome, thanks! I got it merged and released.
