Add restart retry limits to on-failure restart policy #42
Extend the on-failure restart policy with two new fields to prevent infinite restart loops:
- A maximum number of restart attempts before the service gives up
- A duration after which the failure counter resets, allowing retries to start fresh if the service has been running successfully

This is implemented for the WinSW backend, which maps the retry limit to multiple <onfailure> XML elements (with a final action="none" to stop) and the reset duration to the <resetfailure> element.

Other service manager backends (systemd, launchd, sc, OpenRC, rc.d) have TODO placeholders for these fields and could be extended in the future. For example, systemd could map to StartLimitBurst and StartLimitIntervalSec.

A system test is included that verifies a failing Windows service stops after exhausting its retry limit, using a new `fail` subcommand in the test binary that simulates a crashing service.

BREAKING CHANGE: The on-failure restart policy now has additional fields. Code that constructs or destructures this variant must be updated to include the new fields (use None for previous behavior).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hey @chipsenkbeil I'm marking this PR as a work in progress until I ensure the new test passes. I was able to add quite an elaborate system test with the help of Claude, but it may take a few iterations to get it to pass. Right now I'm only providing this feature for Windows because that is the platform I'm concerned about, but we could extend to support other platforms.
@jacderida sounds good. I'm just now trying out claude and opencode and am pretty blown away by the effectiveness and speed. So looking forward to what you produce here!
Cool. Honestly, if you spend a bit of time getting a good Claude Code setup, the productivity gains are unbelievable, and it's a lot of fun to work with. Set yourself up with a custom skill that interviews you to plan out every piece of work. With adequate planning and control, the results are extremely impressive and you can just churn through features. Have fun! :)
Hey @chipsenkbeil This appears to be working OK, so I took the "WIP" out of the title. The automated test passes and I also tested it in the context of my own application. If you could approve the PR, I will get it merged and released. Thanks.
Hey @chipsenkbeil I hope you don't mind, but I'm going to merge this. I need it for a release we are doing very soon.
@jacderida yep, all good. :) I'm on vacation this week and not checking often. You've got access to merge and publish so that I'm not a blocker! I trust you :D
Awesome, thanks! I got it merged and released.
Summary
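Extend `RestartPolicy::OnFailure` with `max_retries` and `reset_after_secs` fields to prevent infinite restart loops using the generic cross-platform API
The WinSW backend generates multiple `<onfailure>` XML elements based on `max_retries`, with a final `action="none"` to stop after exhausting retries
Other service manager backends (systemd, launchd, sc, OpenRC, rc.d) have TODO placeholders for a future implementation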
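The WinSW mapping described above can be sketched roughly as follows. This is an illustration, not the code from this PR: the function name `render_failure_actions`, the `delay_secs` parameter, and the `N sec` duration formatting are assumptions. It relies on WinSW repeating the last configured `<onfailure>` action for any further failures, which is why a trailing `action="none"` ends the restart loop.

```rust
/// Hypothetical helper (not the crate's actual function): renders the WinSW
/// failure-handling elements for an on-failure restart policy.
fn render_failure_actions(
    delay_secs: Option<u32>,       // assumed: optional delay before each restart
    max_retries: Option<u32>,      // None = keep restarting indefinitely
    reset_after_secs: Option<u32>, // None = leave WinSW's default reset period
) -> String {
    // Optional delay attribute shared by every restart action.
    let delay_attr = delay_secs
        .map(|d| format!(" delay=\"{d} sec\""))
        .unwrap_or_default();

    let mut xml = String::new();
    match max_retries {
        Some(n) => {
            // One restart action per allowed retry...
            for _ in 0..n {
                xml.push_str(&format!("<onfailure action=\"restart\"{delay_attr}/>\n"));
            }
            // ...then a final "none" so later failures no longer restart.
            xml.push_str("<onfailure action=\"none\"/>\n");
        }
        // No limit: a single restart action applies to every failure.
        None => xml.push_str(&format!("<onfailure action=\"restart\"{delay_attr}/>\n")),
    }

    // Window after which the failure counter starts over.
    if let Some(secs) = reset_after_secs {
        xml.push_str(&format!("<resetfailure>{secs} sec</resetfailure>\n"));
    }
    xml
}
```

Under these assumptions, `render_failure_actions(Some(5), Some(2), Some(3600))` would produce two restart elements, one `action="none"` element, and a `<resetfailure>` element.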
Breaking change:
`RestartPolicy::OnFailure` now requires two additional fields. Use `None` for both to preserve previous behavior.
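For callers, the breaking change looks roughly like the sketch below. The enum here is a simplified local stand-in rather than the crate's real definition: the `delay_secs` field, the `Option<u32>` field types, and the helper names `unlimited_on_failure` and `describe` are assumptions made for illustration.

```rust
/// Simplified stand-in for the crate's restart policy. Only the variant
/// discussed here is fleshed out, and the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum RestartPolicy {
    Never,
    OnFailure {
        delay_secs: Option<u32>,       // assumed pre-existing field
        max_retries: Option<u32>,      // new: give up after this many restarts
        reset_after_secs: Option<u32>, // new: failure counter resets after this long
    },
}

/// Construction sites must now name the new fields; `None` for both keeps
/// the previous unlimited-restart behavior.
fn unlimited_on_failure() -> RestartPolicy {
    RestartPolicy::OnFailure {
        delay_secs: None,
        max_retries: None,
        reset_after_secs: None,
    }
}

/// Destructuring sites must bind the new fields or skip them with `..`.
fn describe(policy: &RestartPolicy) -> String {
    match policy {
        RestartPolicy::Never => "never restart".to_string(),
        RestartPolicy::OnFailure { max_retries: Some(n), .. } => {
            format!("restart on failure, at most {n} times")
        }
        RestartPolicy::OnFailure { max_retries: None, .. } => {
            "restart on failure, no limit".to_string()
        }
    }
}
```

Patterns that already end in `..` keep compiling unchanged; only exhaustive constructions and patterns need the two extra fields.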
Test plan
compatibility, and precedence of WinSW-specific config
New `fail` subcommand in the test binary for simulating crashing services
System test (`should_stop_winsw_service_after_max_retries`): needs a CI run to verify on Windows
🤖 Generated with Claude Code