Policy automations: add retries for scripts & software #31916

@noahtalerman

Goal

User story
As an IT admin,
I want Fleet to retry script runs and software installs up to 3 times by default
so that I don't have to manually retry these scripts/software.

Roadmap item

None.

Original requests

Resources

None.

Changes

Product

  • By default, all script runs and software installs triggered by policy automation are retried up to 3 times.
  • UI changes: No changes
  • CLI (fleetctl) usage changes: No changes
  • YAML changes: No changes
  • REST API changes: No changes
  • Fleet's agent (fleetd) changes: No changes
  • GitOps mode UI changes: No changes
  • GitOps generation changes: No changes
  • Activity changes: No changes
  • Permissions changes: No changes
  • Changes to paid features or tiers: Fleet Premium only, because script and software automations are Fleet Premium only.
  • My device and fleetdm.com/better changes: No changes
  • Usage statistics: No changes
  • Other reference documentation changes: [API reference and guide changes] Policy automations: add retries for scripts & software #37120
  • First draft of test plan added
  • Once shipped, requester has been notified
  • Once shipped, dogfooding issue has been filed

Engineering

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

  • Risk level: Low

Test plan

Make sure to go through the list and consider all events that might be related to this story, so we catch edge cases earlier.


✔️ Policy automation retry test plan (scripts + software)

  • Software installs retry up to 3 times when triggered by a policy automation

    • Set up failing software policy (e.g., 1Password not installed).
    • Trigger policy → confirm that the policy fails and that install attempts #1, #2, and #3 each run.
    • Confirm that the software install stays pending until the third failure. At that point the software is marked as failed.
  • Script run retries up to 3 times when triggered by a policy automation

    • Set up failing script policy (e.g., file missing).
    • Trigger policy → confirm that the policy fails and that script attempts #1, #2, and #3 each run.
    • Confirm that the script stays pending until the third failure. At that point the script is marked as failed.
  • Software stops retrying when it's successful

    • Trigger a policy failure → install attempt #1 succeeds.
    • Confirm no retries
  • Script stops retrying when it's successful

    • Trigger a policy failure → script attempt #1 succeeds.
    • Fix condition so policy passes.
    • Confirm no retries
  • Regression check: other automations unaffected

    • Trigger unrelated policy automation.
    • Confirm its behavior is unchanged (no retries).
  • Modify policy and make sure pass/fail counts are reset and retries start over again

  • Modify policy automations and make sure pass/fail counts are reset and retries start over again
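
The behavior this test plan checks can be sketched as a small state machine. This is only an illustrative model, not Fleet's implementation: the names `AutomationRun`, `record_attempt`, `reset`, and `run_with_retries` are hypothetical, and it assumes that "up to 3 times" means three attempts in total, as the "attempt # 1 ... # 3" steps above suggest. It also assumes that modifying the policy or its automation simply resets the attempt count.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: "up to 3 times" means three attempts in total.
MAX_ATTEMPTS = 3


@dataclass
class AutomationRun:
    """Hypothetical record of one policy-triggered script run or software install."""

    attempts: int = 0
    status: str = "pending"  # "pending", "success", or "failed"

    def record_attempt(self, succeeded: bool) -> None:
        """Record the outcome of one attempt and update the overall status."""
        if self.status != "pending":
            return  # a finished run is not retried
        self.attempts += 1
        if succeeded:
            self.status = "success"
        elif self.attempts >= MAX_ATTEMPTS:
            self.status = "failed"
        # Otherwise the run stays "pending" so another attempt can be made.

    def reset(self) -> None:
        """Start over, e.g. after the policy or its automation is modified."""
        self.attempts = 0
        self.status = "pending"


def run_with_retries(run: AutomationRun, action: Callable[[], bool]) -> AutomationRun:
    """Call `action` until it succeeds or the attempt limit is reached."""
    while run.status == "pending":
        run.record_attempt(action())
    return run


# An action that always fails is attempted three times, then marked failed.
always_fails = run_with_retries(AutomationRun(), lambda: False)
print(always_fails.attempts, always_fails.status)  # 3 failed

# An action that succeeds on the first attempt is never retried.
first_try = run_with_retries(AutomationRun(), lambda: True)
print(first_try.attempts, first_try.status)  # 1 success
```

In this model a run stays "pending" after the first and second failures and only becomes "failed" on the third, which matches the "stays pending until the third failure" checks above.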

Testing notes

Confirmation

  1. Engineer: Added comment to user story confirming successful completion of test plan.
  2. QA: Added comment to user story confirming successful completion of test plan.

Metadata

Labels

  • #g-security-compliance (Security & Compliance product group)
  • :release (Ready to write code. Scheduled in a release. See "Making changes" in handbook.)
  • customer-cisneros-l
  • customer-hubble
  • customer-numa
  • customer-rembrandt
  • story (A user story defining an entire feature)
  • ~assisting g-orchestration (This is a #g-orchestration issue that another product group is assisting)
  • ~customer promise (A feature request, or user story for a request, that Fleet has contractually agreed to deliver)

Type

No type

Projects

Status

No status

Status

✅ Ready for release

Milestone

Relationships

None yet

Development

No branches or pull requests
