Add retry logic for server creation with configurable attempts and delay (fixes #8) #10

xgboosted · 2025-05-15T08:19:53Z

This PR introduces built-in, configurable retry logic to the runner creation step in the Hetzner GitHub Action. By letting users specify the number of retry attempts and the delay between attempts, it aims to improve robustness against transient API/network errors and to mitigate temporary resource unavailability.

Key Changes

1. New Inputs in `action.yml`

create_retries: Number of retry attempts for runner creation (default: 1).
create_retry_delay: Delay (in seconds) between attempts (default: 10).
Both are passed as environment variables to the shell script.

2. Retry Logic in `action.sh`

Parses the new environment variables and validates them as integers.
Wraps the Hetzner Cloud server creation (curl -X POST ...) in a retry loop:
- Logs each attempt and error.
- Retries up to the configured limit, waiting the specified delay between attempts.
- Exits with a clear error message if all attempts fail.
Only applies to the server creation step (not deletion or unrelated steps).
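
The loop described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function names `retry_create` and `is_integer` and the variable names `MY_CREATE_RETRIES` and `MY_CREATE_RETRY_DELAY` are assumptions chosen to match the project's naming style.

```shell
#!/bin/sh
# Sketch only: retry_create, is_integer and the MY_CREATE_* names are
# hypothetical, not copied from the PR's action.sh.

# Defaults mirror the PR description: 1 attempt, 10 seconds between attempts.
MY_CREATE_RETRIES="${MY_CREATE_RETRIES:-1}"
MY_CREATE_RETRY_DELAY="${MY_CREATE_RETRY_DELAY:-10}"

# Succeeds only if the argument is a non-negative integer.
is_integer() {
	case "$1" in
		''|*[!0-9]*) return 1 ;;
		*) return 0 ;;
	esac
}

# Usage: retry_create ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry_create() {
	MY_MAX="$1"
	MY_DELAY="$2"
	shift 2
	if ! is_integer "$MY_MAX" || ! is_integer "$MY_DELAY" || [ "$MY_MAX" -lt 1 ]; then
		echo "Invalid retry settings: attempts='$MY_MAX' delay='$MY_DELAY'" >&2
		return 2
	fi
	MY_ATTEMPT=1
	while :; do
		echo "Create attempt $MY_ATTEMPT of $MY_MAX" >&2
		if "$@"; then
			return 0
		fi
		echo "Create attempt $MY_ATTEMPT failed" >&2
		if [ "$MY_ATTEMPT" -ge "$MY_MAX" ]; then
			echo "Giving up after $MY_MAX attempts" >&2
			return 1
		fi
		sleep "$MY_DELAY"
		MY_ATTEMPT=$((MY_ATTEMPT + 1))
	done
}
```

A caller would pass the creation command as the trailing arguments, for example `retry_create "$MY_CREATE_RETRIES" "$MY_CREATE_RETRY_DELAY" some_create_command`. Note that wrapping `curl` this way only works if the wrapped command exits non-zero on failure.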

3. Documentation in `README.md`

Inputs table updated to include create_retries and create_retry_delay.
New section describes the retry logic, its purpose, and usage.

Example Usage

with:
  mode: create
  github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
  hcloud_token: ${{ secrets.HCLOUD_TOKEN }}
  create_retries: 3
  create_retry_delay: 15

Motivation

Addresses transient failures in Hetzner API or network connectivity.
Reduces manual intervention for temporary issues.
Provides clear feedback and logging for troubleshooting.

Style & Compatibility

All new code follows project conventions: uppercase MY_ variables, lowercase functions, tab indentation, and preservation of comments.
No breaking changes; defaults maintain previous behavior.

Closes: #8

Introduce create_retries and create_retry_delay inputs to control the number of retry attempts and delay between attempts when creating a Hetzner Cloud server. Update action.sh to implement the retry loop and improve error handling for transient failures. Document new options in README.md and action.yml to enhance reliability.

Cyclenerd · 2025-05-16T10:11:03Z

Ok, thanks for the new revised pull. Now I can read and understand it much better. In curl there is a built-in retry. Wouldn't that be enough? Then we could avoid the extra loop.

Documentation: https://everything.curl.dev/usingcurl/downloads/retry.html

Example:

curl --retry 12 --retry-all-errors "https://api.hetzner.cloud/v1/servers"

xgboosted · 2025-05-16T12:16:16Z

The --retry feature of the curl command (as described at https://everything.curl.dev/usingcurl/downloads/retry.html) cannot always be used in CI/CD shell scripts for these reasons:

- Portability: Not all environments have a curl version that supports --retry. Many CI runners or minimal Linux images ship with older curl versions lacking this flag, so relying on it can break cross-platform compatibility.
- Control Over Retry Logic: The built-in --retry flag only retries on network errors or certain HTTP codes, and does not allow fine-grained control over which failures to retry, how to handle output, or custom logging per attempt. Custom shell logic allows for more detailed error handling, logging, and integration with other script steps.
- Consistent Logging and Output: With a manual retry loop, you can log each attempt, capture and inspect output, and decide exactly what to do on each failure. This is important for debugging and for providing clear feedback in CI logs.
- Project Coding Standards: Some projects require all retry logic to be explicit in the script for auditability, readability, and to ensure all error handling is visible and testable.

Summary:
While curl --retry is convenient, custom retry logic in shell scripts is more portable, auditable, and flexible, especially in environments where you can't guarantee curl's version or want detailed control over error handling and logging.

Cyclenerd · 2025-05-16T16:00:49Z

I have a question about the curl version and the retry mechanism. I'm trying to understand the reasoning behind the suggested changes fully.

Since the script runs in a GitHub Actions runner with a standard image, are we concerned about the curl version not supporting --retry? The --retry flag has been around for a very long time (since curl 7.12.3 https://curl.se/ch/7.12.3.html), and I suspect it's included in all curl versions used in the GitHub Actions environment.

Also, I'm wondering if retrying on all non-zero exit codes is the best approach. For example, if the API returns a 401, should we really keep retrying? curl --retry has built-in logic to avoid retrying on certain error codes, which seems more robust. My understanding is that the original issue was about network errors, and curl --retry is specifically designed to handle those. Could retrying on everything potentially mask underlying issues or cause unnecessary load? Maybe your AI missed that?
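
To illustrate the concern about retrying on everything: a hand-written loop could inspect the HTTP status and retry only on the codes that curl's own --retry treats as transient (408, 429, 500, 502, 503, 504), so that a 401 fails immediately. The sketch below is an illustration under that assumption; `is_transient_status` is a hypothetical helper and is not part of action.sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of action.sh.
# Returns 0 for the HTTP status codes that curl's --retry documents as
# transient, and 1 for everything else (for example 401 or 404).
is_transient_status() {
	case "$1" in
		408|429|500|502|503|504) return 0 ;;
		*) return 1 ;;
	esac
}

# Example decision for a status captured elsewhere (value is illustrative).
MY_STATUS=401
if is_transient_status "$MY_STATUS"; then
	echo "HTTP $MY_STATUS: retry"
else
	echo "HTTP $MY_STATUS: do not retry"
fi
```

With this kind of check, authentication or validation errors surface on the first attempt instead of being repeated until the retry budget runs out.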

xgboosted · 2025-05-16T16:36:41Z

> My understanding is that the original issue was about network errors

The main issue is "resource unavailability," where Cloud Servers cannot be created because of resource exhaustion. This has been happening since last year, and I run into it every day between 10 a.m. and 12 p.m. German time. When I tested the branch by releasing the Action to the marketplace (which is why I had to delete the last word, "Cloud", in the prior PR, as duplicate Actions are not allowed), I set 50 tries, and the Cloud server was created on the 27th try (see the screenshot).

Hetzner's notice on this issue since last year: https://status.hetzner.com/incident/aa5ce33b-faa5-4fd0-9782-fde43cd270cf

xgboosted · 2025-05-17T15:42:55Z

@Cyclenerd thanks for accepting the PR.

I am guessing that only one variable was added, named create_wait, whose value is the total number of tries, and that the wait interval is hardcoded in action.sh as 10 seconds? So create_wait = 20 means retrying for 200 seconds at 10-second intervals.

Did I get it right?

I think an update to the example template in README.md would be helpful :)

Cyclenerd · 2025-05-17T15:50:53Z

Yes:

20 * 10 = 200 sec

Default:

360 * 10 = 3600 sec (1 hour)
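
The same arithmetic as a small shell sketch. The 10-second interval is the hardcoded value described in this thread; the helper name `total_wait_seconds` and the variable `MY_WAIT_INTERVAL` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: total wait time for a given create_wait value.
# The interval of 10 seconds is the hardcoded value mentioned in the thread.
MY_WAIT_INTERVAL=10

total_wait_seconds() {
	echo $(($1 * MY_WAIT_INTERVAL))
}

total_wait_seconds 20    # prints 200
total_wait_seconds 360   # prints 3600 (the default, one hour)
```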

Cyclenerd · 2025-05-17T15:52:01Z

In my opinion, nothing beyond the description and explanation in the inputs table is necessary. Adding it to the example would likely confuse many users, since it only affects a very small number of people.

xgboosted · 2025-05-18T12:05:51Z

This is how I have defined the new parameter. Is this correct?

    steps:
      - name: Create runner
        id: create-runner
        uses: Cyclenerd/hcloud-github-runner@v1.1.0
        with:
          mode: create
          github_token: ${{ ----------- }}
          hcloud_token: ${{  -----------  }}
          server_type: ${{  -----------  }}
          location: ${{  -----------  }}
          image: ${{  -----------  }}
          name: ${{  -----------  }}
          create_wait: 100
        continue-on-error: false

Cyclenerd added 4 commits May 17, 2025 16:25

Update action.sh

835faba

Update action.yml

d160664

Update action.yml

be6f3a8

Update README.md

89cd00e

Cyclenerd merged commit 480db35 into Cyclenerd:master May 17, 2025
1 check passed

xgboosted deleted the feature-create-rerty-logic branch June 4, 2025 14:40

Cyclenerd mentioned this pull request Jun 4, 2025

Add retry logic for Hetzner server deletion (resolves #11) #12

Closed

1 task
