Update foliage (curl retry backoff + download concurrency limit)#1250

Merged

angerman merged 1 commit into main from test/foliage-retry-backoff on Feb 12, 2026
Conversation

@angerman (Contributor) commented Feb 11, 2026

Summary

Updates the foliage flake input to include input-output-hk/foliage#116:

  • curl --retry 3 --retry-connrefused — retries transient HTTP errors (408, 429, 500, 502, 503, 504) with exponential backoff (1s, 2s, 4s)
  • Download concurrency cap at 20 via Shake Resource — prevents hundreds of simultaneous curl processes from overwhelming GitHub when running with -j 0
  • CI actions upgraded: nix-installer-action v9→v21, magic-nix-cache-action v2→v13, cachix-action v14→v16

This addresses the repeated transient HTTP 502 failures seen in #1248 (failed 4 times before succeeding on the 5th manual re-run, each time on a different package).
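For anyone unfamiliar with curl's retry semantics, the backoff schedule described above (1s, 2s, 4s for three retries) can be sketched as follows. This is a minimal illustration in Python, not foliage's actual code (foliage is Haskell and simply passes the flags to curl); `retry_with_backoff` and its parameters are hypothetical names.

```python
import time

def retry_with_backoff(op, max_retries=3, base_delay=1.0):
    """Retry `op` up to `max_retries` times on failure, doubling the
    delay each attempt (1s, 2s, 4s with the defaults), roughly what
    curl --retry 3 does for transient errors like HTTP 502."""
    for attempt in range(max_retries + 1):
        try:
            return op()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: propagate the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Note that curl additionally restricts retries to transient conditions (timeouts and HTTP 408, 429, 500, 502, 503, 504); the sketch retries on any exception for brevity.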

@neilmayhew (Contributor) commented Feb 11, 2026

There are no messages from curl in the logs about HTTP status errors. I don't know whether that's because there were no errors this time, or because curl doesn't output a message when it's able to fetch the URL on a retry.

I couldn't find an accurate way to test this locally, but my impression is that curl will output an error message for each failure even if it succeeds on a subsequent retry, and that means no 502s were encountered on this run.

It may be that the Shake-based rate-limiting mechanism was sufficient to avoid getting the 502s in the first place, though.

This test shows that nothing is fundamentally broken by the changes, so I'm in favour of making the upgrade and seeing how things go. I think we should keep an eye on overall run time for the repo builds and then tweak the Shake resource limit if necessary. When we get the Cabal-file rewriting issue sorted out, and can safely delete the cache, we should try a non-cached run to see how long that takes with this change.

@neilmayhew (Contributor) commented
Looking at the run times before and after we re-enabled the cache, I don't think caching provides any improvement. It takes as long to fetch the cache as it does to fetch the tarballs from upstream.

… limit

Updates the foliage flake input to include input-output-hk/foliage#116:

- curl --retry 3 --retry-connrefused for transient HTTP errors (502, etc.)
  with exponential backoff
- Download concurrency capped at 20 via Shake Resource to prevent
  overwhelming GitHub under -j 0
- CI actions upgraded (nix-installer v21, magic-nix-cache v13, cachix v16)
@angerman angerman force-pushed the test/foliage-retry-backoff branch from fccf0a1 to ef98a62 on February 12, 2026 at 00:52
@angerman angerman changed the title Test foliage with curl retry backoff and download concurrency limit Update foliage (curl retry backoff + download concurrency limit) Feb 12, 2026
@angerman (Contributor, Author) commented

The build-repo failure is the exact HTTP 502 problem this PR fixes — but it occurs in the "Build repository (main)" step, which uses main's foliage (without the retry fix). The PR tip's foliage has the fix but CI never reaches that step.

This is a chicken-and-egg problem: the CI baseline build always uses main's foliage, so the only option is to re-run until the transient 502s don't hit. The previous CHaP run (#1249) confirmed the fix works: build-repo passed when using the new foliage.

@angerman angerman enabled auto-merge (rebase) February 12, 2026 01:12
@angerman angerman merged commit 4b2ed92 into main Feb 12, 2026
20 of 22 checks passed
@angerman angerman deleted the test/foliage-retry-backoff branch February 12, 2026 01:12