Testing things: Add retry with backoff utility and add tests#471
Testing things: Add retry with backoff utility and add tests#471goanpeca wants to merge 1 commit intoconda-incubator:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a reusable retry-with-exponential-backoff utility and wires it into network-prone conda/mamba operations (and installer downloads) to mitigate transient Anaconda connectivity failures (Fixes #129).
Changes:
- Introduce
retryWithBackoff+isRetryableErrorutility with configurable backoff behavior. - Add retry configuration/constants (retryable error patterns, retry counts/delays, network subcommands) and apply retries to conda commands and installer downloads.
- Add Vitest coverage threshold tweaks and new unit tests for the retry utility; regenerate
dist/bundles.
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| vitest.config.ts | Adjusts coverage thresholds to reflect new code/tests. |
| src/retry.ts | Adds retry-with-backoff utility and default retryable-error matcher. |
| src/constants.ts | Adds retry-related constants (patterns, retry defaults, network subcommands). |
| src/installer/base.ts | Wraps tool-cache downloads with retry. |
| src/conda.ts | Retries selected conda subcommands via retryWithBackoff. |
| src/tests/retry.test.ts | Adds unit tests for retry behavior. |
| dist/setup/index.js | Re-bundled action output including retry changes. |
| dist/delete/index.js | Re-bundled action output including constants changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const subcommand = cmd[0]; | ||
| if (constants.NETWORK_SUBCOMMANDS.includes(subcommand)) { | ||
| return await retryWithBackoff( | ||
| () => utils.execute(command, env, captureOutput), | ||
| { | ||
| maxRetries: constants.CONDA_RETRY_MAX, | ||
| initialDelayMs: constants.CONDA_RETRY_INITIAL_DELAY_MS, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
The retry predicate (isRetryableError) only inspects Error.message, but utils.execute() throws generic errors like " return error code " and logs the underlying CondaHTTPError/connection details to stderr via core.warning without including them in the thrown error. As a result, these retries will almost never trigger for the transient network failures this PR targets. Consider capturing stderr (or combined output) in the thrown error used for retries, or wrapping utils.execute here to surface the relevant stderr text into the error that retryWithBackoff evaluates.
| /CondaHTTPError/i, | ||
| /ConnectionError/i, | ||
| /SSLError/i, | ||
| /HTTP [45]\d{2}/i, |
There was a problem hiding this comment.
RETRYABLE_ERROR_PATTERNS currently treats all HTTP 4xx as retryable via /HTTP [45]\d{2}/i. Retrying on 4xx (e.g., 401/403 auth issues, 404 missing channel/repodata) is typically non-transient and can add 10s+ of unnecessary delay before failing. Consider narrowing this to statuses that are commonly transient (e.g., 5xx, 408, 429) and/or matching Conda's specific "HTTP 000"/connection-failed cases separately.
| /HTTP [45]\d{2}/i, | |
| /HTTP (5\d{2}|408|429)/i, |
| let lastError: Error | undefined; | ||
|
|
||
| for (let attempt = 0; attempt <= maxRetries; attempt++) { | ||
| try { | ||
| return await fn(); | ||
| } catch (err) { | ||
| lastError = err instanceof Error ? err : new Error(String(err)); | ||
|
|
||
| const isLastAttempt = attempt === maxRetries; | ||
| if (isLastAttempt || !isRetryableFn(lastError)) { | ||
| throw lastError; | ||
| } | ||
|
|
||
| const delay = Math.min( | ||
| initialDelayMs * Math.pow(backoffFactor, attempt), | ||
| maxDelayMs, | ||
| ); | ||
| core.warning( | ||
| `Attempt ${attempt + 1}/${maxRetries + 1} failed: ${lastError.message}`, | ||
| ); | ||
| core.info(`Retrying in ${delay / 1000}s...`); | ||
| await sleep(delay); | ||
| } | ||
| } | ||
|
|
||
| throw lastError; |
There was a problem hiding this comment.
retryWithBackoff allows maxRetries (and other numeric options) to be negative or non-finite. If maxRetries is negative, the loop never runs and lastError remains undefined, leading to throw lastError throwing undefined (hard to debug). Consider validating/clamping maxRetries (and delays/backoffFactor) to sane non-negative finite values up front, and ensure the final throw is always an Error instance.
| let lastError: Error | undefined; | |
| for (let attempt = 0; attempt <= maxRetries; attempt++) { | |
| try { | |
| return await fn(); | |
| } catch (err) { | |
| lastError = err instanceof Error ? err : new Error(String(err)); | |
| const isLastAttempt = attempt === maxRetries; | |
| if (isLastAttempt || !isRetryableFn(lastError)) { | |
| throw lastError; | |
| } | |
| const delay = Math.min( | |
| initialDelayMs * Math.pow(backoffFactor, attempt), | |
| maxDelayMs, | |
| ); | |
| core.warning( | |
| `Attempt ${attempt + 1}/${maxRetries + 1} failed: ${lastError.message}`, | |
| ); | |
| core.info(`Retrying in ${delay / 1000}s...`); | |
| await sleep(delay); | |
| } | |
| } | |
| throw lastError; | |
| const safeMaxRetries = | |
| Number.isFinite(maxRetries) && maxRetries >= 0 | |
| ? Math.floor(maxRetries) | |
| : 0; | |
| const safeInitialDelayMs = | |
| Number.isFinite(initialDelayMs) && initialDelayMs >= 0 | |
| ? initialDelayMs | |
| : 10_000; | |
| const safeBackoffFactor = | |
| Number.isFinite(backoffFactor) && backoffFactor >= 1 | |
| ? backoffFactor | |
| : 2; | |
| const safeMaxDelayMs = | |
| Number.isFinite(maxDelayMs) && maxDelayMs >= 0 ? maxDelayMs : 60_000; | |
| let lastError: Error | undefined; | |
| for (let attempt = 0; attempt <= safeMaxRetries; attempt++) { | |
| try { | |
| return await fn(); | |
| } catch (err) { | |
| lastError = err instanceof Error ? err : new Error(String(err)); | |
| const isLastAttempt = attempt === safeMaxRetries; | |
| if (isLastAttempt || !isRetryableFn(lastError)) { | |
| throw lastError; | |
| } | |
| const delay = Math.min( | |
| safeInitialDelayMs * Math.pow(safeBackoffFactor, attempt), | |
| safeMaxDelayMs, | |
| ); | |
| core.warning( | |
| `Attempt ${attempt + 1}/${safeMaxRetries + 1} failed: ${lastError.message}`, | |
| ); | |
| core.info(`Retrying in ${delay / 1000}s...`); | |
| await sleep(delay); | |
| } | |
| } | |
| throw lastError ?? new Error("retryWithBackoff failed without a captured error"); |
| if (executablePath === "") { | ||
| const rawDownloadPath = await tc.downloadTool(options.url); | ||
| const rawDownloadPath = await retryWithBackoff( | ||
| () => tc.downloadTool(options.url), | ||
| { | ||
| maxRetries: constants.CONDA_RETRY_MAX, | ||
| initialDelayMs: constants.CONDA_RETRY_INITIAL_DELAY_MS, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
ensureLocalInstaller uses retryWithBackoff with the default isRetryableError, which is documented/implemented around Conda stderr patterns. @actions/tool-cache download failures may surface different messages (e.g., Node/TLS error codes) and may not match those patterns, making this retry ineffective or inconsistent. Consider passing a retry predicate tailored to tc.downloadTool failures (or defining a separate pattern set for generic network download errors).
| import { describe, it, expect, vi, beforeEach, afterEach } from "vitest"; | ||
|
|
||
| import { retryWithBackoff } from "../retry"; | ||
|
|
||
| // Mock @actions/core | ||
| vi.mock("@actions/core", () => ({ | ||
| info: vi.fn(), | ||
| warning: vi.fn(), | ||
| })); | ||
|
|
||
| describe("retryWithBackoff", () => { |
There was a problem hiding this comment.
The @actions/core mock is declared after importing retryWithBackoff. In this repo’s other tests (e.g. conda.test.ts), modules are imported after mocks to ensure ESM mocks apply. To avoid the real @actions/core being loaded before the mock, move the retryWithBackoff import below vi.mock(...) or switch to await import("../retry") inside the tests.
| import { describe, it, expect, vi, beforeEach, afterEach } from "vitest"; | |
| import { retryWithBackoff } from "../retry"; | |
| // Mock @actions/core | |
| vi.mock("@actions/core", () => ({ | |
| info: vi.fn(), | |
| warning: vi.fn(), | |
| })); | |
| describe("retryWithBackoff", () => { | |
| import { describe, it, expect, vi, beforeEach, afterEach, beforeAll } from "vitest"; | |
| // Mock @actions/core | |
| vi.mock("@actions/core", () => ({ | |
| info: vi.fn(), | |
| warning: vi.fn(), | |
| })); | |
| let retryWithBackoff: typeof import("../retry").retryWithBackoff; | |
| describe("retryWithBackoff", () => { | |
| beforeAll(async () => { | |
| ({ retryWithBackoff } = await import("../retry")); | |
| }); |
| vi.useRealTimers(); | ||
|
|
||
| const fn = vi | ||
| .fn() | ||
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")) | ||
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")) | ||
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")); | ||
|
|
||
| await expect( | ||
| retryWithBackoff(fn, { | ||
| maxRetries: 2, | ||
| initialDelayMs: 10, | ||
| backoffFactor: 2, | ||
| }), | ||
| ).rejects.toThrow("CondaHTTPError: HTTP 503"); | ||
|
|
||
| // 1 initial + 2 retries = 3 total | ||
| expect(fn).toHaveBeenCalledTimes(3); | ||
|
|
||
| vi.useFakeTimers(); |
There was a problem hiding this comment.
This test switches to real timers (vi.useRealTimers()), which adds real delays (initialDelay/backoff) to the test run and can introduce flakiness. Since the retry logic is timer-driven, it should be possible to keep fake timers and advanceTimersByTimeAsync to cover the same behavior deterministically (including advancing the full backoff duration for all retries).
| vi.useRealTimers(); | |
| const fn = vi | |
| .fn() | |
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")) | |
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")) | |
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")); | |
| await expect( | |
| retryWithBackoff(fn, { | |
| maxRetries: 2, | |
| initialDelayMs: 10, | |
| backoffFactor: 2, | |
| }), | |
| ).rejects.toThrow("CondaHTTPError: HTTP 503"); | |
| // 1 initial + 2 retries = 3 total | |
| expect(fn).toHaveBeenCalledTimes(3); | |
| vi.useFakeTimers(); | |
| const fn = vi | |
| .fn() | |
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")) | |
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")) | |
| .mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503")); | |
| const promise = retryWithBackoff(fn, { | |
| maxRetries: 2, | |
| initialDelayMs: 10, | |
| backoffFactor: 2, | |
| }); | |
| // Advance timers for all backoff delays: 10ms, then 20ms (with factor 2) | |
| await vi.advanceTimersByTimeAsync(10); | |
| await vi.advanceTimersByTimeAsync(20); | |
| await expect(promise).rejects.toThrow("CondaHTTPError: HTTP 503"); | |
| // 1 initial + 2 retries = 3 total | |
| expect(fn).toHaveBeenCalledTimes(3); |
| const subcommand = cmd[0]; | ||
| if (constants.NETWORK_SUBCOMMANDS.includes(subcommand)) { | ||
| return await retryWithBackoff( | ||
| () => utils.execute(command, env, captureOutput), | ||
| { | ||
| maxRetries: constants.CONDA_RETRY_MAX, | ||
| initialDelayMs: constants.CONDA_RETRY_INITIAL_DELAY_MS, | ||
| }, | ||
| ); | ||
| } |
There was a problem hiding this comment.
New behavior is introduced here (network subcommands route through retry logic), but there’s no test asserting that condaCommand retries for retryable failures and does not retry for non-network subcommands. Adding a unit test that mocks utils.execute to fail with a retryable error and verifies multiple invocations (and similarly verifies only one invocation for non-network subcommands) would help prevent regressions.
Testing things!