Skip to content

Testing things: Add retry with backoff utility and add tests#471

Closed
goanpeca wants to merge 1 commit intoconda-incubator:mainfrom
goanpeca:enh/download-retries
Closed

Testing things: Add retry with backoff utility and add tests#471
goanpeca wants to merge 1 commit intoconda-incubator:mainfrom
goanpeca:enh/download-retries

Conversation

@goanpeca
Copy link
Member

@goanpeca goanpeca commented Mar 20, 2026

Testing things!

@goanpeca goanpeca self-assigned this Mar 20, 2026
@goanpeca goanpeca requested a review from a team as a code owner March 20, 2026 00:51
Copilot AI review requested due to automatic review settings March 20, 2026 00:51
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a reusable retry-with-exponential-backoff utility and wires it into network-prone conda/mamba operations (and installer downloads) to mitigate transient Anaconda connectivity failures (Fixes #129).

Changes:

  • Introduce retryWithBackoff + isRetryableError utility with configurable backoff behavior.
  • Add retry configuration/constants (retryable error patterns, retry counts/delays, network subcommands) and apply retries to conda commands and installer downloads.
  • Add Vitest coverage threshold tweaks and new unit tests for the retry utility; regenerate dist/ bundles.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
vitest.config.ts Adjusts coverage thresholds to reflect new code/tests.
src/retry.ts Adds retry-with-backoff utility and default retryable-error matcher.
src/constants.ts Adds retry-related constants (patterns, retry defaults, network subcommands).
src/installer/base.ts Wraps tool-cache downloads with retry.
src/conda.ts Retries selected conda subcommands via retryWithBackoff.
src/tests/retry.test.ts Adds unit tests for retry behavior.
dist/setup/index.js Re-bundled action output including retry changes.
dist/delete/index.js Re-bundled action output including constants changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +138 to +146
const subcommand = cmd[0];
if (constants.NETWORK_SUBCOMMANDS.includes(subcommand)) {
return await retryWithBackoff(
() => utils.execute(command, env, captureOutput),
{
maxRetries: constants.CONDA_RETRY_MAX,
initialDelayMs: constants.CONDA_RETRY_INITIAL_DELAY_MS,
},
);
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry predicate (isRetryableError) only inspects Error.message, but utils.execute() throws generic errors like " return error code " and logs the underlying CondaHTTPError/connection details to stderr via core.warning without including them in the thrown error. As a result, these retries will almost never trigger for the transient network failures this PR targets. Consider capturing stderr (or combined output) in the thrown error used for retries, or wrapping utils.execute here to surface the relevant stderr text into the error that retryWithBackoff evaluates.

Copilot uses AI. Check for mistakes.
/CondaHTTPError/i,
/ConnectionError/i,
/SSLError/i,
/HTTP [45]\d{2}/i,
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RETRYABLE_ERROR_PATTERNS currently treats all HTTP 4xx as retryable via /HTTP [45]\d{2}/i. Retrying on 4xx (e.g., 401/403 auth issues, 404 missing channel/repodata) is typically non-transient and can add 10s+ of unnecessary delay before failing. Consider narrowing this to statuses that are commonly transient (e.g., 5xx, 408, 429) and/or matching Conda's specific "HTTP 000"/connection-failed cases separately.

Suggested change
/HTTP [45]\d{2}/i,
/HTTP (5\d{2}|408|429)/i,

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +79
let lastError: Error | undefined;

for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (err) {
lastError = err instanceof Error ? err : new Error(String(err));

const isLastAttempt = attempt === maxRetries;
if (isLastAttempt || !isRetryableFn(lastError)) {
throw lastError;
}

const delay = Math.min(
initialDelayMs * Math.pow(backoffFactor, attempt),
maxDelayMs,
);
core.warning(
`Attempt ${attempt + 1}/${maxRetries + 1} failed: ${lastError.message}`,
);
core.info(`Retrying in ${delay / 1000}s...`);
await sleep(delay);
}
}

throw lastError;
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retryWithBackoff allows maxRetries (and other numeric options) to be negative or non-finite. If maxRetries is negative, the loop never runs and lastError remains undefined, leading to throw lastError throwing undefined (hard to debug). Consider validating/clamping maxRetries (and delays/backoffFactor) to sane non-negative finite values up front, and ensure the final throw is always an Error instance.

Suggested change
let lastError: Error | undefined;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (err) {
lastError = err instanceof Error ? err : new Error(String(err));
const isLastAttempt = attempt === maxRetries;
if (isLastAttempt || !isRetryableFn(lastError)) {
throw lastError;
}
const delay = Math.min(
initialDelayMs * Math.pow(backoffFactor, attempt),
maxDelayMs,
);
core.warning(
`Attempt ${attempt + 1}/${maxRetries + 1} failed: ${lastError.message}`,
);
core.info(`Retrying in ${delay / 1000}s...`);
await sleep(delay);
}
}
throw lastError;
const safeMaxRetries =
Number.isFinite(maxRetries) && maxRetries >= 0
? Math.floor(maxRetries)
: 0;
const safeInitialDelayMs =
Number.isFinite(initialDelayMs) && initialDelayMs >= 0
? initialDelayMs
: 10_000;
const safeBackoffFactor =
Number.isFinite(backoffFactor) && backoffFactor >= 1
? backoffFactor
: 2;
const safeMaxDelayMs =
Number.isFinite(maxDelayMs) && maxDelayMs >= 0 ? maxDelayMs : 60_000;
let lastError: Error | undefined;
for (let attempt = 0; attempt <= safeMaxRetries; attempt++) {
try {
return await fn();
} catch (err) {
lastError = err instanceof Error ? err : new Error(String(err));
const isLastAttempt = attempt === safeMaxRetries;
if (isLastAttempt || !isRetryableFn(lastError)) {
throw lastError;
}
const delay = Math.min(
safeInitialDelayMs * Math.pow(safeBackoffFactor, attempt),
safeMaxDelayMs,
);
core.warning(
`Attempt ${attempt + 1}/${safeMaxRetries + 1} failed: ${lastError.message}`,
);
core.info(`Retrying in ${delay / 1000}s...`);
await sleep(delay);
}
}
throw lastError ?? new Error("retryWithBackoff failed without a captured error");

Copilot uses AI. Check for mistakes.
Comment on lines 70 to +77
if (executablePath === "") {
const rawDownloadPath = await tc.downloadTool(options.url);
const rawDownloadPath = await retryWithBackoff(
() => tc.downloadTool(options.url),
{
maxRetries: constants.CONDA_RETRY_MAX,
initialDelayMs: constants.CONDA_RETRY_INITIAL_DELAY_MS,
},
);
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureLocalInstaller uses retryWithBackoff with the default isRetryableError, which is documented/implemented around Conda stderr patterns. @actions/tool-cache download failures may surface different messages (e.g., Node/TLS error codes) and may not match those patterns, making this retry ineffective or inconsistent. Consider passing a retry predicate tailored to tc.downloadTool failures (or defining a separate pattern set for generic network download errors).

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +11
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";

import { retryWithBackoff } from "../retry";

// Mock @actions/core
vi.mock("@actions/core", () => ({
info: vi.fn(),
warning: vi.fn(),
}));

describe("retryWithBackoff", () => {
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @actions/core mock is declared after importing retryWithBackoff. In this repo’s other tests (e.g. conda.test.ts), modules are imported after mocks to ensure ESM mocks apply. To avoid the real @actions/core being loaded before the mock, move the retryWithBackoff import below vi.mock(...) or switch to await import("../retry") inside the tests.

Suggested change
import { describe, it, expect, vi, beforeEach, afterEach } from "vitest";
import { retryWithBackoff } from "../retry";
// Mock @actions/core
vi.mock("@actions/core", () => ({
info: vi.fn(),
warning: vi.fn(),
}));
describe("retryWithBackoff", () => {
import { describe, it, expect, vi, beforeEach, afterEach, beforeAll } from "vitest";
// Mock @actions/core
vi.mock("@actions/core", () => ({
info: vi.fn(),
warning: vi.fn(),
}));
let retryWithBackoff: typeof import("../retry").retryWithBackoff;
describe("retryWithBackoff", () => {
beforeAll(async () => {
({ retryWithBackoff } = await import("../retry"));
});

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +103
vi.useRealTimers();

const fn = vi
.fn()
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"))
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"))
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"));

await expect(
retryWithBackoff(fn, {
maxRetries: 2,
initialDelayMs: 10,
backoffFactor: 2,
}),
).rejects.toThrow("CondaHTTPError: HTTP 503");

// 1 initial + 2 retries = 3 total
expect(fn).toHaveBeenCalledTimes(3);

vi.useFakeTimers();
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test switches to real timers (vi.useRealTimers()), which adds real delays (initialDelay/backoff) to the test run and can introduce flakiness. Since the retry logic is timer-driven, it should be possible to keep fake timers and advanceTimersByTimeAsync to cover the same behavior deterministically (including advancing the full backoff duration for all retries).

Suggested change
vi.useRealTimers();
const fn = vi
.fn()
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"))
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"))
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"));
await expect(
retryWithBackoff(fn, {
maxRetries: 2,
initialDelayMs: 10,
backoffFactor: 2,
}),
).rejects.toThrow("CondaHTTPError: HTTP 503");
// 1 initial + 2 retries = 3 total
expect(fn).toHaveBeenCalledTimes(3);
vi.useFakeTimers();
const fn = vi
.fn()
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"))
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"))
.mockRejectedValueOnce(new Error("CondaHTTPError: HTTP 503"));
const promise = retryWithBackoff(fn, {
maxRetries: 2,
initialDelayMs: 10,
backoffFactor: 2,
});
// Advance timers for all backoff delays: 10ms, then 20ms (with factor 2)
await vi.advanceTimersByTimeAsync(10);
await vi.advanceTimersByTimeAsync(20);
await expect(promise).rejects.toThrow("CondaHTTPError: HTTP 503");
// 1 initial + 2 retries = 3 total
expect(fn).toHaveBeenCalledTimes(3);

Copilot uses AI. Check for mistakes.
Comment on lines +138 to +147
const subcommand = cmd[0];
if (constants.NETWORK_SUBCOMMANDS.includes(subcommand)) {
return await retryWithBackoff(
() => utils.execute(command, env, captureOutput),
{
maxRetries: constants.CONDA_RETRY_MAX,
initialDelayMs: constants.CONDA_RETRY_INITIAL_DELAY_MS,
},
);
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior is introduced here (network subcommands route through retry logic), but there’s no test asserting that condaCommand retries for retryable failures and does not retry for non-network subcommands. Adding a unit test that mocks utils.execute to fail with a retryable error and verifies multiple invocations (and similarly verifies only one invocation for non-network subcommands) would help prevent regressions.

Copilot uses AI. Check for mistakes.
@goanpeca goanpeca closed this Mar 20, 2026
@goanpeca goanpeca changed the title Add retry with backoff utility and add tests Testing things: Add retry with backoff utility and add tests Mar 20, 2026
@goanpeca goanpeca deleted the enh/download-retries branch March 20, 2026 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants