diff --git a/.changeset/readiness-checks.md b/.changeset/readiness-checks.md new file mode 100644 index 0000000..c25c8bb --- /dev/null +++ b/.changeset/readiness-checks.md @@ -0,0 +1,7 @@ +--- +'@cloudflare/containers': minor +--- + +Add user-defined readiness checks via a `readyOn` class attribute and `addReadinessCheck` / `setReadinessChecks` instance methods. Ships with `portResponding(port, { pingEndpoint? })` and `isHealthy(path, { port?, pingEndpoint? })` helpers; arbitrary async checks are supported via `(container) => Promise`. + +`portResponding` checks for `defaultPort` and every entry in `requiredPorts` are added automatically, so you don't need to list them explicitly when declaring `readyOn`. `addReadinessCheck` preserves the auto port checks; `setReadinessChecks` replaces everything (include port checks explicitly if you need them). diff --git a/AGENTS.md b/AGENTS.md index be49e31..d65be02 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -63,8 +63,54 @@ Build output goes to `dist/`. Do not edit files in `dist/`. **Starting a container:** - `start(startOptions?, waitOptions?)` — starts without waiting for ports -- `startAndWaitForPorts(args)` — starts and polls until ports are ready +- `startAndWaitForPorts(args)` — starts and runs all readiness checks (defaults to port checks) - `waitForPort(waitOptions)` — polls a single port; returns tries used +- `waitForPath(waitOptions & { path })` — polls an HTTP path until it returns 2xx + +**Readiness checks:** + +Readiness checks gate fetch proxying — every check must resolve before requests flow to the container. All checks run in parallel, so ordering doesn't matter. + +Checks should retry internally on transient "not ready yet" conditions rather than rejecting. A rejection is terminal and causes the parent `fetch` to return a 500. Loop cooperatively against `options.signal` (which fires on timeout) and only reject when something is genuinely broken. + +`portResponding` checks for `defaultPort` and every entry in `requiredPorts` are added automatically, so you don't need to list them explicitly: + +```ts +import { Container, isHealthy } from '@cloudflare/containers'; + +class MyApp extends Container { + defaultPort = 8080; + // portResponding(8080) is added automatically + readyOn = [isHealthy('/health')]; +} +``` + +Add checks at runtime with `addReadinessCheck` — auto port checks are still applied: + +```ts +// Effective: [portResponding(8080), isHealthy('/ready')] +container.addReadinessCheck(isHealthy('/ready')); + +container.addReadinessCheck(async () => { + await warmCachesFromR2(); +}); +``` + +`setReadinessChecks` takes full control: auto port checks are NOT added, so include them explicitly if you need them. Pass `[]` to opt out entirely. + +```ts +import { portResponding, isHealthy } from '@cloudflare/containers'; + +// Replace everything — include port checks explicitly +container.setReadinessChecks([ + portResponding(8080), + isHealthy('/ready'), + async () => { await migrateDatabase(); }, +]); + +// Opt out — ready as soon as the process starts +container.setReadinessChecks([]); +``` **HTTP methods:** diff --git a/README.md b/README.md index c220388..65d3095 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,7 @@ A class for interacting with Containers on Cloudflare Workers. - Simple container lifecycle management (starting and stopping containers) - Event hooks for container lifecycle events (onStart, onStop, onError) - Configurable sleep timeout that renews on requests +- Readiness checks that gate request proxying until the app is ready - Load balancing utilities ## Installation @@ -68,6 +69,12 @@ The `Container` class that extends a container-enbled Durable Object to provide Array of ports that should be checked for availability during container startup. Used by `startAndWaitForPorts` when no specific ports are provided. +- `readyOn?: ReadinessCheck[]` + + Readiness checks that must all resolve before fetch requests are proxied to the container. See [Readiness Checks](#readiness-checks) for details. + + `portResponding` checks for `defaultPort` and every entry in `requiredPorts` are added automatically — you don't need to list them explicitly. + - `sleepAfter` How long to keep the container alive without activity (format: number for seconds, or string like "5m", "30s", "1h"). @@ -262,6 +269,22 @@ See [this example](#http-example-with-lifecycle-hooks). Manually renews the container activity timeout (extends container lifetime). +- `addReadinessCheck(check: ReadinessCheck)` + + Appends a readiness check at runtime. Auto `portResponding` checks for `defaultPort` / `requiredPorts` are preserved. + +- `setReadinessChecks(checks: ReadinessCheck[])` + + Replaces the readiness check list. Auto port checks are **not** added — include them explicitly if you need them. Pass `[]` to opt out entirely. + +- `waitForPort(options: WaitOptions): Promise` + + Polls a TCP port until it accepts an HTTP connection. Used by `portResponding`. + +- `waitForPath(options: WaitOptions & { path: string; pingEndpoint?: string }): Promise` + + Polls an HTTP path until it returns a 2xx response. Used by `isHealthy`. The Host header defaults to the host portion of the container's `pingEndpoint`; pass `pingEndpoint` in the options to override for this call only. + ##### Outbound Interception Use outbound interception when you want to control what the container can reach, or proxy outbound requests through Worker code. @@ -356,6 +379,14 @@ Processing order (first match wins): If no name is provided, "cf-singleton-container" is used. +- `portResponding(port: number, options?: { pingEndpoint?: string }): ReadinessCheck` + + Readiness check factory that waits for the given port to start accepting HTTP connections. Pass `pingEndpoint` to override the probe endpoint for this check only. See [Readiness Checks](#readiness-checks). + +- `isHealthy(path: string, options?: { port?: number; pingEndpoint?: string }): ReadinessCheck` + + Readiness check factory that polls an HTTP path until it returns 2xx. Falls back to `defaultPort` when `port` is omitted; falls back to the host portion of the container's `pingEndpoint` when `pingEndpoint` is omitted. See [Readiness Checks](#readiness-checks). + ## Examples ### HTTP Example with Lifecycle Hooks @@ -408,6 +439,113 @@ export class MyContainer extends Container { } ``` +### Readiness Checks + +Readiness checks gate fetch proxying — every check must resolve before requests flow to the container. Use them to wait on health endpoints, warmup work, migrations, or anything else that has to finish before traffic is served. + +Checks run in parallel, so ordering doesn't matter. If any check rejects, the container is not considered ready and `fetch` / `containerFetch` returns a 500. + +**Auto port checks.** A `portResponding` check is added automatically for `defaultPort` and every entry in `requiredPorts`, so you don't need to list them in `readyOn`: + +```typescript +import { Container, isHealthy } from '@cloudflare/containers'; + +export class MyContainer extends Container { + defaultPort = 8080; + + // portResponding(8080) is added automatically. + // Effective checks: [portResponding(8080), isHealthy('/health')] + readyOn = [isHealthy('/health')]; +} +``` + +**Custom readiness checks.** A readiness check is a function with this signature: + +```typescript +type ReadinessCheck = ( + container: Container, + options?: { signal?: AbortSignal } +) => Promise; +``` + +The check receives two arguments: + +- **`container`** — the `Container` instance itself. Use it to call `waitForPath`, `waitForPort`, read instance config like `defaultPort`, or access `container.env` for bindings. +- **`options.signal`** — an `AbortSignal` that fires if the caller aborts or the readiness timeout elapses. Long-running checks should honour it (pass it to `fetch`, `waitForPath`, etc.) so they cancel cleanly; checks that ignore it may keep running in the background after a timeout. + +> **Don't reject on "not ready yet" — retry inside the check.** A rejection is terminal: the whole readiness gate rejects and the parent `fetch` / `containerFetch` returns a 500. If the condition you're waiting on is transient (upstream not up yet, file not written yet, etc.), loop with a short sleep until it's true or `options.signal` fires. The signal fires when the overall readiness timeout elapses, so cooperative loops are bounded. Only reject when something is genuinely broken or the signal aborted. + +Custom checks typically do one of three things: wait on an external dependency, run inline warmup, or poll the container's own HTTP surface (use `container.waitForPath` or `container.waitForPort` rather than `containerFetch`, which itself waits for readiness and would recurse). + +```typescript +import { Container, isHealthy, type ReadinessCheck } from '@cloudflare/containers'; + +// Example 1: wait for an external dependency. The check loops until the +// upstream returns 2xx OR the signal aborts — it does not reject on a +// transient bad response. +const upstreamReady: ReadinessCheck = async (_container, { signal } = {}) => { + while (!signal?.aborted) { + try { + const response = await fetch('https://api.example.com/ping', { signal }); + if (response.ok) return; + } catch { + // connection error — try again + } + await new Promise(resolve => setTimeout(resolve, 500)); + } + throw new Error('upstream did not become ready before readiness timed out'); +}; + +// Example 2: poll a container endpoint. `container.waitForPath` already +// retries internally until 2xx or the retry budget is exhausted, so this +// check doesn't need its own loop. It rejects only if the endpoint never +// becomes healthy within the budget — which is the correct behaviour. +const modelsLoaded: ReadinessCheck = async (container, { signal } = {}) => { + await container.waitForPath({ path: '/models', portToCheck: 8080, signal }); +}; + +// Example 3: inline warmup. This is one-shot work (not a condition to +// poll) so rejecting on genuine failure is fine — the container really +// isn't ready if warmup failed. +const warmCaches: ReadinessCheck = async () => { + await warmCachesFromR2(); +}; + +export class InferenceContainer extends Container { + defaultPort = 8080; + + readyOn = [isHealthy('/health'), upstreamReady, modelsLoaded, warmCaches]; +} +``` + +**Adding checks at runtime.** `addReadinessCheck` appends to the list. Auto port checks are preserved. + +```typescript +// defaultPort = 8080, no `readyOn` declared on the class. +// Effective after this call: [portResponding(8080), isHealthy('/ready')] +container.addReadinessCheck(isHealthy('/ready')); + +container.addReadinessCheck(async () => { + await seedDatabase(); +}); +``` + +**Replacing the list.** `setReadinessChecks` takes full control — auto port checks are **not** added, so include them explicitly if you want them. Pass `[]` to opt out of readiness checking entirely. + +```typescript +import { portResponding, isHealthy } from '@cloudflare/containers'; + +// Replace everything. Port checks are NOT auto-added. +container.setReadinessChecks([ + portResponding(8080), + isHealthy('/ready'), + async () => { await migrateDatabase(); }, +]); + +// Opt out — ready as soon as the process starts. +container.setReadinessChecks([]); +``` + ### WebSocket Support The Container class automatically supports proxying WebSocket connections to your container. WebSocket connections are bi-directionally proxied, with messages forwarded in both directions. The Container also automatically renews the activity timeout when WebSocket messages are sent or received. diff --git a/examples/core-tests/container_src/server.js b/examples/core-tests/container_src/server.js index f52b8d2..9e912e0 100644 --- a/examples/core-tests/container_src/server.js +++ b/examples/core-tests/container_src/server.js @@ -1,5 +1,11 @@ import { createServer } from 'http'; +// Time the health endpoint starts returning 2xx. Lets tests exercise +// readiness checks that need to wait for app-level warmup — not just the +// port binding. +const HEALTHY_AFTER_MS = Number(process.env.HEALTHY_AFTER_MS ?? '0'); +const startupTime = Date.now(); + const server = createServer(function (req, res) { if (req.url?.startsWith('/containerFetchNoContent')) { res.writeHead(204); @@ -7,6 +13,13 @@ const server = createServer(function (req, res) { return; } + if (req.url === '/health') { + const ready = Date.now() - startupTime >= HEALTHY_AFTER_MS; + res.writeHead(ready ? 200 : 503, { 'Content-Type': 'text/plain' }); + res.end(ready ? 'ok' : 'warming up'); + return; + } + if (req.url === '/error') { res.writeHead(500, { 'Content-Type': 'text/plain' }); res.end('Internal server error'); diff --git a/examples/core-tests/src/index.ts b/examples/core-tests/src/index.ts index ad44f36..8fca45c 100644 --- a/examples/core-tests/src/index.ts +++ b/examples/core-tests/src/index.ts @@ -1,4 +1,4 @@ -import { Container } from '../../../src/lib/container'; +import { Container, isHealthy } from '../../../src/lib/container'; import { getContainer, switchPort } from '../../../src/lib/utils'; /** @@ -34,10 +34,40 @@ export class TestContainer extends Container { } } +/** + * Container with a `readyOn` list to exercise readiness checks. The + * container's /health endpoint only starts returning 2xx after + * HEALTHY_AFTER_MS has elapsed, so `isHealthy` must poll. + * + * Note: `portResponding(8080)` is added automatically because + * `defaultPort = 8080` — we only declare the path check here. + */ +export class ReadyOnContainer extends Container { + defaultPort = 8080; + sleepAfter = '3m'; + readyOn = [isHealthy('/health')]; + + constructor(ctx: any, env: any) { + super(ctx, env); + this.envVars = { + MESSAGE: 'ready on container', + HEALTHY_AFTER_MS: '1500', + }; + this.entrypoint = ['node', 'server.js']; + } + + override async onStart(): Promise { + console.log('readyOn onStart hook called'); + } +} + export default { async fetch( request: Request, - env: { CONTAINER: DurableObjectNamespace } + env: { + CONTAINER: DurableObjectNamespace; + READY_ON_CONTAINER: DurableObjectNamespace; + } ): Promise { const url = new URL(request.url); // get a new container instance per request @@ -84,6 +114,20 @@ export default { await container.stop(); return new Response('Container stopping'); } + + if (url.pathname === '/readyOn/fetch') { + const readyOn = getContainer(env.READY_ON_CONTAINER, id); + console.log('Handling readyOn http fetch request'); + const response = await readyOn.fetch(request); + const body = await response.text(); + return new Response( + JSON.stringify({ + status: response.status, + body, + }) + ); + } + return new Response('Not Found'); }, }; diff --git a/examples/core-tests/test/index.test.ts b/examples/core-tests/test/index.test.ts index e2f55a9..12d673c 100644 --- a/examples/core-tests/test/index.test.ts +++ b/examples/core-tests/test/index.test.ts @@ -131,6 +131,40 @@ describe('core functionality', () => { expect(fetchRequestIndex).toBeLessThan(secondOnStartIndex); }); + test('readyOn waits for pathHealthy before proxying', async () => { + const runner = new WranglerDevRunner(); + + const url = await runner.getUrl(); + const id = randomUUID(); + + try { + const response = await vi.waitFor( + async () => { + const res = await fetch(`${url}/readyOn/fetch?id=${id}`); + if (res.status !== 200) { + console.log(await res.text()); + throw new Error(`Expected status 200, got ${res.status}`); + } + return res; + }, + { timeout: 15000 } + ); + + const payload = (await response.json()) as { status: number; body: string }; + // If the readiness check passed, the container responded with its + // normal message — not the 503 "warming up" response from /health. + expect(payload.status).toBe(200); + expect(payload.body).toBe( + 'Hello from test container! process.env.MESSAGE: ready on container' + ); + } finally { + await runner.destroy([id]); + } + + const output = runner.getStdout(); + expect(output).toMatch(/readyOn onStart hook called/); + }); + test('stop', async () => { const runner = new WranglerDevRunner(); diff --git a/examples/core-tests/wrangler.jsonc b/examples/core-tests/wrangler.jsonc index 96959bb..cd88bf5 100644 --- a/examples/core-tests/wrangler.jsonc +++ b/examples/core-tests/wrangler.jsonc @@ -12,6 +12,12 @@ "name": "test-container", "max_instances": 2, }, + { + "image": "./Dockerfile", + "class_name": "ReadyOnContainer", + "name": "ready-on-container", + "max_instances": 2, + }, ], "durable_objects": { "bindings": [ @@ -19,6 +25,10 @@ "class_name": "TestContainer", "name": "CONTAINER", }, + { + "class_name": "ReadyOnContainer", + "name": "READY_ON_CONTAINER", + }, ], }, "migrations": [ @@ -26,5 +36,9 @@ "tag": "v1", "new_sqlite_classes": ["TestContainer"], }, + { + "tag": "v2", + "new_sqlite_classes": ["ReadyOnContainer"], + }, ], } diff --git a/package-lock.json b/package-lock.json index 8a8e481..32b7b9d 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@cloudflare/containers", - "version": "0.3.2", + "version": "0.3.3", "lockfileVersion": 2, "requires": true, "packages": { "": { "name": "@cloudflare/containers", - "version": "0.3.2", + "version": "0.3.3", "license": "ISC", "devDependencies": { "@changesets/cli": "^2.29.6", diff --git a/src/index.ts b/src/index.ts index 9042ef7..3fc7b04 100644 --- a/src/index.ts +++ b/src/index.ts @@ -1,10 +1,19 @@ -export { Container, ContainerProxy, outboundParams } from './lib/container'; +export { + Container, + ContainerProxy, + outboundParams, + isHealthy, + portResponding, +} from './lib/container'; export type { OutboundHandler, OutboundHandlerContext, OutboundHandlerParams, OutboundHandlerParamsOf, OutboundHandlers, + ReadinessCheck, + ReadinessCheckFactoryOptions, + ReadinessCheckOptions, } from './lib/container'; export { getRandom, loadBalance, getContainer, switchPort } from './lib/utils'; export type { diff --git a/src/lib/container.ts b/src/lib/container.ts index a29adb4..fdeaf72 100644 --- a/src/lib/container.ts +++ b/src/lib/container.ts @@ -126,6 +126,119 @@ const signalToNumbers: Record = { SIGKILL: 9, }; +/** + * Options passed into a ReadinessCheck when it's invoked. + */ +export interface ReadinessCheckOptions { + /** Optional abort signal to cancel the check */ + signal?: AbortSignal; +} + +/** + * A readiness check is a function that returns a promise. + * The Container will wait for every declared readiness check to resolve + * before allowing fetch requests to be proxied to the container. + * + * Checks receive the container instance (so helpers like `portResponding` + * can poll the right TCP port) and an options bag with an optional abort + * signal so long-running checks can cooperatively abort. + * + * **Do not reject on "not ready yet" — retry instead.** Rejection is + * treated as a terminal failure: the whole readiness gate rejects and + * the parent `fetch` / `containerFetch` returns a 500. If the condition + * you're checking is transiently false (e.g. upstream isn't up, file + * isn't written yet), loop inside the check with a small sleep until + * either the condition is met or `options.signal` fires. The signal + * fires when the overall readiness timeout (`portReadyTimeoutMS`) + * elapses, so looping cooperatively is bounded. Only reject when + * something is genuinely broken or the signal aborted. + */ +export type ReadinessCheck = ( + // Using a minimal structural type here avoids generic friction when a + // `Container` subclass passes `this` to a check expecting the + // unparameterized Container. + container: Container, + options?: ReadinessCheckOptions +) => Promise; + +/** + * Options accepted by the `portResponding` and `isHealthy` readiness + * check factories. + */ +export interface ReadinessCheckFactoryOptions { + /** + * Port to check. Defaults to the container's `defaultPort` when + * omitted. If neither is set, the check rejects. + */ + port?: number; + /** + * Override the Host header used for the request. Defaults to the host + * portion of the container's `pingEndpoint` (e.g. `container` when + * `pingEndpoint = 'container/health'`). + */ + pingEndpoint?: string; +} + +/** + * Readiness check that waits for the given port to start accepting HTTP + * connections. Any HTTP response (including 4xx) counts as "responding" — + * the goal is to confirm the process has bound the port. + * + * @example + * class MyApp extends Container { + * readyOn = [portResponding(8080)]; + * } + * + * @example + * // Override the Host header used by the probe: + * portResponding(8080, { pingEndpoint: 'container/ping' }); + */ +export function portResponding( + port: number, + options: { pingEndpoint?: string } = {} +): ReadinessCheck { + return (container, runOptions) => + container.waitForPort({ + portToCheck: port, + pingEndpoint: options.pingEndpoint, + signal: runOptions?.signal, + }); +} + +/** + * Readiness check that polls an HTTP path until it returns a 2xx response. + * Useful for apps that expose a `/health` or `/ready` endpoint. + * + * @example + * class MyApp extends Container { + * defaultPort = 8080; + * readyOn = [isHealthy('/health')]; + * } + * + * @example + * // Target a specific port and override the Host header: + * isHealthy('/health', { port: 8081, pingEndpoint: 'container' }); + */ +export function isHealthy( + path: string, + options: ReadinessCheckFactoryOptions = {} +): ReadinessCheck { + return (container, runOptions) => { + const targetPort = options.port ?? container.defaultPort; + if (targetPort === undefined) { + return Promise.reject( + new Error(`isHealthy('${path}'): no port specified and no defaultPort set on the container`) + ); + } + return container.waitForPath({ + path, + portToCheck: targetPort, + pingEndpoint: options.pingEndpoint, + signal: runOptions?.signal, + }); + }; +} + // ===================== // ===================== // HELPER FUNCTIONS @@ -175,6 +288,16 @@ function getExitCodeFromError(error: unknown): number | null { return null; } +/** + * Split a `pingEndpoint` value (e.g. `'container/health'`) into its host + * portion. Everything before the first `/` is the host; if no `/` is + * present, the whole string is the host. + */ +function parsePingEndpointHost(pingEndpoint: string): string { + const slashIndex = pingEndpoint.indexOf('/'); + return slashIndex === -1 ? pingEndpoint : pingEndpoint.slice(0, slashIndex); +} + /** * Combines the existing user-defined signal with a signal that aborts after the timeout specified by waitInterval */ @@ -508,6 +631,39 @@ export class Container extends DurableObject { // but it's still useful if you want to control the path that // the Container class uses to send HTTP requests to. pingEndpoint: string = 'ping'; + + /** + * Readiness checks that must all resolve before fetch requests are + * allowed through to the container. + * + * `portResponding` checks for `defaultPort` and every `requiredPorts` + * entry are added automatically — you don't need to list them here. + * + * @example + * class MyApp extends Container { + * defaultPort = 8080; + * // portResponding(8080) is added automatically + * readyOn = [isHealthy('/health')]; + * } + * + * @example + * class MyApp extends Container { + * requiredPorts = [8080, 8081]; + * // portResponding(8080) and portResponding(8081) are added automatically + * readyOn = [isHealthy('/health', { port: 8080 })]; + * } + * + * Use `addReadinessCheck(...)` to add a check at runtime, or + * `setReadinessChecks([...])` to take full control (auto port checks + * are NOT added when `setReadinessChecks` is used — include them + * explicitly if you need them). Pass `setReadinessChecks([])` to opt + * out entirely. + * + * Checks run in parallel, so ordering does not matter. If any check + * rejects, the container is not considered ready. + */ + readyOn?: ReadinessCheck[]; + applyOutboundInterceptionPromise: Promise = Promise.resolve(); usingInterception = false; @@ -796,15 +952,23 @@ export class Container extends DurableObject { } /** - * Start the container and wait for ports to be available. + * Start the container and wait for it to become ready. + * + * Readiness is determined by the container's `readyOn` list. If + * `readyOn` is undefined, a default list derived from `defaultPort` / + * `requiredPorts` is used — equivalent to the historical "wait for + * ports" behaviour. * - * For each specified port, it polls until the port is available or `cancellationOptions.portReadyTimeoutMS` is reached. + * All readiness checks run in parallel. The method resolves once every + * check resolves, or rejects on the first failure. * - * @param ports - The ports to wait for (if undefined, uses requiredPorts or defaultPort) - * @param cancellationOptions - Options to configure timeouts, polling intereva, and abort signal - * @param startOptions Override configuration on a per-instance basis for env vars, entrypoint command, internet access, and labels - * @returns A promise that resolves when the container has been started and the ports are listening - * @throws Error if port checks fail after the specified timeout or if the container fails to start. + * @param ports - If provided, overrides `readyOn` and waits for just + * these ports (useful for ad-hoc ports not declared on the class) + * @param cancellationOptions - Timeouts, polling interval, and abort + * @param startOptions - Override env vars, entrypoint, internet access, and labels on a per-instance basis + * @returns Resolves when the container is running and ready + * @throws If the container fails to start, any readiness check fails, + * or the timeout is exceeded */ public async startAndWaitForPorts(args: StartAndWaitForPortsOptions): Promise; public async startAndWaitForPorts( @@ -838,54 +1002,113 @@ export class Container extends DurableObject { resolvedStartOptions = startOptions; } - // Determine which ports to check - const portsToCheck = await this.getPortsToCheck(ports); - // trigger all onStop that we didn't do yet await this.syncPendingStoppedEvents(); - // Prepare to start the container resolvedCancellationOptions ??= {}; const containerGetTimeout = resolvedCancellationOptions.instanceGetTimeoutMS ?? TIMEOUT_TO_GET_CONTAINER_MS; const pollInterval = resolvedCancellationOptions.waitInterval ?? INSTANCE_POLL_INTERVAL_MS; - let containerGetRetries = Math.ceil(containerGetTimeout / pollInterval); + const containerGetRetries = Math.ceil(containerGetTimeout / pollInterval); + + // Explicit ports override the configured readiness checks; otherwise + // use `readyOn` (or the default derived from defaultPort/requiredPorts). + const readinessChecks = + ports !== undefined + ? (Array.isArray(ports) ? ports : [ports]).map(p => portResponding(p)) + : this.getReadinessChecks(); + + // The initial port probe (during startContainerIfNotRunning) needs a + // concrete port — use an explicit one, the first required port, the + // default port, or a fallback. This is just to verify the container + // process is reachable; readiness checks run after. + const probePort = this.getProbePort(ports); const waitOptions: WaitOptions = { signal: resolvedCancellationOptions.abort, retries: containerGetRetries, waitInterval: pollInterval, - portToCheck: portsToCheck[0], + portToCheck: probePort, }; - // Start the container if it's not running - const triesUsed = await this.startContainerIfNotRunning(waitOptions, resolvedStartOptions); - - // Check each port - - const totalPortReadyTries = Math.ceil( - (resolvedCancellationOptions.portReadyTimeoutMS ?? TIMEOUT_TO_GET_PORTS_MS) / pollInterval - ); - let triesLeft = totalPortReadyTries - triesUsed; - - for (const port of portsToCheck) { - triesLeft = await this.waitForPort({ - signal: resolvedCancellationOptions.abort, - waitInterval: pollInterval, - retries: triesLeft, - portToCheck: port, - }); - } + // Start the container if it's not running. The return value is the + // number of poll iterations used during start. + const startTriesUsed = await this.startContainerIfNotRunning(waitOptions, resolvedStartOptions); + + // Readiness shares the `portReadyTimeoutMS` budget with startup — + // time spent waiting for the container to come up is subtracted from + // the remaining budget. If start consumed the whole budget, readiness + // gets 0ms and will reject immediately. + const portReadyTimeoutMs = + resolvedCancellationOptions.portReadyTimeoutMS ?? TIMEOUT_TO_GET_PORTS_MS; + const remainingReadyBudgetMs = Math.max(0, portReadyTimeoutMs - startTriesUsed * pollInterval); + await this.runReadinessChecks(readinessChecks, { + signal: resolvedCancellationOptions.abort, + timeoutMs: remainingReadyBudgetMs, + }); this.setupMonitorCallbacks(); await this.ctx.blockConcurrencyWhile(async () => { - // All ports are ready + // All readiness checks passed await this.state.setHealthy(); await this.onStart(); }); } + /** + * Append a readiness check to `readyOn`. + * + * Automatic `portResponding` checks for `defaultPort` / `requiredPorts` + * are preserved — this adds to them rather than replacing. Use + * `setReadinessChecks` if you need full control. + * + * @example + * // defaultPort = 8080, no readyOn declared on the class. + * // Effective checks after this call: + * // [portResponding(8080), isHealthy('/ready')] + * container.addReadinessCheck(isHealthy('/ready')); + * + * @example + * // Add a one-off warmup check: + * container.addReadinessCheck(async () => { + * await warmCachesFromR2(); + * }); + */ + public addReadinessCheck(check: ReadinessCheck): void { + if (this.readyOn === undefined) { + this.readyOn = []; + } + this.readyOn.push(check); + } + + /** + * Replace the readiness check list with the provided one. + * + * Unlike `readyOn` and `addReadinessCheck`, this takes full control: + * automatic `portResponding` checks for `defaultPort` / `requiredPorts` + * are NOT added. If you want port checks, include them explicitly. + * + * Pass an empty array to opt out of readiness checking entirely — the + * container is considered ready as soon as it starts. + * + * @example + * // Replace everything, including any auto port checks: + * container.setReadinessChecks([ + * portResponding(8080), + * isHealthy('/ready'), + * async () => { await migrateDatabase(); }, + * ]); + * + * @example + * // Opt out — ready immediately once the process starts: + * container.setReadinessChecks([]); + */ + public setReadinessChecks(checks: ReadinessCheck[]): void { + this.readyOn = [...checks]; + this.readinessChecksReplaced = true; + } + /** * * Waits for a specified port to be ready @@ -897,57 +1120,117 @@ export class Container extends DurableObject { * - `abort`: Optional AbortSignal to cancel waiting * - `retries`: Number of retries before giving up (default: TRIES_TO_GET_PORTS) * - `waitInterval`: Interval between retries in milliseconds (default: INSTANCE_POLL_INTERVAL_MS) + * - `pingEndpoint`: Override the endpoint used for the probe request. Defaults to `this.pingEndpoint`. */ - public async waitForPort(waitOptions: WaitOptions): Promise { - const port = waitOptions.portToCheck; - const tcpPort = this.container.getTcpPort(port); + public async waitForPort(waitOptions: WaitOptions & { pingEndpoint?: string }): Promise { + const endpoint = waitOptions.pingEndpoint ?? this.pingEndpoint; + return this.pollUntilReady(waitOptions, `Port ${waitOptions.portToCheck}`, async signal => { + // Any HTTP response counts as "port is listening" — we don't + // care about status or body. Using the full pingEndpoint as the + // URL host preserves the documented "container/health" format. + await this.container + .getTcpPort(waitOptions.portToCheck) + .fetch(`http://${endpoint}`, { signal }); + }); + } + + /** + * Polls an HTTP path on the container until it returns a 2xx response, + * or the retry budget is exhausted. + * + * The Host header defaults to the host portion of `this.pingEndpoint` + * (e.g. `container` for the default `'ping'`-style config, or + * whatever's set when `pingEndpoint` is `'container/health'`). Pass + * `pingEndpoint` in the options to override for this call only. + * + * Returns the number of tries used, or throws if the path never + * returned a healthy response. + */ + public async waitForPath( + waitOptions: WaitOptions & { path: string; pingEndpoint?: string } + ): Promise { + const { portToCheck: port, path, pingEndpoint } = waitOptions; + const host = parsePingEndpointHost(pingEndpoint ?? this.pingEndpoint); + const normalizedPath = path.startsWith('/') ? path : `/${path}`; + + return this.pollUntilReady( + waitOptions, + `Path ${normalizedPath} on port ${port}`, + async signal => { + const response = await this.container + .getTcpPort(port) + .fetch(`http://${host}${normalizedPath}`, { signal }); + + // Free the response body regardless of status so we don't leak + // it during polling. + try { + await response.body?.cancel(); + } catch {} + + if (!response.ok) { + throw new Error(`status ${response.status}`); + } + } + ); + } + + /** + * Shared polling loop used by `waitForPort` and `waitForPath`. + * + * Calls `probe` once per iteration with a signal that aborts if the + * outer caller aborts or if the per-attempt `PING_TIMEOUT_MS` ticks + * first. Between iterations we sleep for `waitInterval` or exit early + * if the outer signal aborts. + * + * Returns the configured retry count on success (preserving the + * existing `waitForPort` contract). Throws the last error if the + * container has exited or the retry budget is exhausted. + */ + private async pollUntilReady( + waitOptions: WaitOptions, + label: string, + probe: (signal: AbortSignal) => Promise + ): Promise { + const pollInterval = waitOptions.waitInterval ?? INSTANCE_POLL_INTERVAL_MS; + const tries = waitOptions.retries ?? Math.ceil(TIMEOUT_TO_GET_PORTS_MS / pollInterval); const abortedSignal = new Promise(res => { waitOptions.signal?.addEventListener('abort', () => { res(true); }); }); - const pollInterval = waitOptions.waitInterval ?? INSTANCE_POLL_INTERVAL_MS; - let tries = waitOptions.retries ?? Math.ceil(TIMEOUT_TO_GET_PORTS_MS / pollInterval); - // Try to connect to the port multiple times for (let i = 0; i < tries; i++) { try { const combinedSignal = addTimeoutSignal(waitOptions.signal, PING_TIMEOUT_MS); - await tcpPort.fetch(`http://${this.pingEndpoint}`, { signal: combinedSignal }); - - // Successfully connected to this port - console.log(`Port ${port} is ready`); - break; + await probe(combinedSignal); + console.log(`${label} is ready`); + return tries; } catch (e) { - // Check for specific error messages that indicate we should keep retrying const errorMessage = e instanceof Error ? e.message : String(e); + console.debug(`Error checking ${label}: ${errorMessage}`); - console.debug(`Error checking ${port}: ${errorMessage}`); - - // If not running, it means the container crashed + // If the container process died, don't keep polling — something + // went wrong during startup. if (!this.container.running) { try { await this.onError( new Error( - `Container crashed while checking for ports, did you start the container and setup the entrypoint correctly?` + `Container crashed while checking ${label}, did you start the container and setup the entrypoint correctly?` ) ); } catch {} - throw e; } - // If we're on the last attempt and the port is still not ready, fail if (i === tries - 1) { try { await this.onError( - `Failed to verify port ${port} is available after ${(i + 1) * pollInterval}ms, last error: ${errorMessage}` + `Failed to verify ${label} is ready after ${(i + 1) * pollInterval}ms, last error: ${errorMessage}` ); } catch {} throw e; } - // Wait a bit before trying again await Promise.any([ new Promise(resolve => setTimeout(resolve, pollInterval)), abortedSignal, @@ -1344,6 +1627,11 @@ export class Container extends DurableObject { private outboundByHostOverrides: OutboundByHostOverrides = {}; private outboundHandlerOverride?: OutboundHandlerOverride; + // Set to true once `setReadinessChecks` has been called. Signals that + // the user wants full control — auto `portResponding` checks for + // `defaultPort` / `requiredPorts` are no longer added on top. + private readinessChecksReplaced = false; + // Only set when the user calls setAllowedHosts/setDeniedHosts at runtime private allowedHostsOverride?: string[]; private deniedHostsOverride?: string[]; @@ -1646,28 +1934,110 @@ export class Container extends DurableObject { } /** + * Returns a port to use for the initial "is the container reachable" + * probe in `startContainerIfNotRunning`. This is distinct from the + * readiness check list — it only confirms the container process is + * listening somewhere so we can set `state = running`. Readiness checks + * run after this probe succeeds. + */ + private getProbePort(overridePorts?: number | number[]): number { + if (overridePorts !== undefined) { + return Array.isArray(overridePorts) ? overridePorts[0] : overridePorts; + } + + if (this.requiredPorts && this.requiredPorts.length > 0) { + return this.requiredPorts[0]; + } + + return this.defaultPort ?? FALLBACK_PORT_TO_CHECK; + } + + /** + * Build the `portResponding` checks implied by `defaultPort` / + * `requiredPorts`. These are automatically merged into the effective + * readiness list unless the user has explicitly called + * `setReadinessChecks`. * - * The method prioritizes port sources in this order: - * 1. Ports specified directly in the method call - * 2. `requiredPorts` class property (if set) - * 3. `defaultPort` (if neither of the above is specified) - * 4. Falls back to port 33 if none of the above are set + * `requiredPorts` takes precedence over `defaultPort` to match the + * existing probe semantics. */ - private async getPortsToCheck(overridePorts?: number | number[]) { - let portsToCheck: number[] = []; + private getAutoPortChecks(): ReadinessCheck[] { + if (this.requiredPorts && this.requiredPorts.length > 0) { + return this.requiredPorts.map(port => portResponding(port)); + } + if (this.defaultPort !== undefined) { + return [portResponding(this.defaultPort)]; + } + return []; + } - if (overridePorts !== undefined) { - // Use explicitly provided ports (single port or array) - portsToCheck = Array.isArray(overridePorts) ? overridePorts : [overridePorts]; - } else if (this.requiredPorts && this.requiredPorts.length > 0) { - // Use requiredPorts class property if available - portsToCheck = [...this.requiredPorts]; - } else { - // Fall back to defaultPort if available - portsToCheck = [this.defaultPort ?? FALLBACK_PORT_TO_CHECK]; + /** + * Resolve the active readiness check list. + * + * - If `setReadinessChecks` has been called, the user's list is + * returned as-is (full override, no auto port checks). + * - Otherwise, the effective list is `[...autoPortChecks, ...readyOn]` + * so port checks from `defaultPort` / `requiredPorts` are always + * included alongside user-declared checks. + */ + private getReadinessChecks(): ReadinessCheck[] { + if (this.readinessChecksReplaced) { + return this.readyOn ?? []; } + return [...this.getAutoPortChecks(), ...(this.readyOn ?? [])]; + } - return portsToCheck; + /** + * Run every readiness check in parallel and resolve when they all + * succeed. Rejects on the first failure or when the timeout fires. + * + * There are two complementary cancellation mechanisms: + * + * 1. **Cooperative abort**: every check receives an `AbortSignal` that + * aborts when the caller aborts or when `timeoutMs` elapses. The + * built-in helpers (`portResponding`, `isHealthy`) listen to this + * signal via `waitForPort` / `waitForPath`, so they stop polling + * and reject promptly. + * + * 2. **Hard timeout**: the method also races `Promise.all` against a + * timeout promise that rejects after `timeoutMs`. This guarantees + * `startAndWaitForPorts` returns on time even if a user-defined + * check ignores the abort signal. Orphaned check promises may keep + * executing in the background until they finish on their own — + * user-defined checks that do long work should honour + * `options.signal` to avoid leaking resources. + */ + private async runReadinessChecks( + checks: ReadinessCheck[], + options: { signal?: AbortSignal; timeoutMs?: number } = {} + ): Promise { + if (checks.length === 0) { + return; + } + + const { timeoutMs } = options; + const signal = + timeoutMs !== undefined ? addTimeoutSignal(options.signal, timeoutMs) : options.signal; + + const allChecks = Promise.all(checks.map(check => check(this, { signal }))); + + if (timeoutMs === undefined) { + await allChecks; + return; + } + + let timeoutId: ReturnType | undefined; + const timeoutReject = new Promise((_, reject) => { + timeoutId = setTimeout(() => { + reject(new Error(`Readiness checks did not complete within ${timeoutMs}ms`)); + }, timeoutMs); + }); + + try { + await Promise.race([allChecks, timeoutReject]); + } finally { + if (timeoutId !== undefined) clearTimeout(timeoutId); + } } // ===========================================