9 changes: 9 additions & 0 deletions .changeset/alarm-scheduler-spinloop.md
@@ -0,0 +1,9 @@
---
'@cloudflare/containers': patch
---

Fix a spinloop in the alarm scheduler that could saturate the Durable Object event loop and cause WebSocket upgrades to be canceled at 0ms wallclock while HTTP traffic to the same DO continued to succeed. The `alarm()` handler previously paired an in-memory `setTimeout` sleep with an unconditional `setAlarm(Date.now())` on exit. Any external call to `scheduleNextAlarm()` during the sleep resolved the internal Promise early, and the handler's exit path then overwrote the caller's future alarm with one scheduled for "now", causing the runtime to refire the alarm immediately. Under load (for example, a `startAndWaitForPorts` retry loop or a partysocket reconnect storm), this escalated into a ~300ms alarm cadence matching `INSTANCE_POLL_INTERVAL_MS`.

The handler is now durable-by-default: it completes its work and re-arms the storage alarm to the earliest of the next scheduled task, `sleepAfter` expiration, or a 3-minute heartbeat, but never sooner than 100ms from now. `scheduleNextAlarm()` is now idempotent: concurrent callers converge on the earliest requested time instead of clobbering each other, as they could under the removed in-memory Promise/timeout coordination.

No behavior change for activity renewal, connection handling, or `onStart`/`onStop` lifecycle hooks.
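The clobbering sequence described above can be reproduced without a runtime. This is a heavily simplified sketch, not the package's code: `FakeAlarmStorage` is a hypothetical in-memory stand-in for Durable Object alarm storage (a single alarm slot that every `setAlarm()` overwrites), `oldAlarmHandler` is a hypothetical model of the removed sleep-then-`setAlarm(Date.now())` exit path, and the clock is a fixed fake value so the result is deterministic.

```typescript
// Hypothetical stand-in for Durable Object alarm storage (assumption):
// an object has at most one alarm, and every setAlarm() overwrites it.
class FakeAlarmStorage {
  alarm: number | null = null;
  async setAlarm(time: number): Promise<void> {
    this.alarm = time;
  }
}

// Simplified model of the removed exit path (hypothetical name).
async function oldAlarmHandler(
  storage: FakeAlarmStorage,
  clock: () => number,
  externalCaller: () => Promise<void>,
): Promise<void> {
  // The handler "sleeps" until something resolves its Promise; in the old
  // code, an external scheduleNextAlarm() call did exactly that.
  await new Promise<void>(resolve => {
    externalCaller().then(() => resolve());
  });
  // Unconditional re-arm for "now": overwrites the caller's future alarm.
  await storage.setAlarm(clock());
}

async function reproduce(): Promise<{ requested: number; armed: number | null }> {
  const storage = new FakeAlarmStorage();
  const now = 1_000_000; // fixed fake clock, in ms
  const requested = now + 5_000; // the caller wants the alarm 5s from now
  await oldAlarmHandler(storage, () => now, () => storage.setAlarm(requested));
  return { requested, armed: storage.alarm };
}

reproduce().then(({ requested, armed }) => {
  // The 5s alarm has been replaced by one due "now", so the runtime
  // would refire the handler immediately.
  console.log(`requested=${requested} armed=${armed}`); // requested=1005000 armed=1000000
});
```

Repeating this on every wake-up is what turns a single early resolution into the tight refire loop described in the changeset.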
10 changes: 7 additions & 3 deletions jest.config.js
@@ -1,13 +1,17 @@
module.exports = {
preset: 'ts-jest',
testEnvironment: 'node',
testMatch: ['**/tests/**/*.test.ts'],
testMatch: ['**/src/tests/**/*.test.ts'],
moduleFileExtensions: ['ts', 'js', 'json'],
moduleNameMapper: {
'^cloudflare:workers$': '<rootDir>/src/tests/__mocks__/cloudflare-workers.ts'
},
transform: {
'^.+\\.ts$': ['ts-jest', { tsconfig: 'tsconfig.json' }]
},
clearMocks: true,
collectCoverage: true,
coverageDirectory: 'coverage',
coverageReporters: ['text', 'lcov'],
collectCoverageFrom: ['src/**/*.ts', '!src/**/*.d.ts']
};
collectCoverageFrom: ['src/**/*.ts', '!src/**/*.d.ts', '!src/tests/**']
};
56 changes: 13 additions & 43 deletions src/lib/container.ts
@@ -33,6 +33,9 @@ const OUTBOUND_CONFIGURATION_KEY = 'OUTBOUND_CONFIGURATION';
const MAX_ALARM_RETRIES = 3;
const PING_TIMEOUT_MS = 5000;

const MIN_ALARM_REARM_MS = 100; // Floor for alarm re-arm times
const MAX_ALARM_REARM_MS = 3 * 60 * 1000; // Default heartbeat

const DEFAULT_SLEEP_AFTER = '10m'; // Default sleep after inactivity time
const INSTANCE_POLL_INTERVAL_MS = 300; // Default interval for polling container state

@@ -1842,10 +1845,6 @@ export class Container<Env = Cloudflare.Env> extends DurableObject<Env> {
})
.finally(() => {
this.monitorSetup = false;
if (this.timeout) {
if (this.resolve) this.resolve();
clearTimeout(this.timeout);
}
});
}

@@ -1877,14 +1876,6 @@ export class Container<Env = Cloudflare.Env> extends DurableObject<Env> {
return;
}

// do not remove this, container DOs ALWAYS need an alarm right now.
// The only way for this DO to stop having alarms is:
// 1. The container is not running anymore.
// 2. Activity expired and it exits.
const prevAlarm = Date.now();
await this.ctx.storage.setAlarm(prevAlarm);
await this.ctx.storage.sync();

// Get all schedules that should be executed now
const result = this.sql<{
id: string;
@@ -1895,7 +1886,7 @@ export class Container<Env = Cloudflare.Env> extends DurableObject<Env> {
}>`
SELECT * FROM container_schedules;
`;
let minTime = Date.now() + 3 * 60 * 1000;
let minTime = Date.now() + MAX_ALARM_REARM_MS;

const now = Date.now() / 1000;
// Process each due schedule
@@ -1956,36 +1947,16 @@ export class Container<Env = Cloudflare.Env> extends DurableObject<Env> {
await this.onActivityExpired();
// renewActivityTimeout makes sure we don't spam calls here
this.renewActivityTimeout();
await this.ctx.storage.setAlarm(Date.now() + MIN_ALARM_REARM_MS);
return;
}

// Math.min(3m or maxTime, sleepTimeout)
minTime = Math.min(minTimeFromSchedules, minTime, this.sleepAfterMs);
const timeout = Math.max(0, minTime - Date.now());

// await a sleep for maxTime to keep the DO alive for
// at least this long
await new Promise<void>(resolve => {
Collaborator comment: Right now, the Durable Object staying active is what keeps the container alive. I'm curious whether removing this means we won't be able to keep the container alive for the duration specified by `sleepAfter`.
this.resolve = resolve;
if (!this.container.running) {
resolve();
return;
}

this.timeout = setTimeout(() => {
resolve();
}, timeout);
});

await this.ctx.storage.setAlarm(Date.now());

// we exit and we have another alarm,
// the next alarm is the one that decides if it should stop the loop.
const nextAlarm = Math.max(minTime, Date.now() + MIN_ALARM_REARM_MS);
await this.ctx.storage.setAlarm(nextAlarm);
}

timeout?: ReturnType<typeof setTimeout>;
resolve?: () => void;

// synchronises container state with the container source of truth to process events
private async syncPendingStoppedEvents() {
const state = await this.state.getState();
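The re-arm rule at the end of the new handler reduces to a small pure function. The sketch below is illustrative, not the package's API: `computeNextAlarm` is a hypothetical name, and it assumes, as the comparison against `Date.now() + MAX_ALARM_REARM_MS` implies, that `minTimeFromSchedules` and `sleepAfterMs` are absolute epoch timestamps in milliseconds rather than durations. The separate activity-expired branch, which re-arms at `Date.now() + MIN_ALARM_REARM_MS`, is not modeled.

```typescript
const MIN_ALARM_REARM_MS = 100; // floor for alarm re-arm times
const MAX_ALARM_REARM_MS = 3 * 60 * 1000; // default heartbeat (3 minutes)

/**
 * Hypothetical helper mirroring the handler's exit path: the earliest of
 * (next schedule, heartbeat, sleepAfter expiry), but never sooner than
 * now + MIN_ALARM_REARM_MS. Assumes all inputs are epoch milliseconds.
 */
function computeNextAlarm(
  now: number,
  minTimeFromSchedules: number,
  sleepAfterMs: number,
): number {
  const minTime = Math.min(minTimeFromSchedules, now + MAX_ALARM_REARM_MS, sleepAfterMs);
  return Math.max(minTime, now + MIN_ALARM_REARM_MS);
}

const now = 1_000_000;
// Nothing scheduled, sleepAfter 10 minutes out: falls back to the heartbeat.
console.log(computeNextAlarm(now, Infinity, now + 10 * 60 * 1000)); // 1180000
// A schedule already overdue: clamped to the 100ms floor instead of "now".
console.log(computeNextAlarm(now, now - 5_000, now + 60_000)); // 1000100
// sleepAfter expires in 30s: it wins over the 3-minute heartbeat.
console.log(computeNextAlarm(now, Infinity, now + 30_000)); // 1030000
```

The floor is what prevents the spinloop: even when a task is overdue, the alarm can fire at most about once every 100ms rather than immediately and repeatedly.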
@@ -2019,15 +1990,14 @@ export class Container<Env = Cloudflare.Env> extends DurableObject<Env> {
}

/**
* Schedule the next alarm based on upcoming tasks
* Schedule the next alarm based on upcoming tasks. Idempotent — no-ops
* if an alarm is already set to fire sooner than the requested time.
*/
public async scheduleNextAlarm(ms = 1000): Promise<void> {
const nextTime = ms + Date.now();

// if not already set
if (this.timeout) {
if (this.resolve) this.resolve();
clearTimeout(this.timeout);
const nextTime = Date.now() + Math.max(ms, MIN_ALARM_REARM_MS);
const existing = await this.ctx.storage.getAlarm();
if (existing !== null && existing <= nextTime) {
return;
}

await this.ctx.storage.setAlarm(nextTime);
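The idempotent `scheduleNextAlarm()` can be exercised against a fake storage to see callers converge on the earliest requested time. This is a sketch under stated assumptions: `FakeAlarmStorage` is a hypothetical in-memory stand-in that models only `getAlarm()`/`setAlarm()` with a single overwritable alarm, the clock is passed in for determinism, and the calls are awaited one after another, so the sketch says nothing about interleaving between the read and the write.

```typescript
const MIN_ALARM_REARM_MS = 100;

// Hypothetical stand-in (assumption) for the storage alarm methods used by
// scheduleNextAlarm(): a single alarm slot that setAlarm() overwrites.
class FakeAlarmStorage {
  private alarm: number | null = null;
  async getAlarm(): Promise<number | null> {
    return this.alarm;
  }
  async setAlarm(time: number): Promise<void> {
    this.alarm = time;
  }
}

// Mirrors the new logic: floor the delay, then only ever move the alarm earlier.
async function scheduleNextAlarm(
  storage: FakeAlarmStorage,
  now: number,
  ms = 1000,
): Promise<void> {
  const nextTime = now + Math.max(ms, MIN_ALARM_REARM_MS);
  const existing = await storage.getAlarm();
  if (existing !== null && existing <= nextTime) {
    return; // an earlier (or equal) alarm is already armed; keep it
  }
  await storage.setAlarm(nextTime);
}

async function demo(): Promise<number | null> {
  const storage = new FakeAlarmStorage();
  const now = 1_000_000;
  await scheduleNextAlarm(storage, now, 5_000); // arms now + 5000
  await scheduleNextAlarm(storage, now, 2_000); // moves earlier to now + 2000
  await scheduleNextAlarm(storage, now, 10_000); // no-op: later than existing
  await scheduleNextAlarm(storage, now, 0); // floored to now + 100, moves earlier
  return storage.getAlarm();
}

demo().then(alarm => console.log(alarm)); // 1000100
```

Because a later request never overwrites an earlier one, a caller asking for "soon" can no longer be undone by a caller asking for "later", which is the property the old Promise/timeout coordination lacked.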
24 changes: 24 additions & 0 deletions src/tests/__mocks__/cloudflare-workers.ts
@@ -0,0 +1,24 @@
/**
* Jest mock for the `cloudflare:workers` virtual module. Provides stub
* implementations of DurableObject and WorkerEntrypoint good enough to
* let Container / ContainerProxy class bodies compile and run under
* Node + ts-jest without a real workerd runtime.
*/

export class DurableObject<Env = unknown> {
ctx: unknown;
env: Env;
constructor(ctx: unknown, env: Env) {
this.ctx = ctx;
this.env = env;
}
}

export class WorkerEntrypoint<Env = unknown, _Props = unknown> {
ctx: unknown;
env: Env;
constructor(ctx: unknown, env: Env) {
this.ctx = ctx;
this.env = env;
}
}