
feat!: Server ID automated using Redis#12509

Draft
rafaelromcar-parabol wants to merge 9 commits into master from feat/server-id-automated

Conversation

@rafaelromcar-parabol (Contributor)

Dynamic SERVER_ID Allocation Walkthrough

Implements an autonomous SERVER_ID allocation system using Redis, eliminating the need for hardcoded environment variables.

This was developed using AI. I just gave instructions and made it fix the problems I saw. You might just consider this trash and implement it another way, but this is working right now in https://redis-serverid.parabol.fun/

Changes

1. ServerIdentityManager

Created packages/server/utils/ServerIdentityManager.ts.

  • Claim: Iterates 0-1023, attempting SET server:id:N ... NX EX 60 to claim a slot.
  • Heartbeat: Refreshes the TTL every 20 seconds.
  • Recovery: Releases the ID on SIGTERM / SIGINT. ID expires automatically if process crashes (SIGKILL).
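The claim step the bullets describe can be sketched as follows. This is an illustrative TypeScript sketch, not the PR's actual code; `FakeRedis` is a tiny in-memory stand-in for the one ioredis call the loop uses (`SET key value EX ttl NX`):

```typescript
// Minimal in-memory stand-in for the ioredis `set` call used by the claim loop.
class FakeRedis {
  private store = new Map<string, string>()
  async set(key: string, value: string, _ex: 'EX', _ttl: number, _nx: 'NX') {
    if (this.store.has(key)) return null // NX: refuse to overwrite an existing key
    this.store.set(key, value)
    return 'OK'
  }
}

const MAX_ID = 1024
const TTL_SECONDS = 60

// Walk slots 0..1023 and claim the first free one with SET ... EX 60 NX.
async function claimServerId(redis: FakeRedis, instanceId: string): Promise<number> {
  for (let i = 0; i < MAX_ID; i++) {
    const result = await redis.set(`server:id:${i}`, instanceId, 'EX', TTL_SECONDS, 'NX')
    if (result === 'OK') return i
  }
  throw new Error('No free server ID slot in 0-1023')
}
```

Because `SET ... NX` is atomic, two processes racing for the same slot cannot both receive `'OK'`; the loser simply moves on to the next index.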

2. Resilience Strategy

  • Redis Downtime: If the heartbeat fails to contact Redis (e.g. network partition), the process retains its local ID. It does not panic.
  • Reconnection: It continues attempting to refresh the lease every 20s.
  • Lock Loss: If Redis was down for >60s and the key expired, the heartbeat logic attempts to re-claim the lock immediately upon reconnection (SET ... NX), ensuring split-brain protection where possible while favoring availability.

3. Bootstrapping

Wrapped application entry points to ensure SERVER_ID is claimed before the application starts.

Design Decision (Why separate files?)

JavaScript import statements are hoisted and executed before any other code in the file. If we added the claim logic directly to server.ts, the imports (like ./initLogging) would run first, seeing process.env.SERVER_ID as undefined.
By using a separate bootstrap.ts wrapper, we can await the claiming process and then use require() to lazily load the main application, ensuring the environment is fully prepared.

  • Server: packages/server/bootstrap.ts (wraps server.ts)
  • Embedder: packages/embedder/bootstrap.ts (wraps embedder.ts)
  • PreDeploy: scripts/toolboxSrc/bootstrapPreDeploy.ts (wraps preDeploy.ts)
  • Webpack: Updated prod.servers.config.js and dev.servers.config.js to use these bootstrap files as entry points.
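The wrapper pattern described above can be sketched like this (illustrative only: `claimServerId` stands in for the real ServerIdentityManager call, and `loadApp` stands in for the PR's `require('./server')`):

```typescript
// Stand-in for the real Redis-backed claim (the PR's ServerIdentityManager).
async function claimServerId(): Promise<number> {
  return 0 // pretend slot 0 was claimed
}

// bootstrap.ts pattern: finish async setup, THEN lazily load the app.
// A top-level `import './server'` would be hoisted and run before the claim;
// deferring the load (require() in the PR) guarantees SERVER_ID is set first.
async function bootstrap(loadApp: () => void) {
  process.env.SERVER_ID = String(await claimServerId())
  loadApp()
}

bootstrap(() => {
  // every hoisted import inside the real server.ts now sees SERVER_ID defined
  console.log(`starting with SERVER_ID=${process.env.SERVER_ID}`)
})
```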

4. Cleanup

  • Removed instance_var and hardcoded SERVER_ID from pm2.config.js and pm2.dev.config.js.

Verification Results

I created and ran scripts/verifyServerID.ts to validate the implementation.

Race Condition Test

Spawned 2 concurrent workers.

  • Result: Worker 1 claimed ID 0, Worker 2 claimed ID 1.
  • Pass: IDs were unique.

Lifecycle Test

  1. Graceful Shutdown (SIGTERM):
    • Worker claimed ID.
    • Sent SIGTERM.
    • Verified Redis key was deleted.
    • Pass.
  2. Crash (SIGKILL):
    • Worker claimed ID.
    • Sent SIGKILL.
    • Verified Redis key persisted.
    • Checked TTL: 60s.
    • Pass.

FAQ

1. What happens if two servers start at the exact same time?

Redis guarantees atomicity. The SET ... NX (Not Exists) command is atomic. Even if two requests arrive simultaneously, Redis processes one first. The first returns OK (success), the second returns null (fail). The loser will simply try the next ID.

2. How does a server know it owns an ID?

Each server generates a unique instanceId (UUID) on startup. This UUID is stored as the value of the Redis key. When sending a heartbeat, the server checks if the value in Redis matches its own instanceId. If it matches, it extends the TTL.
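A sketch of that ownership check, again with an in-memory stand-in for the two Redis commands involved (`GET` and `EXPIRE`); names are illustrative:

```typescript
// In-memory stand-in for the GET/EXPIRE subset of the ioredis API.
class FakeRedis {
  store = new Map<string, string>()
  async get(key: string) {
    return this.store.get(key) ?? null
  }
  async expire(key: string, _ttl: number) {
    return this.store.has(key) ? 1 : 0
  }
}

// One heartbeat tick: extend the lease only if the key still holds our UUID.
async function heartbeatTick(redis: FakeRedis, key: string, instanceId: string) {
  const owner = await redis.get(key)
  if (owner !== instanceId) return false // expired, or claimed by another instance
  await redis.expire(key, 60) // refresh the 60s TTL
  return true
}
```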

3. How can I know which ID is assigned to which server?

  • Logs: The server logs Successfully claimed Server ID: X (Instance: UUID).
  • Redis: You can list keys with redis-cli keys server:id:*. The value of each key is the instanceId UUID, which you can match against the logs.

4. What happens if Redis goes down and comes back?

  • Detailed Retry Process:
    • The server attempts a heartbeat every 20 seconds.
    • ioredis automatically handles TCP connection retries in the background.
    • If a heartbeat fails (exception caught), it logs an error and waits for the next 20s tick. It does not crash.
  • Upon Reconnection:
    1. The next heartbeat tick reads the key from Redis.
    2. If the key matches its UUID, it resumes refreshing.
    3. If the key is missing (expired), it attempts to re-claim it immediately.
    4. Log: On success, it logs: Successfully reclaimed Server ID: X (Instance: UUID).
    5. If the key belongs to someone else (another server took it), it logs a FATAL error and initiates a Graceful Shutdown (SIGTERM). This forces the process to restart and claim a new, valid ID, guaranteeing no two servers share an ID.
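Steps 1-5 above amount to a three-way branch on the key's current value. A sketch (in-memory Redis stand-in; names are illustrative, not the PR's exact code):

```typescript
// In-memory stand-in for the GET/EXPIRE/SET-NX subset of the ioredis API.
class FakeRedis {
  store = new Map<string, string>()
  async get(key: string) {
    return this.store.get(key) ?? null
  }
  async expire(key: string, _ttl: number) {
    return this.store.has(key) ? 1 : 0
  }
  async set(key: string, value: string, _ex: 'EX', _ttl: number, _nx: 'NX') {
    if (this.store.has(key)) return null
    this.store.set(key, value)
    return 'OK'
  }
}

type TickResult = 'refreshed' | 'reclaimed' | 'conflict'

// Heartbeat after a possible Redis outage: refresh, re-claim, or bail out.
async function reconnectTick(redis: FakeRedis, key: string, instanceId: string): Promise<TickResult> {
  const owner = await redis.get(key)
  if (owner === instanceId) {
    await redis.expire(key, 60) // still ours: refresh the lease
    return 'refreshed'
  }
  if (owner === null) {
    // key expired while Redis was down: try to take it back atomically
    if ((await redis.set(key, instanceId, 'EX', 60, 'NX')) === 'OK') return 'reclaimed'
  }
  // another instance holds our old ID: the real code logs FATAL and self-SIGTERMs
  return 'conflict'
}
```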

@rafaelromcar-parabol rafaelromcar-parabol self-assigned this Jan 22, 2026
@rafaelromcar-parabol rafaelromcar-parabol changed the title Server ID automated using Redis feat: Server ID automated using Redis Jan 22, 2026
@rafaelromcar-parabol rafaelromcar-parabol changed the title feat: Server ID automated using Redis feat!: Server ID automated using Redis Jan 22, 2026
@Dschoordsch (Contributor) left a comment

Did you do a self-review before? At the very least the noise comments of the AI thinking process should be cleaned up before requesting a review.

const BIG_ZERO = BigInt(0)
export const MAX_SEQ = 2 ** SEQ_BIT_LEN - 1

// if MID overflows, we will generate duplicate ids, throw instead
Contributor

-1 please keep this comment

Contributor Author

Why? IDs are now generated automatically using MAX_ID=1023 (it's 1024 in the current implementation, but I'm changing that because we go from 0 to MAX_ID). That MAX_ID lives in the ServerIdentityManager.ts file.

Contributor

Because it's the only place where it's documented why we restrict the ID range. How are we supposed to know this down the line?
It's not a good place here, but better than losing that knowledge and having to deduce it from the code again. Maybe move the comment to const MAX_ID=1023 then.

"@hocuspocus/transformer": "3.2.3",
"@mattkrick/sanitize-svg": "0.4.1",
"@octokit/graphql-schema": "^10.36.0",
"@pgtyped/runtime": "^2.4.2",
Contributor

-1 Why?

Contributor Author

Precommit was failing locally. AI added that alongside other changes and precommit worked.

Contributor

Well, it's wrong. You're not using anything postgres related in this PR and the error message of the failed job indicated that you just got the redis.set call wrong.

Comment on lines +18 to +20
setInterval(() => {
// Heartbeat logic is internal to identityManager
}, 1000)
Contributor

-1 Why? It's a noop

console.log('\n--- Starting Lifecycle/TTL Test ---')

// Start a worker
const p = spawn(CMD, [...ARGS, 'worker'], {stdio: ['ignore', 'pipe', 'pipe']})
Contributor

-1 maybe give it a better name than p?

Comment on lines +25 to +40
for (let i = 0; i < MAX_ID; i++) {
try {
const key = `${REDIS_KEY_PREFIX}${i}`
// NX: Set if Not eXists
// EX: Expire in seconds
const result = await redis.set(key, this.instanceId, 'EX', TTL_SECONDS, 'NX')

if (result === 'OK') {
this.id = i
Logger.log(`Successfully claimed Server ID: ${this.id} (Instance: ${this.instanceId})`)
this.startHeartbeat()
this.setupShutdownHandlers()
return this.id
}
} catch (err) {
Logger.error('Error contacting Redis during ID claim:', err)
Contributor

-1 Let's not do up to 1024 roundtrips to redis. I don't know the cleanest solution yet, but something like hgetall followed by hsetnx and hexpire with a limited number of retries.

A set would be the more optimal choice, but cannot expire singular keys afaik.

If we cannot find a reliable way to reduce the roundtrips, we could go so far as putting this all in a lua script and run on redis itself

Contributor

We could also choose a random starting point to avoid all servers trying to claim the low numbers first.

)
}
} else {
// Someone else took it. We cannot share IDs.
Contributor

+1 kind of obvious based on the error message in the next line.

Comment on lines +148 to +150
const currentOwner = await redis.get(key)
if (currentOwner === this.instanceId) {
await redis.del(key)
Contributor

-1 This is not atomic, ownership could change between the two calls to redis. I cannot think of an easy solution atm. rather than running a script on redis, but maybe the AI is smarter?
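For reference, the standard fix for an atomic compare-and-delete is a small Lua script run via EVAL; this is the well-known lock-release pattern from the Redis SET documentation, not code from this PR. With ioredis it would look roughly like the following (the in-memory function models the same semantics as one indivisible step):

```typescript
// Lua runs atomically inside Redis, so no other client can change the key
// between the GET and the DEL.
const RELEASE_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end`

// With a real ioredis client: await redis.eval(RELEASE_SCRIPT, 1, key, instanceId)
// The same semantics, modeled as one atomic step on an in-memory map:
function releaseIfOwner(store: Map<string, string>, key: string, instanceId: string): number {
  if (store.get(key) === instanceId) {
    store.delete(key)
    return 1 // deleted: we were still the owner
  }
  return 0 // untouched: key expired or belongs to someone else
}
```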

Contributor Author

I have refined how both graceful shutdowns and server ID conflicts are handled:

  • On process start: claimIdentity() iterates through IDs (0-1023) and claims the first free one (SET ... NX).

  • Heartbeat (every 20s):

    • Owned: Calls redis.expire(key, 60) to extend lease.
    • Free (expired): Calls redis.set(..., 'NX') to reclaim it immediately.
    • Conflict (owned by other): Emits SIGTERM to self. The shutdown handler sees it's not the owner (since the other process owns the key), so it does not touch the Redis key (no delete, no extend), just exits.
  • On process stop (SIGTERM/SIGINT):

    • The handler checks Redis. If owned: calls redis.expire(key, 60) (one last renewal).
    • Stops heartbeat and lets the process exit.

    As the key is never deleted, the TTL does its job during any type of shutdown.

Comment on lines +119 to +129
// We do NOT exit here, we let the app's existing shutdown logic handle the actual exit if it wants to,
// or if we are the only one, we might need to.
// However, in `server.ts` there is already a SIGTERM handler that calls process.exit().
// We should probably just clean up.
// But `server.ts` does `process.exit()` which might kill us before we finish.
// Since we are running in the same process, we can rely on `server.ts`'s handler if it allows async cleanup?
// `server.ts` has `await disconnectAllSockets()` then exit.
// We can hook into `stopChronos` or similar, OR we can just allow the process to exit and rely on TTL
// if we can't guarantee async execution.
// BUT, the requirement says "Explicitly handle SIGTERM/SIGINT to delete the key".
// So we should try our best.
Contributor

-1 What's that supposed to mean? Is this logic here correct or is it not?

Comment on lines +79 to +85
// If we lost ownership, we might want to try to reclaim it if free, or crash.
// For resilience, if it's gone or taken, we might try to set it again if it's expired.
// However, if someone else took it, we have a split brain.
// Simplest recovery: Try to overwrite if we think we should have it? No, that's dangerous.
// We will rely on the fact that if we lost it, we should probably just try to re-acquire *a* ID or just log error.
// Specific requirements said: "If Redis is unreachable, app should retain its current ID but aggressively attempt to re-register"
// If we read mismatch, it means Redis IS reachable but we lost the lock.
Contributor

-1 These are AI thinking comments? These seem fine during development, but should be removed latest in the self-review step. At the very least summarize it in some bullet points

Contributor

-1 is it really verifying anything? This seems to be a test script?
So it should be named test sth. and ideally run as part of the test suite. If it's not run there, then nobody will ever run it and it's dead code.

I don't see why this couldn't be a jest test packages/server/utils/__test__/ServerIdentityManager.test.ts. Might need some mocking of the key or sth. to not interfere with the test setup, but then shouldn't be an issue? Not even multiple processes needed, just instantiate the class multiple times.

@Dschoordsch (Contributor)

Btw. the description in the PR is too long. I did not check it matches the implementation.

@rafaelromcar-parabol (Contributor Author)

Did you do a self-review before? At the very least the noise comments of the AI thinking process should be cleaned up before requesting a review.

I did, but not thoroughly apparently. Shame on me. I just forgot to do a last thorough pass. I did some during the development process though...

Btw. the description in the PR is too long. I did not check it matches the implementation.

It's the walkthrough provided by the AI, which I kept asking it to improve and keep up to date with the changes, as it would sometimes forget to update it unprompted. I think it should be correct; that part I reviewed multiple times and edited.

@mattkrick (Member)

this seems wildly overengineered. if we're given a 128-bit ID and need a 10-bit ID, can't we just hash it?

@Dschoordsch (Contributor)

Hashing the internal ID to a 10-bit server ID is a nice idea for a starting value. Still needs to check for conflicts, so it's not better than picking a random server ID to start with.

@mattkrick (Member)

I guess I should back up & understand why the current implementation doesn't work. Is it because of the collisions?
99.4% collision is probably acceptable in our case? any messages published to the wrong channel will get ignored & the snowflake generator will still have a sufficient entropy, right?
Alternatively, we could use the UUID for the channel names & then we only have to worry about the snowflake ID generator, which could be a hash + date + random ID.

another option is to mount a sidecar that hits up etcd for an incrementing 64-bit revision number. We modulo that by 1024 & give it to each replica. IMO all this logic seems to fit so nicely outside of the application logic. It's great having an env var that we can just trust. having a heartbeat for an ID seems like a lot

@rafaelromcar-parabol rafaelromcar-parabol marked this pull request as draft January 28, 2026 15:26
@rafaelromcar-parabol (Contributor Author)

During Matt's office hours we agreed on a different implementation: a simple incrementing counter in Redis that Parabol processes will use to get their ID. If the ID received is >1023, we take it modulo 1024.

The risk of collision is very small, as it would require Redis flushing its memory after the first release. And even then, collisions could only happen while the new instances are being rolled out; once the old instances are killed, there will be no collision.

This removes the heartbeat, making the whole thing much simpler. Thanks @mattkrick for the idea!

I'll work on this once I'm done with the migration to Valkey and move a bit forward in Data Sanctum
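The agreed-upon replacement is small enough to sketch in a few lines. The counter key name is illustrative, and the atomic Redis INCR call is modeled here with an in-memory counter:

```typescript
// With a real ioredis client this is: const n = await redis.incr('server:id:counter')
// INCR is atomic, so no two processes can ever receive the same n.
let counter = 0
async function incr(): Promise<number> {
  return ++counter
}

// Map the ever-growing counter 1, 2, 3, ... onto the 10-bit range 0..1023.
async function nextServerId(): Promise<number> {
  const n = await incr()
  return (n - 1) % 1024 // wraps after 1024 claims, hence the small collision risk
}
```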

@Dschoordsch (Contributor) commented Jan 28, 2026

I had the same idea and it's OK for now. But if you leave Parabol running unattended for a long time and one instance gets restarted a lot (say, the embedder) while the others keep running, there will eventually be a collision.

@mattkrick (Member)

It is a possibility! But it would have to restart 1000 times...
If that's the case, it probably isn't living long enough to do anything with the ID it's been given. So even though it collides, it'll probably restart again soon enough & stop colliding 😝 ...or we push a fix & they all restart
