Unexpected Pod Restarts Caused by Unresponsive Node.js Process (MCP / OAuth) #12078
jannickHo asked this question in Troubleshooting (unanswered)
Observed Behavior
Our LibreChat instance running in Kubernetes has been restarting repeatedly: 2 times on the day of reporting and 7 times the day before. Concurrent user load at the time was approximately 10 users.
The restarts were forced by Kubernetes: the liveness probe (10s timeout, 5 retries) detected that the health endpoint had stopped responding. Graceful shutdown after the SIGTERM did not complete, so Kubernetes escalated to a hard SIGKILL. This indicates the Node.js process had either crashed or become fully unresponsive before the SIGTERM was sent.
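For reference, a liveness probe matching these parameters would look roughly like this in the Deployment spec (the endpoint path and port are assumptions for illustration, not copied from our manifests):

```yaml
livenessProbe:
  httpGet:
    path: /health      # assumed health endpoint path
    port: 3080         # assumed LibreChat port
  timeoutSeconds: 10   # the "10s timeout" above
  failureThreshold: 5  # the "5 retries" above
  periodSeconds: 10
```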
In 3 out of 4 restarts, the last log line before the restart was an MCP tool call timeout:
The MCP servers in use are OAuth-authenticated and use HTTP streaming connections with a 30-second server-side timeout. This causes frequent reconnects — roughly every 30 seconds per connected user.
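Reconnect cycles like this are typically driven by timers or stream-close handlers that invoke async functions fire-and-forget. A minimal sketch (all names hypothetical, not actual LibreChat code) of how a rejection from such a call escapes every try/catch, and how a global handler keeps the process alive on Node.js 15+:

```javascript
// Global safety net: without this, Node.js 15+ terminates the process
// when an unhandled rejection surfaces.
let caught = null;
process.on('unhandledRejection', (reason) => {
  caught = reason;
  console.error('unhandled rejection:', reason);
});

// Hypothetical reconnect helper that fails, e.g. during an OAuth token refresh.
async function reconnect() {
  throw new Error('token refresh failed');
}

// Fire-and-forget call site: no await, no .catch(), so the rejection
// propagates to the global 'unhandledRejection' event instead.
function onStreamClosed() {
  reconnect();
}

onStreamClosed();

// Runs on the next timer tick, after the rejection has been delivered.
setTimeout(() => {
  console.log('process still alive, caught:', caught && caught.message);
}, 10);
```

With the handler registered the process survives one bad reconnect; without it, the same call site takes down the whole pod.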
Environment:
- Container image: node:20-alpine

Claude Analysis
Node.js 15+ terminates the process by default when a Promise rejection goes unhandled. The codebase registers an `uncaughtException` handler but no `unhandledRejection` handler, leaving this default behavior in place.

Several places in the MCP and OAuth reconnection code appear to call `async` functions in a fire-and-forget pattern: without `await` and without `.catch()`. If any of those async operations throw internally, the resulting rejection has no handler. Given that these code paths are triggered exactly during MCP timeouts and OAuth reconnect cycles (which happen frequently with 30-second HTTP streaming connections), it is plausible that an unhandled rejection from one of these calls is what causes the process to stop responding and eventually be killed by the liveness probe.

Reproduction Update
We were able to reproduce the crash with the following configurations (MCP server itself unchanged throughout):