SIGSEGV (exit 139) crash loop on Docker deployment — Render.com #149

## Summary

drift-gateway crashes with **SIGSEGV (exit code 139)** within roughly 5–60 seconds of startup on every attempt (20–60 seconds on the Standard plan). The crash occurs during or shortly after market data subscription, after the HTTP port is bound and Render has marked the service as available. This is a consistent, reproducible crash loop that has made the gateway unusable for our production trading terminal.

## Environment

| Component | Value |
|---|---|
| **Gateway version** | master (`0f388fc`) and v1.5.5 tag (`056c890`) — both crash identically |
| **Dockerfile** | Stock `./Dockerfile` from this repo (builds from source) |
| **Rust base image** | `rust:1.85` (as specified in Dockerfile) |
| **Runtime image** | `debian:12` (as specified in Dockerfile) |
| **libdrift_ffi_sys.so** | v2.156.3 (latest release, downloaded at build time per Dockerfile) |
| **drift-rs** | Pinned at rev `0618036` (from Cargo.toml) |
| **Hosting** | Render.com, Docker runtime, Oregon region |
| **Render plan** | Standard (2GB RAM, 1 CPU) — also tested on Starter (512MB) |
| **RPC** | Triton (paid, dedicated endpoint) — confirmed healthy via `getHealth` |
| **OS architecture** | Render does not expose `--platform` setting for Docker builds |

## Startup Command

```bash
drift-gateway $TRITON_RPC_URL --host 0.0.0.0 --port 8080 --ws-port 18080
```

Environment variables set:
- `DRIFT_GATEWAY_KEY` — base58 seed (valid, was working previously)
- `TRITON_RPC_URL` — paid Triton RPC endpoint (healthy, returns `{"result":"ok"}` on `getHealth`)
- `INIT_RPC_THROTTLE=2`
- `RUST_LOG=info`
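The `getHealth` check mentioned above can be scripted so it is easy to rerun next to each crash. A minimal sketch, assuming `curl` is available; `rpc_healthy` is a hypothetical helper name, while `getHealth` is the standard Solana JSON-RPC method.

```bash
# rpc_healthy: hypothetical helper that POSTs a Solana JSON-RPC getHealth
# request and succeeds only if the endpoint answers with "result":"ok".
# Exit status: 0 = healthy, 1 = endpoint answered but not ok, 2 = curl failed.
rpc_healthy() {
  local url="$1" body
  body=$(curl -sS --max-time 10 -X POST \
    -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","id":1,"method":"getHealth"}' \
    "$url") || return 2   # connection error or timeout
  case "$body" in
    *'"result":"ok"'*) echo "healthy" ;;
    *) echo "unhealthy: $body"; return 1 ;;
  esac
}
```

Usage: `rpc_healthy "$TRITON_RPC_URL"`. Logging its output at the moment of each crash would show whether RPC health changes around the failures.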

## Crash Behavior

The gateway starts, binds its HTTP port, and Render marks it as "available." It then crashes with exit code 139 (SIGSEGV), typically within 5–60 seconds. Every restart follows the same pattern:

```
2026-03-02T02:07:15Z  deploy_ended (succeeded)
2026-03-02T02:07:17Z  server_available
2026-03-02T02:07:37Z  server_failed  exit=139   ← 20 seconds
```
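Exit code 139 follows the shell convention of 128 + signal number, so it means the process was killed by signal 11 (SIGSEGV) rather than exiting on its own. A small sketch for decoding any status Render reports; `decode_exit` is a hypothetical helper that relies only on the shell's built-in `kill -l`.

```bash
# decode_exit: hypothetical helper translating a process exit status into
# either a plain exit code or the signal that terminated the process.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_exit() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with status $code"
  fi
}
```

`decode_exit 139` prints `killed by signal 11 (SIGSEGV)`. By comparison, a kernel out-of-memory kill would typically surface as 137 (SIGKILL), so 139 points at an invalid memory access inside the process rather than the container exceeding its RAM limit.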

Extended crash log (spans ~2 hours of attempts):
```
23:37:46Z  server_failed  exit=139    (first crash after 18h uptime)
23:38:12Z  server_available
23:38:24Z  server_failed  exit=139    ← 12s
23:39:03Z  server_available
23:39:18Z  server_failed  exit=139    ← 15s
...repeats every 5–60 seconds indefinitely...
```
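The per-restart uptimes annotated above (12 s, 15 s) can be recomputed from the raw event log, which makes it easier to compare runs across plans and `--markets` settings. A sketch assuming the `HH:MM:SSZ  event ...` line format of the extended log; `uptime_between_events` is a hypothetical helper, and it does not handle intervals that cross midnight UTC.

```bash
# uptime_between_events: hypothetical helper that reads "HH:MM:SSZ  event ..."
# lines on stdin and prints the seconds between each server_available event
# and the next server_failed event. Failures with no preceding
# server_available line are ignored. Midnight wrap-around is not handled.
uptime_between_events() {
  awk '
    function secs(t) { split(substr(t, 1, 8), p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    $2 == "server_available" { up = secs($1); have = 1 }
    $2 == "server_failed" && have { print secs($1) - up; have = 0 }
  '
}
```

Piping the extended log above into `uptime_between_events` prints `12` and `15`, one line per crash.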

## What We Tested

| Test | Result |
|---|---|
| Master HEAD (`0f388fc` — websocket error handling commit) | SIGSEGV exit 139 |
| v1.5.5 tag (`056c890` — clean release, before websocket commit) | SIGSEGV exit 139 |
| Starter plan (512MB RAM, 0.5 CPU) | Crash in ~5-15s |
| Standard plan (2GB RAM, 1 CPU) | Crash in ~20-60s |
| `--markets sol-perp,btc-perp,eth-perp` (3 markets only) | Crash in ~60s |
| No `--markets` flag (all markets) | Crash in ~20s |
| `INIT_RPC_THROTTLE=2` | No improvement |
| Clean rebuild with `clearCache` | No improvement |
| Different `--port` and `--ws-port` values | No improvement |

## What Previously Worked

The **exact same Docker image** (master at `0f388fc`) ran successfully for **18 consecutive hours** on March 1, 2026 (05:28 UTC → 23:37 UTC). The crash loop started spontaneously without any configuration or code change.

This suggests either:
1. Something changed on-chain in the Drift protocol that the FFI library cannot parse without segfaulting
2. An architecture/platform issue — the README warns that `--platform linux/x86_64` is required for correct Solana program data memory layout, but Render.com does not expose Docker platform settings

## Suspected Root Cause

The README states:

> `--platform linux/x86_64` ensures the correct memory layout at runtime for solana program data types

The `Dockerfile` does not specify a target platform. On hosting providers that may run Docker on ARM (aarch64) or mixed-architecture fleets, the `libdrift_ffi_sys.so` (compiled for x86_64) would produce corrupt memory layouts when parsing Solana account data, leading to SIGSEGV.

Alternatively, `libdrift_ffi_sys.so` v2.156.3 may have an incompatibility with current on-chain Drift state that was not present when the binary was first deployed.
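The architecture hypothesis can be checked directly from inside the running container (for example via Render's shell access, if the plan provides it). A sketch with two hypothetical helpers: `check_arch` classifies the machine architecture, and `elf_machine` reads the ELF header's `e_machine` field with `od`, since the `file` utility may not be installed in the `debian:12` runtime image.

```bash
# check_arch: classify the machine architecture (defaults to `uname -m`).
# Returns 1 when the host is ARM, i.e. would not match an x86_64 library.
check_arch() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "ok: $arch" ;;
    aarch64|arm64) echo "mismatch: $arch"; return 1 ;;
    *)             echo "unknown: $arch"; return 2 ;;
  esac
}

# elf_machine: read the 2-byte e_machine field at offset 18 of an ELF header.
# Assumes a little-endian host, which holds for both x86_64 and aarch64.
elf_machine() {
  case "$(od -An -t u2 -j 18 -N 2 "$1" | tr -d ' ')" in
    62)  echo x86_64 ;;   # EM_X86_64
    183) echo aarch64 ;;  # EM_AARCH64
    *)   echo unknown ;;
  esac
}
```

For example, `check_arch` followed by `elf_machine "$(ldd "$(command -v drift-gateway)" | awk '/libdrift_ffi_sys/ {print $3}')"` (assuming `drift-gateway` is on `PATH` in the image) should report the host and library architectures. If both report x86_64, an architecture mismatch can likely be ruled out.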

## Questions for Maintainers

1. Is there a known incompatibility between `libdrift_ffi_sys` v2.156.3 and current Drift on-chain state?
2. Should the Dockerfile explicitly set `--platform linux/amd64` in the `FROM` directives to prevent architecture mismatches?
3. Would using the prebuilt `ghcr.io/drift-labs/gateway` image avoid this issue?
4. Are there any recent Drift protocol upgrades (new markets, account layout changes) that could trigger parsing crashes in the FFI?

## Impact

Our production trading terminal (https://terminal-ui-virid.vercel.app) depends on the gateway for market data, positions, account info, and order placement. The gateway crash loop has taken down all backend-dependent functionality.

## Reproduction Steps

1. Clone this repo
2. Deploy the stock Dockerfile to Render.com (or any Docker host)
3. Set `DRIFT_GATEWAY_KEY` to a valid base58 seed for a mainnet Drift account
4. Set RPC URL to any mainnet RPC endpoint
5. Start with: `drift-gateway <RPC_URL> --host 0.0.0.0 --port 8080`
6. Observe: gateway binds port, then crashes with SIGSEGV within 20–60 seconds
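To separate a Render-specific problem from one in the gateway itself, the same steps can be run on a local x86_64 Docker host with the platform pinned explicitly. A minimal sketch: `repro_locally` and the image tag `drift-gateway-local` are hypothetical, and it assumes the stock Dockerfile's entrypoint is the `drift-gateway` binary so that trailing arguments are passed to it (if it is not, prepend `drift-gateway` to the arguments after the image name).

```bash
# repro_locally: hypothetical helper that builds the stock Dockerfile for
# linux/amd64 and runs it with the same flags and environment used on Render.
# Usage: repro_locally <RPC_URL>   (run from the repository root)
repro_locally() {
  local rpc_url="$1" image="drift-gateway-local" code
  docker build --platform linux/amd64 -t "$image" . || return
  # "-e VAR" without a value forwards VAR from the calling shell.
  # Assumes the image ENTRYPOINT is drift-gateway (see lead-in).
  docker run --rm --platform linux/amd64 \
    -e DRIFT_GATEWAY_KEY -e INIT_RPC_THROTTLE=2 -e RUST_LOG=info \
    -p 8080:8080 -p 18080:18080 \
    "$image" "$rpc_url" --host 0.0.0.0 --port 8080 --ws-port 18080
  code=$?   # docker run exits with the container's status (139 on SIGSEGV)
  echo "container exited with status $code"
  return "$code"
}
```

If the local run survives well past the 20–60 second window while Render keeps crashing, that would point toward the hosting environment rather than on-chain state.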
