SIGSEGV (exit 139) crash loop on Docker deployment — Render.com #149
Description
Summary
drift-gateway crashes with SIGSEGV (exit code 139) within 20–60 seconds of startup on every attempt. The crash occurs during or shortly after market data subscription, after the HTTP port is bound and the process is marked available. This is a consistent, reproducible crash loop that has made the gateway completely unusable for our production trading terminal.
Environment
| Component | Value |
|---|---|
| Gateway version | master (0f388fc) and v1.5.5 tag (056c890) — both crash identically |
| Dockerfile | Stock ./Dockerfile from this repo (builds from source) |
| Rust base image | rust:1.85 (as specified in Dockerfile) |
| Runtime image | debian:12 (as specified in Dockerfile) |
| libdrift_ffi_sys.so | v2.156.3 (latest release, downloaded at build time per Dockerfile) |
| drift-rs | Pinned at rev 0618036 (from Cargo.toml) |
| Hosting | Render.com, Docker runtime, Oregon region |
| Render plan | Standard (2GB RAM, 1 CPU) — also tested on Starter (512MB) |
| RPC | Triton (paid, dedicated endpoint) — confirmed healthy via getHealth |
| OS architecture | Render does not expose --platform setting for Docker builds |
Startup Command
drift-gateway $TRITON_RPC_URL --host 0.0.0.0 --port 8080 --ws-port 18080
Environment variables set:
- DRIFT_GATEWAY_KEY: base58 seed (valid, was working previously)
- TRITON_RPC_URL: paid Triton RPC endpoint (healthy, returns {"result":"ok"} on getHealth)
- INIT_RPC_THROTTLE=2
- RUST_LOG=info
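The RPC health check mentioned above can be reproduced with a plain JSON-RPC call. The following is a minimal Python sketch, assuming the standard Solana getHealth method and that TRITON_RPC_URL is set in the environment. The helper names are illustrative and are not part of drift-gateway.

```python
import json
import os
import urllib.request


def build_get_health_request() -> bytes:
    """Encode a JSON-RPC 2.0 getHealth request body."""
    return json.dumps({"jsonrpc": "2.0", "id": 1, "method": "getHealth"}).encode()


def is_healthy(response: dict) -> bool:
    """A healthy node answers with result "ok"; an unhealthy one returns an error object."""
    return response.get("result") == "ok"


def check_rpc(url: str, timeout: float = 10.0) -> bool:
    """POST getHealth to the endpoint and interpret the reply."""
    request = urllib.request.Request(
        url,
        data=build_get_health_request(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return is_healthy(json.load(reply))


if __name__ == "__main__":
    # Only contacts the network when the variable is actually set.
    rpc_url = os.environ.get("TRITON_RPC_URL")
    if rpc_url:
        print("healthy" if check_rpc(rpc_url) else "unhealthy")
```

A passing check only shows that the endpoint is reachable and in sync; it says nothing about the account data the gateway parses after subscribing.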
Crash Behavior
The gateway starts, binds its HTTP port, and Render marks it as "available." Then it crashes with exit code 139 (SIGSEGV) within 20–60 seconds. Every single restart follows the same pattern:
2026-03-02T02:07:15Z deploy_ended (succeeded)
2026-03-02T02:07:17Z server_available
2026-03-02T02:07:37Z server_failed exit=139 ← 20 seconds
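Exit code 139 follows the common shell convention of reporting a signal-terminated process as 128 plus the signal number, and signal 11 is SIGSEGV on Linux. A minimal Python sketch of that decoding (the helper name describe_exit_status is illustrative, not something Render or drift-gateway provides):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return f"unknown signal {status - 128}"
    return f"exit code {status}"


print(describe_exit_status(139))  # SIGSEGV on Linux
```

By the same rule, a process killed by the kernel out-of-memory killer would typically show 137 (SIGKILL) rather than 139, which is one reason the crash does not look like a plain memory-limit kill.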
Extended crash log (spans ~2 hours of attempts):
23:37:46Z server_failed exit=139 (first crash after 18h uptime)
23:38:12Z server_available
23:38:24Z server_failed exit=139 ← 12s
23:39:03Z server_available
23:39:18Z server_failed exit=139 ← 15s
...repeats every 5–60 seconds indefinitely...
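The uptime annotations in the log are the gap between each server_available event and the following server_failed event. A small Python sketch that recomputes them, assuming the "HH:MM:SSZ event" layout shown above and that all events fall on the same day (the function name is illustrative):

```python
from datetime import datetime

LOG = """\
23:38:12Z server_available
23:38:24Z server_failed exit=139
23:39:03Z server_available
23:39:18Z server_failed exit=139
"""


def uptimes_seconds(log: str) -> list[int]:
    """Return seconds between each server_available and the next server_failed."""
    started = None
    uptimes = []
    for line in log.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stamp = datetime.strptime(parts[0], "%H:%M:%SZ")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if parts[1] == "server_available":
            started = stamp
        elif parts[1] == "server_failed" and started is not None:
            uptimes.append(int((stamp - started).total_seconds()))
            started = None
    return uptimes


print(uptimes_seconds(LOG))  # [12, 15]
```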
What We Tested
| Test | Result |
|---|---|
| Master HEAD (0f388fc, websocket error handling commit) | SIGSEGV exit 139 |
| v1.5.5 tag (056c890, clean release, before websocket commit) | SIGSEGV exit 139 |
| Starter plan (512MB RAM, 0.5 CPU) | Crash in ~5-15s |
| Standard plan (2GB RAM, 1 CPU) | Crash in ~20-60s |
| --markets sol-perp,btc-perp,eth-perp (3 markets only) | Crash in ~60s |
| No --markets flag (all markets) | Crash in ~20s |
| INIT_RPC_THROTTLE=2 | No improvement |
| Clean rebuild with clearCache | No improvement |
| Different --port and --ws-port values | No improvement |
What Previously Worked
The exact same Docker image (master at 0f388fc) ran successfully for 18 consecutive hours on March 1, 2026 (05:28 UTC → 23:37 UTC). The crash loop started spontaneously without any configuration or code change.
This suggests either:
- Something changed on-chain in the Drift protocol that the FFI library cannot parse without segfaulting
- An architecture/platform issue: the README warns that --platform linux/x86_64 is required for correct Solana program data memory layout, but Render.com does not expose Docker platform settings
Suspected Root Cause
The README states:
--platform linux/x86_64 ensures the correct memory layout at runtime for solana program data types
The Dockerfile does not specify a target platform. On hosting providers that may run Docker on ARM (aarch64) or mixed-architecture fleets, the libdrift_ffi_sys.so (compiled for x86_64) would produce corrupt memory layouts when parsing Solana account data, leading to SIGSEGV.
Alternatively, libdrift_ffi_sys.so v2.156.3 may have an incompatibility with current on-chain Drift state that was not present when the binary was first deployed.
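One way to test the architecture hypothesis is to compare the CPU architecture the container reports with the architecture recorded in the shared library's ELF header. The Python sketch below is illustrative only: the function names are made up for this example, the library path must be supplied by the caller, and the script is not part of drift-gateway.

```python
import platform
import struct

# e_machine values from the ELF specification.
ELF_MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine:#x})")


def report(library_path: str) -> str:
    """Compare the library's target architecture with the running machine."""
    with open(library_path, "rb") as lib:
        library_arch = elf_machine(lib.read(20))
    return f"library={library_arch} host={platform.machine()}"
```

Calling report() inside the running container with the location of libdrift_ffi_sys.so would show whether the two values differ. Matching values would point away from the platform explanation and toward the on-chain data hypothesis.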
Questions for Maintainers
- Is there a known incompatibility between libdrift_ffi_sys v2.156.3 and current Drift on-chain state?
- Should the Dockerfile explicitly set --platform=linux/amd64 in the FROM directives to prevent architecture mismatches?
- Would using the prebuilt ghcr.io/drift-labs/gateway image avoid this issue?
- Are there any recent Drift protocol upgrades (new markets, account layout changes) that could trigger parsing crashes in the FFI?
Impact
Our production trading terminal (https://terminal-ui-virid.vercel.app) depends on the gateway for market data, positions, account info, and order placement. The gateway crash loop has taken down all backend-dependent functionality.
Reproduction Steps
- Clone this repo
- Deploy the stock Dockerfile to Render.com (or any Docker host)
- Set DRIFT_GATEWAY_KEY to a valid base58 seed for a mainnet Drift account
- Set RPC URL to any mainnet RPC endpoint
- Start with: drift-gateway <RPC_URL> --host 0.0.0.0 --port 8080
- Observe: gateway binds port, then crashes with SIGSEGV within 20–60 seconds
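To record the time-to-crash consistently across attempts, the start step can be wrapped in a small harness. This is a rough Python sketch under the assumption that the gateway binary is on PATH; run_until_exit is a hypothetical helper and the RPC URL in the comment is a placeholder.

```python
import subprocess
import sys
import time


def run_until_exit(command: list[str]) -> tuple[float, int]:
    """Run a command to completion; return (seconds alive, shell-style exit status)."""
    started = time.monotonic()
    completed = subprocess.run(command)
    elapsed = time.monotonic() - started
    code = completed.returncode
    # subprocess reports death by signal N as -N; shells report it as 128 + N.
    status = 128 - code if code < 0 else code
    return elapsed, status


# Example (placeholder RPC URL):
# elapsed, status = run_until_exit(
#     ["drift-gateway", "<RPC_URL>", "--host", "0.0.0.0", "--port", "8080"]
# )
# print(f"exited after {elapsed:.0f}s with status {status}")
```

A status of 139 from this harness corresponds to the SIGSEGV reported by Render.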