SIGSEGV (exit 139) crash loop on Docker deployment — Render.com #149

@devli13

Summary

drift-gateway crashes with SIGSEGV (exit code 139) shortly after startup on every attempt: typically within 20–60 seconds on the Standard plan and 5–15 seconds on Starter. The crash occurs during or shortly after market data subscription, after the HTTP port is bound and Render has marked the service available. This consistent, reproducible crash loop has made the gateway completely unusable for our production trading terminal.

Environment

Component | Value
Gateway version | master (0f388fc) and v1.5.5 tag (056c890); both crash identically
Dockerfile | Stock ./Dockerfile from this repo (builds from source)
Rust base image | rust:1.85 (as specified in Dockerfile)
Runtime image | debian:12 (as specified in Dockerfile)
libdrift_ffi_sys.so | v2.156.3 (latest release, downloaded at build time per Dockerfile)
drift-rs | Pinned at rev 0618036 (from Cargo.toml)
Hosting | Render.com, Docker runtime, Oregon region
Render plan | Standard (2GB RAM, 1 CPU); also tested on Starter (512MB)
RPC | Triton (paid, dedicated endpoint); confirmed healthy via getHealth
OS architecture | Render does not expose a --platform setting for Docker builds

Startup Command

drift-gateway $TRITON_RPC_URL --host 0.0.0.0 --port 8080 --ws-port 18080

Environment variables set:

  • DRIFT_GATEWAY_KEY — base58 seed (valid, was working previously)
  • TRITON_RPC_URL — paid Triton RPC endpoint (healthy, returns {"result":"ok"} on getHealth)
  • INIT_RPC_THROTTLE=2
  • RUST_LOG=info
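
For anyone who wants to repeat the RPC health check, here is a minimal Python sketch of the getHealth call. getHealth is a standard Solana JSON-RPC method; the function names are hypothetical, and the sketch assumes the endpoint accepts an unauthenticated POST at the URL stored in TRITON_RPC_URL.

```python
import json
import urllib.request


def build_get_health_request(request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 getHealth request body."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "getHealth"}
    return json.dumps(payload).encode("utf-8")


def is_healthy(response_body: bytes) -> bool:
    """A healthy node answers with "result": "ok"; anything else counts as unhealthy."""
    data = json.loads(response_body)
    return data.get("result") == "ok"


def check_rpc_health(rpc_url: str, timeout: float = 5.0) -> bool:
    """POST getHealth to rpc_url and report whether the node says it is healthy."""
    request = urllib.request.Request(
        rpc_url,
        data=build_get_health_request(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_healthy(response.read())
```

Calling check_rpc_health(os.environ["TRITON_RPC_URL"]) from inside the container would also confirm that the endpoint is reachable from the Render network, not just from a developer machine.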

Crash Behavior

The gateway starts, binds its HTTP port, and Render marks it as "available." Then it crashes with exit code 139 (SIGSEGV) within 20–60 seconds. Every single restart follows the same pattern:

2026-03-02T02:07:15Z  deploy_ended (succeeded)
2026-03-02T02:07:17Z  server_available
2026-03-02T02:07:37Z  server_failed  exit=139   ← 20 seconds
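
The exit code maps to a signal through the usual shell and container convention of 128 plus the signal number. A small Python sketch of that decoding (the helper name is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a readable cause."""
    if code > 128:
        signum = code - 128  # codes above 128 conventionally encode a fatal signal
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

On Linux this reports SIGSEGV for 139. A kernel out-of-memory kill would more typically surface as 137 (SIGKILL), so the exit code by itself points toward an invalid memory access and away from plain memory exhaustion.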

Extended crash log (spans ~2 hours of attempts):

23:37:46Z  server_failed  exit=139    (first crash after 18h uptime)
23:38:12Z  server_available
23:38:24Z  server_failed  exit=139    ← 12s
23:39:03Z  server_available
23:39:18Z  server_failed  exit=139    ← 15s
...repeats every 5–60 seconds indefinitely...
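
The per-restart uptimes annotated above can be recomputed from the event log. The following is a sketch only, assuming the time-only `HH:MM:SSZ` format of the extended log and the event names shown there; the function name is hypothetical.

```python
from datetime import datetime, timedelta

SAMPLE_LOG = """\
23:37:46Z  server_failed  exit=139
23:38:12Z  server_available
23:38:24Z  server_failed  exit=139
23:39:03Z  server_available
23:39:18Z  server_failed  exit=139
"""


def uptimes(log: str) -> list[float]:
    """Seconds between each server_available and the following server_failed."""
    result = []
    started = None
    for line in log.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stamp = datetime.strptime(parts[0], "%H:%M:%SZ")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if parts[1] == "server_available":
            started = stamp
        elif parts[1] == "server_failed" and started is not None:
            delta = stamp - started
            if delta < timedelta(0):
                delta += timedelta(days=1)  # the restart crossed midnight
            result.append(delta.total_seconds())
            started = None
    return result


print(uptimes(SAMPLE_LOG))
```

For the excerpt above this yields 12 and 15 seconds; the first server_failed line is skipped because the matching server_available event is not part of the excerpt.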

What We Tested

Test | Result
Master HEAD (0f388fc, websocket error handling commit) | SIGSEGV exit 139
v1.5.5 tag (056c890, clean release, before websocket commit) | SIGSEGV exit 139
Starter plan (512MB RAM, 0.5 CPU) | Crash in ~5-15s
Standard plan (2GB RAM, 1 CPU) | Crash in ~20-60s
--markets sol-perp,btc-perp,eth-perp (3 markets only) | Crash in ~60s
No --markets flag (all markets) | Crash in ~20s
INIT_RPC_THROTTLE=2 | No improvement
Clean rebuild with clearCache | No improvement
Different --port and --ws-port values | No improvement

What Previously Worked

The exact same Docker image (master at 0f388fc) ran successfully for 18 consecutive hours on March 1, 2026 (05:28 UTC → 23:37 UTC). The crash loop started spontaneously without any configuration or code change.

This suggests either:

  1. Something changed on-chain in the Drift protocol that the FFI library cannot parse without segfaulting
  2. An architecture/platform issue — the README warns that --platform linux/x86_64 is required for correct Solana program data memory layout, but Render.com does not expose Docker platform settings

Suspected Root Cause

The README states:

--platform linux/x86_64 ensures the correct memory layout at runtime for solana program data types

The Dockerfile does not specify a target platform. On hosting providers that may run Docker on ARM (aarch64) or mixed-architecture fleets, the image could be built or run for a different architecture than the one libdrift_ffi_sys.so (compiled for x86_64) expects. If that happens, Solana account data may be parsed with the wrong memory layout, leading to SIGSEGV.

Alternatively, libdrift_ffi_sys.so v2.156.3 may have an incompatibility with current on-chain Drift state that was not present when the binary was first deployed.
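
One way to test the architecture hypothesis from inside the running container is to compare the machine type recorded in the library's ELF header with the architecture the process reports. This is a sketch under stated assumptions: the helper names are hypothetical, the path to libdrift_ffi_sys.so inside the runtime image has to be supplied by the caller, and only the two architectures discussed here are mapped.

```python
import platform
import struct

# e_machine values from the ELF specification
EM_NAMES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(e_machine, f"unknown(0x{e_machine:X})")


def library_matches_host(path: str) -> bool:
    """Compare a shared library's architecture with the one this process reports."""
    with open(path, "rb") as handle:
        lib_arch = elf_machine(handle.read(20))
    host = platform.machine().lower()
    host = {"amd64": "x86_64", "arm64": "aarch64"}.get(host, host)
    return lib_arch == host
```

If library_matches_host returns True for the deployed library, a plain x86_64 versus aarch64 mismatch becomes less likely and the on-chain state hypothesis deserves more attention. Under user-mode emulation the process may still report x86_64, so a match does not rule out an emulated host.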

Questions for Maintainers

  1. Is there a known incompatibility between libdrift_ffi_sys v2.156.3 and current Drift on-chain state?
  2. Should the Dockerfile explicitly set --platform linux/amd64 in the FROM directives to prevent architecture mismatches?
  3. Would using the prebuilt ghcr.io/drift-labs/gateway image avoid this issue?
  4. Are there any recent Drift protocol upgrades (new markets, account layout changes) that could trigger parsing crashes in the FFI?

Impact

Our production trading terminal (https://terminal-ui-virid.vercel.app) depends on the gateway for market data, positions, account info, and order placement. The gateway crash loop has taken down all backend-dependent functionality.

Reproduction Steps

  1. Clone this repo
  2. Deploy the stock Dockerfile to Render.com (or any Docker host)
  3. Set DRIFT_GATEWAY_KEY to a valid base58 seed for a mainnet Drift account
  4. Set RPC URL to any mainnet RPC endpoint
  5. Start with: drift-gateway <RPC_URL> --host 0.0.0.0 --port 8080
  6. Observe: gateway binds port, then crashes with SIGSEGV within 20–60 seconds
