Conversation
PR Summary (Medium Risk)
Overview written by Cursor Bugbot for commit 10c5309. This will update automatically on new commits.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```go
var nfsOptions = strings.Join([]string{
	// wait for data to be sent to proxy server before returning.
	// async might cause issues if the sandbox is shut down suddenly.
	"sync",
```
NFS sync option not removed despite PR discussion agreement
Medium Severity
The sync NFS mount option appears to still be present despite the author agreeing in the PR discussion to remove it ("Despite the performance gains, I'll remove that one"). The sync option forces synchronous writes, which can significantly degrade NFS throughput (2-3x slower than the default async), partially negating the performance gains from the rsize/wsize buffer increases. This contradicts the PR's stated goal of improving performance by 5-10%.
lol.
- claude, locally: "we're missing async, that'll make it faster"
- claude, in pr: "async will cause data issues, use sync"
- cursor, in pr: "you still have sync, I thought you wanted to switch to async"
data corruption overrules speed, we're done here.
```diff
+var nfsOptions = strings.Join([]string{
+	// wait for data to be sent to proxy server before returning.
+	// async might cause issues if the sandbox is shut down suddenly.
+	"sync",
+	"rsize=1048576",  // 1 MB read buffer
+	"wsize=1048576",  // 1 MB write buffer
+	"mountproto=tcp", // nfs proxy only supports tcp
+	"mountport=2049", // nfs proxy only supports mounting on port 2049
+	"proto=tcp",      // nfs proxy only supports tcp
+	"port=2049",      // nfs proxy only supports mounting on port 2049
+	"nfsvers=3",      // nfs proxy is nfs version 3
+	"noacl",          // no reason for acl in the sandbox
+}, ",")
+
 func (a *API) setupNfs(ctx context.Context, nfsTarget, path string) {
 	commands := [][]string{
 		{"mkdir", "-p", path},
-		{"mount", "-v", "-t", "nfs", "-o", "fg,hard,mountproto=tcp,mountport=2049,proto=tcp,port=2049,nfsvers=3,noacl", nfsTarget, path},
+		{"mount", "-v", "-t", "nfs", "-o", "fg,hard," + nfsOptions, nfsTarget, path},
 	}

 	for _, command := range commands {
```
🔴 This PR introduces sync on a hard-mounted NFS volume — a dangerous combination where every write blocks until the NFS proxy acknowledges it and hard means there is no timeout. If the NFS proxy becomes unreachable after the mount is established (crash, network partition, restart), any sandbox process writing to the volume will hang indefinitely in uninterruptible sleep (D state) with no way to kill it. A safer approach that still preserves write durability is sync + soft + timeo=N + retrans=M, which allows the kernel to eventually return an error to the application instead of blocking forever.
Extended reasoning...
Bug: hard + sync NFS mount creates indefinite I/O hang risk
What the bug is and how it manifests
Before this PR, the mount command used fg,hard,mountproto=tcp,... with no explicit async/sync flag, defaulting to NFS async mode. With async, writes complete as soon as they are buffered in the client page cache; the kernel retries delivery to the server in the background. This PR introduces an explicit sync option, which changes write semantics entirely: every write() syscall now blocks until the NFS proxy sends an acknowledgement. Combined with the pre-existing hard mount option (which means the client retries indefinitely and never returns an error to the application), any process performing write I/O on the mount point will block indefinitely if the NFS proxy becomes unavailable.
The specific code path
In packages/envd/internal/api/init.go (lines 235-256), nfsOptions is constructed with "sync" as the first entry. The setupNfs function then passes "fg,hard," + nfsOptions to the mount command. The hard flag was present before this PR; the sync flag is new. Together they produce a mount with both hard and sync active.
Why existing code does not prevent it
There is no timeout, watchdog, or sandbox-level health check that would detect and recover from a process stuck in D state. The NFS hard option is explicitly designed to prevent the kernel from ever returning EIO or ETIMEDOUT to the application. Adding sync on top means the kernel cannot complete the write optimistically by buffering; it must wait for the server. There is no timeo or retrans option to bound the wait (and the intr option would not help either, since it has been a no-op since Linux 2.6.25).
Impact
If the NFS proxy crashes, is restarted, or a network partition occurs after a volume is mounted — all plausible in a sandbox environment — every sandbox process that subsequently attempts a write will enter uninterruptible sleep (D state). These processes cannot be killed with SIGKILL. The only recovery is a full VM termination. This turns a transient infrastructure hiccup into a permanent sandbox freeze.
Step-by-step proof of the hang
- Sandbox starts; setupNfs mounts the proxy volume with fg,hard,sync,...
- A sandbox user process opens a file on the mounted path and calls write().
- The Linux NFS client sends a WRITE RPC to the proxy and blocks the process, waiting for the NFS ACK (sync requires server acknowledgment before returning to userspace).
- At this moment, the NFS proxy crashes or becomes unreachable.
- The NFS client retries the WRITE RPC repeatedly per the hard semantics — indefinitely, with no timeout.
- The user process remains stuck in D state, unkillable. Any other process that subsequently writes to the same mount also gets stuck.
- The only recovery is to terminate the VM.
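As a side note, a stuck writer can at least be observed from outside the sandbox process. This is a minimal sketch (not from the PR) of scanning /proc on Linux for processes in D state, the condition described in the steps above:

```go
// Detect processes stuck in uninterruptible sleep (D state) by scanning
// /proc/<pid>/stat. A watchdog could use this to flag a hung NFS mount,
// since SIGKILL cannot reclaim such processes. Linux-only sketch.
package main

import (
	"fmt"
	"os"
	"strings"
)

// dStatePIDs returns the PIDs whose state field in /proc/<pid>/stat is "D".
func dStatePIDs() ([]string, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	var pids []string
	for _, e := range entries {
		pid := e.Name()
		if pid[0] < '0' || pid[0] > '9' {
			continue // not a process directory
		}
		data, err := os.ReadFile("/proc/" + pid + "/stat")
		if err != nil {
			continue // process exited while we were scanning
		}
		// The comm field is parenthesized and may itself contain spaces,
		// so the state is the first field after the LAST closing paren.
		s := string(data)
		i := strings.LastIndexByte(s, ')')
		if i < 0 || i+2 >= len(s) {
			continue
		}
		fields := strings.Fields(s[i+1:])
		if len(fields) > 0 && fields[0] == "D" {
			pids = append(pids, pid)
		}
	}
	return pids, nil
}

func main() {
	pids, err := dStatePIDs()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("processes in D state: %d %v\n", len(pids), pids)
}
```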
How to fix it
Replace hard with soft and add explicit timeo and retrans values: e.g., sync,soft,timeo=100,retrans=3,rsize=1048576,wsize=1048576,.... With soft, after retrans failed retries the kernel returns EIO to the application instead of retrying forever. sync can be kept to preserve the data-durability intent. The soft + timeo + retrans combination is the most portable and operationally safe choice.


Seems to improve things by about 5-10%.