
Implement sandbox exec through command router #238

Merged
ehdr merged 10 commits into main from ehdr/sb-direct-exec
Dec 19, 2025

Contributor

@ehdr ehdr commented Dec 16, 2025

Port modal-labs/modal-client@6133e091 / modal-labs/modal-client#3673 to add direct worker connections for Sandbox
exec operations, reducing latency and improving reliability.

The first commit bumps the modal-client submodule and refreshes protos; sorry for the noisy commit. It's probably best to exclude that commit from the diff when reviewing.


Note

Switches Sandbox exec to the task command router with direct worker connections, adds client/config support, updates tests, and refreshes protos with related fields/messages.

  • JS Client/Sandbox:
    • Implement exec via TaskCommandRouterClientImpl (direct worker connections), replacing container-exec path.
    • Add validateExecArgs, deadline handling, PTY mapping, and binary/text I/O streaming via router.
    • New buildTaskExecStartRequestProto; update ContainerProcess I/O to use router.
    • Config: support task_command_router_insecure (env: MODAL_TASK_COMMAND_ROUTER_INSECURE).
    • Expose timeoutMiddleware; minor CloudBucketMount.toProto create() usage.
  • Protos (refresh):
    • Add HTTPConfig and fields to Function/FunctionData.
    • Enhance CloudBucketMount (force_path_style, metadata TTL oneof/enum).
    • Add AppCreateRequest.tags, AppGetLogsRequest.parametrized_function_id, CancelInputEvent.cancellation_reason.
    • Environment concurrency fields; Proxy region; add RPCRetryPolicy/RPCStatus.
    • Remove Sandbox router protos; extend Task command router (mount/snapshot dir).
  • Tests:
    • Update Sandbox exec tests to router path; add task command router client tests.
  • Changelog:
    • Note internal switch of Sandbox exec to new command router for performance/reliability.

Written by Cursor Bugbot for commit 4580ac9. This will update automatically on new commits.
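For reference, a minimal sketch of how the new `task_command_router_insecure` setting and its `MODAL_TASK_COMMAND_ROUTER_INSECURE` override might be resolved. The precedence (env var over profile), the secure default (`false`), the accepted truthy spellings, and the `ProfileConfig` / `resolveRouterInsecure` names are all assumptions for illustration, not libmodal's actual config code.

```typescript
// Hypothetical slice of a parsed config profile.
interface ProfileConfig {
  task_command_router_insecure?: boolean;
}

// Assumed set of truthy spellings; any other non-blank value counts as false.
const TRUTHY = new Set(["1", "true", "yes", "on"]);

// Returns undefined when the variable is unset or blank so callers can fall through.
function parseBoolEnv(value: string | undefined): boolean | undefined {
  if (value === undefined || value.trim() === "") return undefined;
  return TRUTHY.has(value.trim().toLowerCase());
}

// Assumed precedence: env var, then profile, then a secure default (false).
function resolveRouterInsecure(
  env: Record<string, string | undefined>,
  profile: ProfileConfig,
): boolean {
  const fromEnv = parseBoolEnv(env["MODAL_TASK_COMMAND_ROUTER_INSECURE"]);
  if (fromEnv !== undefined) return fromEnv;
  return profile.task_command_router_insecure ?? false;
}
```

In a Node process this would typically be called as `resolveRouterInsecure(process.env, profile)`.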

ehdr added 2 commits December 16, 2025 18:28
Port modal-labs/modal-client@6133e091 to add direct worker connections for Sandbox
exec operations, reducing latency and improving reliability.
@saltzm saltzm left a comment

Looks pretty good overall! The main thing is that I think it'd be good to add some more tests.

stdoutConfig = TaskExecStdoutConfig.TASK_EXEC_STDOUT_CONFIG_DEVNULL;
} else {
throw new Error(`Unsupported stdout behavior: ${stdout}`);
}

Does the JS API not support "stdout" as an option?

Contributor Author

In SandboxExecParams you can pass stdout, which is a StdioBehavior = "pipe" | "ignore". Does that make sense?


Ah - I meant stdout = "stdout" 😄 There's a third option in the python client: https://github.com/modal-labs/modal-client/blob/efe8b48bbd66ba204447de2f035614870dfb1a6a/modal/stream_type.py#L12

Not adamant we support it here; I don't think it's commonly used, though a couple customers do use it. It just prints everything from the exec stdout to local stdout.
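A sketch of the `StdioBehavior` to `TaskExecStdoutConfig` mapping discussed above, written as an exhaustive `switch` so that adding a Python-style third option (`"stdout"`) would be flagged at compile time rather than only by the runtime `throw`. The enum here is a local stand-in: `TASK_EXEC_STDOUT_CONFIG_PIPE` and the string values are assumptions, and treating an omitted `stdout` as `"pipe"` is assumed too.

```typescript
// Local stand-in for the generated proto enum; PIPE and the values are assumed.
enum TaskExecStdoutConfig {
  TASK_EXEC_STDOUT_CONFIG_PIPE = "PIPE",
  TASK_EXEC_STDOUT_CONFIG_DEVNULL = "DEVNULL",
}

type StdioBehavior = "pipe" | "ignore";

// Assumption: an omitted stdout behaves like "pipe".
function toStdoutConfig(stdout: StdioBehavior = "pipe"): TaskExecStdoutConfig {
  switch (stdout) {
    case "pipe":
      return TaskExecStdoutConfig.TASK_EXEC_STDOUT_CONFIG_PIPE;
    case "ignore":
      return TaskExecStdoutConfig.TASK_EXEC_STDOUT_CONFIG_DEVNULL;
    default: {
      // Compile-time exhaustiveness check: adding e.g. "stdout" to
      // StdioBehavior without a case here makes this line fail to type-check.
      const unreachable: never = stdout;
      // Still reachable at runtime if an untyped caller passes a bad value.
      throw new Error(`Unsupported stdout behavior: ${unreachable}`);
    }
  }
}
```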

command: string[],
params?: SandboxExecParams & { mode?: "text" },
- ): Promise<ContainerProcess<string>>;
+ ): Promise<ContainerProcess<string> | ContainerProcessThroughRouter<string>>;

Is this a breaking API change? If so, does it matter? If we're okay with that, we could maybe just remove the old ContainerProcess object, since I don't think we're going back to the old exec implementation at this point. Fine doing that in stages, though, if we want to play it safe.

Contributor Author

Ah good, I'll remove the old implementation then!
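To illustrate why widening the return type is source-breaking, here is a tiny sketch with hypothetical stand-in classes (not the real ones). Once `exec()` returns a union, members that exist on only one side stop type-checking, and so does assigning the result to a variable annotated with the old type. If the two classes happened to be structurally identical, TypeScript would not complain. Going back to a single return type, as proposed, avoids the issue.

```typescript
// Hypothetical stand-ins: the old process type has a member the new one lacks.
class ContainerProcess<T> {
  constructor(readonly value: T) {}
  legacyOnly(): string {
    return "legacy";
  }
}

class ContainerProcessThroughRouter<T> {
  constructor(readonly value: T) {}
}

// Before: callers see exactly one concrete type.
function execBefore(): ContainerProcess<string> {
  return new ContainerProcess("hi");
}

// After: the union forces callers to narrow before using non-shared members.
function execAfter(): ContainerProcess<string> | ContainerProcessThroughRouter<string> {
  return new ContainerProcessThroughRouter("hi");
}

const before = execBefore().legacyOnly(); // type-checks

const p = execAfter();
const shared = p.value; // type-checks: `value` exists on both union members
// p.legacyOnly();                            // compile error after the change
// const typed: ContainerProcess<string> = p; // compile error after the change
const narrowed = p instanceof ContainerProcess ? p.legacyOnly() : "router";
```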

),
);
if (stdout === "ignore") {
stdoutStream.cancel();

Could we just avoid constructing the above stdoutStream instead?

),
);
if (stderr === "ignore") {
stderrStream.cancel();

ditto
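One way to act on the "avoid constructing the stream" suggestion, sketched with web `ReadableStream`s: when the behavior is `"ignore"`, hand back an already-closed stream and never call the chunk source, instead of starting the read and then cancelling it. `ChunkSource` stands in for the router's streaming output RPC and `makeOutputStream` is a hypothetical name; whether callers should get a closed stream or no stream at all is left open here.

```typescript
type StdioBehavior = "pipe" | "ignore";

// Hypothetical source of output chunks (stand-in for the router's streaming RPC).
type ChunkSource = () => AsyncIterable<Uint8Array>;

function makeOutputStream(
  behavior: StdioBehavior,
  source: ChunkSource,
): ReadableStream<Uint8Array> {
  if (behavior === "ignore") {
    // Never call source(), so no RPC is started for output nobody will read.
    return new ReadableStream<Uint8Array>({
      start(controller) {
        controller.close();
      },
    });
  }
  const iterator = source()[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // Pull one chunk per read request, closing when the source is exhausted.
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) controller.close();
      else controller.enqueue(result.value);
    },
    // Propagate consumer cancellation to the underlying source.
    async cancel() {
      await iterator.return?.();
    },
  });
}
```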

execId,
offset,
new Uint8Array(0),
true,

ditto

await expect(
callWithRetriesOnTransientErrors(func, 100, 2, null, deadline),
).rejects.toThrow("Deadline exceeded");
});

Would be good to have testing parity with the Python client.
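For context on the deadline test above, a simplified sketch of retry with exponential backoff that gives up with "Deadline exceeded" once the next attempt could not start before an absolute deadline. The parameter order mirrors the positional call in the test, but reading it as `(fn, baseDelayMs, delayFactor, maxRetries, deadline)` is an assumption, `retryWithDeadline` is a hypothetical name, and `isTransient` is a placeholder for real gRPC status-code checks.

```typescript
class DeadlineExceededError extends Error {
  constructor() {
    super("Deadline exceeded");
    this.name = "DeadlineExceededError";
  }
}

// Placeholder: a real client would check gRPC status codes (e.g. UNAVAILABLE).
function isTransient(err: unknown): boolean {
  return err instanceof Error && err.message.startsWith("transient");
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithDeadline<T>(
  fn: () => Promise<T>,
  baseDelayMs = 100,
  delayFactor = 2,
  maxRetries: number | null = null, // retries after the first attempt; null = unlimited
  deadline: number | null = null, // absolute time in ms since epoch; null = none
): Promise<T> {
  let delay = baseDelayMs;
  for (let attempt = 0; ; attempt++) {
    if (deadline !== null && Date.now() >= deadline) throw new DeadlineExceededError();
    try {
      return await fn();
    } catch (err) {
      if (!isTransient(err)) throw err;
      if (maxRetries !== null && attempt >= maxRetries) throw err;
      // Give up early if sleeping for the next backoff would overshoot the deadline.
      if (deadline !== null && Date.now() + delay >= deadline) throw new DeadlineExceededError();
      await sleep(delay);
      delay *= delayFactor;
    }
  }
}
```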


try {
for await (const item of stream) {
numAuthRetries = 0;

Just changed this to a bool in the python implementation: https://github.com/modal-labs/modal-client/pull/3828/changes

Contributor Author

Changed here too now to match!
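A sketch of the boolean version of the auth-retry bookkeeping. On an auth error, the token is refreshed once and the stream is reopened from the last received offset. A second auth error with no progress in between is rethrown. `openStream`, `refreshToken`, `AuthError`, and counting the offset in items are all hypothetical simplifications; the linked Python change is the reference for the real behavior.

```typescript
class AuthError extends Error {
  constructor(message = "unauthenticated") {
    super(message);
    this.name = "AuthError";
  }
}

async function* streamWithAuthRetry<T>(
  openStream: (token: string, offset: number) => AsyncIterable<T>,
  refreshToken: () => Promise<string>,
  initialToken: string,
): AsyncGenerator<T> {
  let token = initialToken;
  let offset = 0; // number of items received so far, used to resume
  let didAuthRetry = false; // true if we refreshed since the last received item
  while (true) {
    try {
      for await (const item of openStream(token, offset)) {
        didAuthRetry = false; // progress was made, so a later refresh is allowed
        offset++;
        yield item;
      }
      return;
    } catch (err) {
      // Only retry auth failures, and only once without intervening progress.
      if (!(err instanceof AuthError) || didAuthRetry) throw err;
      didAuthRetry = true;
      token = await refreshToken();
    }
  }
}
```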

@saltzm saltzm left a comment

A few questions from the last review are still unanswered, but after that this looks good to me.

this.#taskId,
this.#execId,
this.#deadline,
);

Noting that to match the Python client we'd need to catch this deadline-exceeded exception, but I actually prefer the current behavior of just throwing, so I think it makes sense to keep it.
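A minimal sketch of the "just throw" behavior for a client-side deadline: race the process's wait against a timer and reject with a dedicated error if the deadline wins. `waitWithDeadline` and `ExecDeadlineExceededError` are hypothetical names, and the deadline is assumed to be an absolute `Date.now()`-style timestamp.

```typescript
class ExecDeadlineExceededError extends Error {
  constructor() {
    super("Deadline exceeded");
    this.name = "ExecDeadlineExceededError";
  }
}

async function waitWithDeadline(
  wait: () => Promise<number>, // resolves with the exit code
  deadline: number, // absolute time in ms since epoch
): Promise<number> {
  const remaining = deadline - Date.now();
  if (remaining <= 0) throw new ExecDeadlineExceededError();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new ExecDeadlineExceededError()), remaining);
  });
  try {
    // Whichever settles first wins: the real exit code or the deadline error.
    return await Promise.race([wait(), timeout]);
  } finally {
    clearTimeout(timer); // don't keep the process alive after an early exit
  }
}
```

Because the timer and the real exit can fire close together, which one settles first is inherently a race.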

@saltzm saltzm left a comment

One remaining question, then this looks good.



const p = await sb.exec(["sleep", "999"], { timeoutMs: 1000 });
const t0 = Date.now();
const exitCode = await p.wait();

I think I'd expect this to throw based on the implementation? https://github.com/modal-labs/libmodal/pull/238/changes#diff-c6c555742ea2730b4a75182c097f0e07c70d86ec2af2b6f2a97bcb790c9777b7R96

Do you understand why it's not?

Contributor Author

> Ah - I meant stdout = "stdout" 😄 There's a third option in the python client: https://github.com/modal-labs/modal-client/blob/efe8b48bbd66ba204447de2f035614870dfb1a6a/modal/stream_type.py#L12
>
> Not adamant we support it here; I don't think it's commonly used, though a couple customers do use it. It just prints everything from the exec stdout to local stdout.

Ah OK, yeah, these SDKs never supported that. Not entirely sure why; I'll look into it and might follow up in a separate PR!

Contributor Author

> Do you understand why it's not?

I believe it's because it's killed by the server-side timeout? The JS SDK returns a different returncode in that case, though (Python returns 137, I believe). Should we align that behavior?
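On the 137: by the common shell convention, a process terminated by signal N is reported as exit status 128 + N, so 137 corresponds to SIGKILL (9). A small sketch of decoding that convention follows; `decodeShellExitStatus` is a hypothetical helper, and the decoding is only a heuristic because a program can also exit with 137 on its own.

```typescript
// Names for a few common Linux signal numbers (see signal(7)).
const SIGNAL_NAMES: Record<number, string> = {
  1: "SIGHUP",
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
};

type DecodedExit =
  | { kind: "exited"; code: number }
  | { kind: "signaled"; signal: number; name: string | undefined };

// Heuristic: shells report "terminated by signal N" as 128 + N.
// Linux signal numbers run from 1 to 64, so only 129..192 are treated as signals.
function decodeShellExitStatus(status: number): DecodedExit {
  if (status > 128 && status <= 128 + 64) {
    const signal = status - 128;
    return { kind: "signaled", signal, name: SIGNAL_NAMES[signal] };
  }
  return { kind: "exited", code: status };
}
```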


@saltzm saltzm Dec 19, 2025


The Python stdout API is a little weird IMO, so I wouldn't hate not porting it.

> I believe it's because it's killed by the server-side timeout?

Ah, right. I wonder if it's non-deterministic, though. I wouldn't be surprised if this test flakes when the client side hits the deadline first.


Re: aligning the returncode: IMO it would make more sense for wait() to just throw an exception that indicates a timeout; I wish that's what the Python client did. But more broadly, we need better Sandbox APIs for distinguishing between being killed with a signal, returning an exit code, timing out, etc.
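A sketch of the API shape described here: a timeout surfaces as a dedicated exception, and otherwise the result keeps "exited with a code" and "killed by a signal" distinct. `RawExecStatus`, `toExecResult`, `ExecTimeoutError`, and `describe` are hypothetical names. The code-or-signal split borrows the convention of Node's `child_process` `exit` event, where exactly one of `code` and `signal` is non-null.

```typescript
// Hypothetical status as a client might assemble it from the server's response.
interface RawExecStatus {
  timedOut: boolean;
  exitCode: number | null;
  signal: string | null; // e.g. "SIGKILL"
}

type ExecResult =
  | { kind: "exited"; code: number }
  | { kind: "signaled"; signal: string };

class ExecTimeoutError extends Error {
  constructor(message = "Sandbox exec timed out") {
    super(message);
    this.name = "ExecTimeoutError";
  }
}

function toExecResult(status: RawExecStatus): ExecResult {
  // A timeout is reported as an exception instead of a sentinel exit code.
  if (status.timedOut) throw new ExecTimeoutError();
  if (status.signal !== null) return { kind: "signaled", signal: status.signal };
  if (status.exitCode !== null) return { kind: "exited", code: status.exitCode };
  throw new Error("process has not finished yet");
}

// Exhaustive over ExecResult, so a new outcome kind forces an update here.
function describe(result: ExecResult): string {
  switch (result.kind) {
    case "exited":
      return `exited with code ${result.code}`;
    case "signaled":
      return `killed by ${result.signal}`;
  }
}
```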

@ehdr ehdr merged commit dfef157 into main Dec 19, 2025
6 checks passed
@ehdr ehdr deleted the ehdr/sb-direct-exec branch December 19, 2025 13:41

2 participants