Implement sandbox exec through command router#238
Conversation
Port modal-labs/modal-client@6133e091 to add direct worker connections for Sandbox exec operations, reducing latency and improving reliability.
c947908 to
622cf1c
Compare
saltzm
left a comment
There was a problem hiding this comment.
Looks pretty good overall! Main thing is I think it'd be good to add some more tests
| stdoutConfig = TaskExecStdoutConfig.TASK_EXEC_STDOUT_CONFIG_DEVNULL; | ||
| } else { | ||
| throw new Error(`Unsupported stdout behavior: ${stdout}`); | ||
| } |
There was a problem hiding this comment.
Does the JS API not support "stdout" as an option?
There was a problem hiding this comment.
in SandboxExecParams you can pass stdout, and it can be one of StdioBehavior = "pipe" | "ignore", does that make sense?
There was a problem hiding this comment.
Ah - I meant stdout = "stdout" 😄 There's a third option in the python client: https://github.com/modal-labs/modal-client/blob/efe8b48bbd66ba204447de2f035614870dfb1a6a/modal/stream_type.py#L12
Not adamant we support it here, I don't think it's commonly used, though a couple customers do use it. It just prints everything from the exec stdout to local stdout
modal-js/src/sandbox.ts
Outdated
| command: string[], | ||
| params?: SandboxExecParams & { mode?: "text" }, | ||
| ): Promise<ContainerProcess<string>>; | ||
| ): Promise<ContainerProcess<string> | ContainerProcessThroughRouter<string>>; |
There was a problem hiding this comment.
Is this a breaking API change? If so does it matter? If we're okay doing this, we could maybe just remove the old ContainerProcess object, since I don't think we're going back to the old exec implementation at this point. Fine doing that in stages though if we want to play it safe
There was a problem hiding this comment.
Ah good, I'll remove the old implementation then!
modal-js/src/sandbox.ts
Outdated
| ), | ||
| ); | ||
| if (stdout === "ignore") { | ||
| stdoutStream.cancel(); |
There was a problem hiding this comment.
Could we just avoid constructing the above stdoutStream instead?
modal-js/src/sandbox.ts
Outdated
| ), | ||
| ); | ||
| if (stderr === "ignore") { | ||
| stderrStream.cancel(); |
modal-js/src/sandbox.ts
Outdated
| execId, | ||
| offset, | ||
| new Uint8Array(0), | ||
| true, |
| await expect( | ||
| callWithRetriesOnTransientErrors(func, 100, 2, null, deadline), | ||
| ).rejects.toThrow("Deadline exceeded"); | ||
| }); |
There was a problem hiding this comment.
Would be good to have testing parity with python client
|
|
||
| try { | ||
| for await (const item of stream) { | ||
| numAuthRetries = 0; |
There was a problem hiding this comment.
Just changed this to a bool in the python implementation: https://github.com/modal-labs/modal-client/pull/3828/changes
There was a problem hiding this comment.
Changed here too now to match!
saltzm
left a comment
There was a problem hiding this comment.
A few questions that weren't answered from the last review, but after that looks good to me
| this.#taskId, | ||
| this.#execId, | ||
| this.#deadline, | ||
| ); |
saltzm
left a comment
There was a problem hiding this comment.
One remaining question then looks good
| stdoutConfig = TaskExecStdoutConfig.TASK_EXEC_STDOUT_CONFIG_DEVNULL; | ||
| } else { | ||
| throw new Error(`Unsupported stdout behavior: ${stdout}`); | ||
| } |
There was a problem hiding this comment.
Ah - I meant stdout = "stdout" 😄 There's a third option in the python client: https://github.com/modal-labs/modal-client/blob/efe8b48bbd66ba204447de2f035614870dfb1a6a/modal/stream_type.py#L12
Not adamant we support it here, I don't think it's commonly used, though a couple customers do use it. It just prints everything from the exec stdout to local stdout
|
|
||
| const p = await sb.exec(["sleep", "999"], { timeoutMs: 1000 }); | ||
| const t0 = Date.now(); | ||
| const exitCode = await p.wait(); |
There was a problem hiding this comment.
I think I'd expect this to throw based on the implementation? https://github.com/modal-labs/libmodal/pull/238/changes#diff-c6c555742ea2730b4a75182c097f0e07c70d86ec2af2b6f2a97bcb790c9777b7R96
Do you understand why it's not?
There was a problem hiding this comment.
Ah - I meant stdout = "stdout" 😄 There's a third option in the python client: https://github.com/modal-labs/modal-client/blob/efe8b48bbd66ba204447de2f035614870dfb1a6a/modal/stream_type.py#L12
Not adamant we support it here, I don't think it's commonly used, though a couple customers do use it. It just prints everything from the exec stdout to local stdout
Ah ok, yeah these SDKs never supported that. Not entirely sure way, I'll look into it and might follow up in a separate PR!
There was a problem hiding this comment.
Do you understand why it's not?
I believe it's because it's killed by the server-side timeout? The JS SDK returns a different returncode in that case though (Python returns 137 I believe). Should we align that behavior?
There was a problem hiding this comment.
I think the python stdout API is a little weird IMO, so wouldn't hate not porting it.
I believe it's because it's killed by the server-side timeout?
Ah right. Wonder if it's non-deterministic though. Wouldn't be surprised if this test flakes if the client side hits the deadline first
There was a problem hiding this comment.
Re: Aligning the returncode, IMO it would make more sense for wait() to just throw an exception that indicates a timeout - I wish that's what the python client did. But we broadly need better APIs for sandboxes for determining difference between killed w a signal, returning a code, timing out, etc.
Port modal-labs/modal-client@6133e091 / modal-labs/modal-client#3673 to add direct worker connections for Sandbox
exec operations, reducing latency and improving reliability.
The first commit bumps the modal-client submodule and refreshes protos, sorry for the noisy commit. Probably best to exclude that commit from the diff when revieweing.
Note
Switches Sandbox exec to the task command router with direct worker connections, adds client/config support, updates tests, and refreshes protos with related fields/messages.
TaskCommandRouterClientImpl(direct worker connections), replacing container-exec path.validateExecArgs, deadline handling, PTY mapping, and binary/text I/O streaming via router.buildTaskExecStartRequestProto; updateContainerProcessI/O to use router.task_command_router_insecure(env:MODAL_TASK_COMMAND_ROUTER_INSECURE).timeoutMiddleware; minorCloudBucketMount.toProtocreate() usage.HTTPConfigand fields toFunction/FunctionData.CloudBucketMount(force_path_style, metadata TTL oneof/enum).AppCreateRequest.tags,AppGetLogsRequest.parametrized_function_id,CancelInputEvent.cancellation_reason.region; addRPCRetryPolicy/RPCStatus.Written by Cursor Bugbot for commit 4580ac9. This will update automatically on new commits. Configure here.