feat: add Go sandbox client SDK #424

Open
wllbo wants to merge 5 commits into kubernetes-sigs:main from wllbo:go-client-sdk

Conversation

@wllbo wllbo commented Mar 17, 2026

Closes #227

Summary

Go client for agent-sandbox at clients/go/sandbox. Builds on the clientsets from #233, adding the user-facing layer on top.

The API lands close to what was proposed in #227 and handles claim lifecycle, connectivity, and cleanup so callers don't have to work with the K8s API directly. It covers the full core SandboxClient surface from the Python SDK: run, write, read, list, exists, lifecycle management, and opt-in OTel tracing. It doesn't include the extensions yet (ComputerUseSandbox, podsnapshot), but I think those can come as follow-ups.

client, err := sandbox.NewClient(sandbox.Options{
    TemplateName: "my-template",
    Namespace:    "default",
})
if err != nil {
    log.Fatal(err)
}
defer client.Close(context.Background())

if err := client.Open(ctx); err != nil {
    log.Fatal(err)
}

result, err := client.Run(ctx, "echo hello")

Three connectivity modes depending on what you set in Options:

  • Gateway: watches Gateway API for an external IP, routes through sandbox-router. Set GatewayName.
  • Port-forward: native SPDY tunnel via client-go (no kubectl binary, unlike the Python client which shells out to kubectl port-forward). This is the default when neither GatewayName nor APIURL is set.
  • Direct URL: set APIURL for in-cluster agents or custom domains, skips all discovery.
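The selection rule in the list above can be sketched as a small pure function. This is only an illustration: the `Options` struct here carries just the two fields named in the description, and the `Mode` type, its constants, and `selectMode` are hypothetical names. The precedence when both `GatewayName` and `APIURL` are set is an assumption, since the description does not say which one wins.

```go
package main

import "fmt"

// Options mirrors only the two fields named in the PR description.
// The real sandbox.Options has more fields and may differ.
type Options struct {
	GatewayName string
	APIURL      string
}

// Mode is a hypothetical enum used only for this illustration.
type Mode string

const (
	ModeGateway     Mode = "gateway"
	ModePortForward Mode = "port-forward"
	ModeDirectURL   Mode = "direct-url"
)

// selectMode picks a connectivity mode from the options.
// Assumption: APIURL takes precedence if both fields are set.
func selectMode(o Options) Mode {
	switch {
	case o.APIURL != "":
		return ModeDirectURL // skip all discovery
	case o.GatewayName != "":
		return ModeGateway // route through the gateway
	default:
		return ModePortForward // neither field set
	}
}

func main() {
	fmt.Println(selectMode(Options{}))
	fmt.Println(selectMode(Options{GatewayName: "my-gateway"}))
	fmt.Println(selectMode(Options{APIURL: "http://sandbox.internal"}))
}
```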

Notes

Exported a Client interface separate from the concrete SandboxClient so consumers can mock it in tests. Identity accessors (ClaimName, SandboxName, PodName, Annotations) are on a separate SandboxInfo interface so adding new accessors later isn't a breaking change for anyone who's written a mock.

HTTP operations go through a retrying transport (exponential backoff + jitter, 6 attempts, per-attempt timeout). Only retries 5xx and connection errors (4xx are not retried since they indicate a client problem).

Port-forward mode uses native SPDY via client-go instead of shelling out to kubectl. A background goroutine monitors the tunnel and clears client state immediately on death, so subsequent ops fail with ErrNotReady instead of blocking until timeout.

If Close() can't reach the API server to delete the claim, the client hangs onto the claim name so Close() can be retried. Calling Open() on a client with a dangling claim returns ErrOrphanedClaim so callers don't silently leak resources.

Testing

  • Unit tests: lifecycle, all ops, retry logic, path validation, port-forward death/recovery, tracing spans
  • Integration tests against a live cluster (-tags=integration), all three connectivity modes
  • go vet / staticcheck clean

netlify bot commented Mar 17, 2026

Deploy Preview for agent-sandbox ready!

Name Link
🔨 Latest commit 1c2cd98
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/69be1100c0192700085231ad
😎 Deploy Preview https://deploy-preview-424--agent-sandbox.netlify.app

@k8s-ci-robot
Contributor

Welcome @wllbo!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 17, 2026
@k8s-ci-robot
Contributor

Hi @wllbo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 17, 2026
"sigs.k8s.io/agent-sandbox/clients/go/sandbox"
)

func main() {
Contributor

Thank you for this PR. We are migrating the existing Python client to manage multiple Sandboxes: https://github.com/kubernetes-sigs/agent-sandbox/pull/361/changes.

The work is in progress: #382

I highly encourage you to adopt a similar pattern for the Go client. There are a lot of reusable components you have already defined, which is great.

Member

+1 to align

Author

Thanks, I looked through #361 and #382, and that makes sense. I'll rework the PR to match the handle pattern and then keep an eye on the follow-up parts to #382 as they come in.

Contributor

@wllbo - @SHRUTI6991's change is in. You can make the changes.

Author

Thanks for the heads up @aditya-shantanu! I went ahead and rebased on main after @SHRUTI6991's change merged and pushed the refactor (eac94eb).

Now the Go SDK follows the same handle pattern: Sandbox handle, shareable K8sHelper, ConnectionStrategy interface, and separate Commands/Files engines. I also added Disconnect(ctx) and Handle/Info interfaces for mocking.

I'll add the remaining parts of KEP #361 (SandboxClient factory, re-attach, ProcessSystem, lifecycle states) once they're in the Python SDK so I can match the design.

@janetkuo janetkuo added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 17, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wllbo
Once this PR has been reviewed and has the lgtm label, please assign barney-s for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

return nil, fmt.Errorf("%s: run failed: %w", c.errPrefix(), err)
}
defer resp.Body.Close()
defer func() { _, _ = io.Copy(io.Discard, resp.Body) }()
Member

defer func() { _, _ = io.Copy(io.Discard, resp.Body) }() is used in multiple files. This defer executes before the body is closed and will synchronously download the entire remainder of any response payload. If a user attempts to read a file that exceeds MaxDownloadSize, the SDK will correctly catch the limit, but the defer will then stall the goroutine while it silently downloads the rest of the file into /dev/null.

Consider either removing (for error paths) or bounding it (for success paths) with io.LimitReader. It's cheaper to let the transport close the TCP connection than to drain a large, unwanted payload.

Author

Good catch. I've now capped drains at 4KB and skip them on error paths.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Implement Agentic Sandbox Client in GO

5 participants