Support first-class GPU allocation for ROCK sandboxes #657
Description
Problem
ROCK can start sandbox containers, but GPU access is not currently a first-class concept across the full stack. In practice, users who need GPU-bound execution inside sandboxes have to patch the server-side Docker launch path manually.
This is limiting for:
- agentic evaluation where tools inside the sandbox need CUDA
- GPU-accelerated code execution or tests inside sandboxed repos
- mixed CPU/GPU sandbox fleets
- deterministic per-sandbox GPU assignment in multi-sandbox runs
Requested Capability
Add end-to-end GPU support for sandboxes across:
- SDK request model
- admin API
- runtime deployment layer
- scheduler / placement layer
- operator-specific backends
Proposed API Shape
Examples of the sort of fields that would be useful:
```python
enable_gpu_passthrough: bool
gpu_count: int | None
gpu_device_request: str | None
gpu_allocation_mode: Literal["fixed", "round_robin"]
```
These should ideally be available:
- in the SDK `SandboxConfig`
- in the admin `SandboxStartRequest`
- in deployment config objects
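To make the shape concrete, here is a minimal sketch of how these fields could be grouped on a config object. The class name, defaults, and validation rules are assumptions for illustration, not existing ROCK API:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical container for the proposed fields; names mirror the
# proposal above but nothing here is existing SDK surface.
@dataclass
class SandboxGPUOptions:
    enable_gpu_passthrough: bool = False
    gpu_count: Optional[int] = None            # None = "all" when passthrough is on
    gpu_device_request: Optional[str] = None   # explicit device ids, e.g. "0,2"
    gpu_allocation_mode: Literal["fixed", "round_robin"] = "fixed"

    def validate(self) -> None:
        if not self.enable_gpu_passthrough and (
            self.gpu_count is not None or self.gpu_device_request is not None
        ):
            raise ValueError("GPU fields set but passthrough is disabled")
        if self.gpu_count is not None and self.gpu_device_request is not None:
            raise ValueError("gpu_count and gpu_device_request are mutually exclusive")
```

Grouping the fields in one object keeps the SDK `SandboxConfig` and admin `SandboxStartRequest` in sync as the schema evolves.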
Expected Behavior
- request all GPUs or a specific count
- optionally request explicit device ids
- respect pre-existing `docker_args` / operator overrides
- support deterministic multi-sandbox allocation
- fail clearly when host GPU runtime is unavailable
- expose the effective GPU assignment in sandbox status / logs
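The deterministic multi-sandbox case could be handled by something like the following sketch, which assigns device ids round-robin across a host's GPUs and fails clearly when none are available (function name and shape are illustrative):

```python
# Hypothetical helper: deterministic round-robin device assignment for a
# fleet of sandboxes on a host with `total_gpus` devices.
def assign_gpus_round_robin(num_sandboxes: int, gpus_per_sandbox: int,
                            total_gpus: int) -> list[list[int]]:
    if total_gpus <= 0:
        # "fail clearly when host GPU runtime is unavailable"
        raise RuntimeError("no GPUs available on host")
    assignments = []
    cursor = 0
    for _ in range(num_sandboxes):
        devices = [(cursor + i) % total_gpus for i in range(gpus_per_sandbox)]
        assignments.append(devices)
        cursor = (cursor + gpus_per_sandbox) % total_gpus
    return assignments

# Three 2-GPU sandboxes on a 4-GPU host:
# assign_gpus_round_robin(3, 2, 4) -> [[0, 1], [2, 3], [0, 1]]
```

The returned assignment is what would be surfaced in sandbox status/logs as the "effective GPU assignment".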
Backend Considerations
Docker
- map the request to `docker run --gpus ...`
- set appropriate visibility env vars when the assignment is specific
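A rough sketch of that mapping, assuming the NVIDIA container runtime (the function name and field names follow the proposal; nothing here is existing ROCK code):

```python
# Hypothetical translation of a GPU request into `docker run` arguments.
def build_docker_gpu_args(gpu_count=None, gpu_device_request=None):
    args, env = [], {}
    if gpu_device_request is not None:
        # explicit device ids, e.g. "0,2"
        args += ["--gpus", f'"device={gpu_device_request}"']
        # make the assignment visible inside the container as well
        env["NVIDIA_VISIBLE_DEVICES"] = gpu_device_request
    elif gpu_count is not None:
        args += ["--gpus", str(gpu_count)]
    else:
        args += ["--gpus", "all"]
    return args, env
```

These args would be appended after any pre-existing `docker_args`, so operator overrides still win.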
Ray
- reserve GPU-capable placement resources, not just CPU/memory
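For Ray, the reservation could be expressed as placement-group bundles that include a `GPU` key next to `CPU`. The sketch below builds the plain-dict bundle spec only (helper names are assumptions); in real code the list would be passed to `ray.util.placement_group`:

```python
# Hypothetical helpers: build Ray placement-group bundles that reserve
# GPU capacity alongside CPU, not just CPU/memory.
def gpu_bundle(num_gpus: int, num_cpus: int = 1) -> dict:
    bundle = {"CPU": float(num_cpus)}
    if num_gpus > 0:
        bundle["GPU"] = float(num_gpus)
    return bundle

def sandbox_bundles(n_sandboxes: int, gpus_each: int) -> list[dict]:
    # one bundle per sandbox keeps placement deterministic per sandbox
    return [gpu_bundle(gpus_each) for _ in range(n_sandboxes)]
```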
Kubernetes
- map requests into pod resource requests/limits or template selection
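On Kubernetes with the NVIDIA device plugin, the mapping could look like the sketch below (the helper is hypothetical; `nvidia.com/gpu` is the standard extended resource name, and GPU resources must be declared in `limits`):

```python
# Hypothetical mapping from a gpu_count request to a pod container's
# resources block (NVIDIA device plugin resource name assumed).
def k8s_gpu_resources(gpu_count: int) -> dict:
    if gpu_count <= 0:
        return {}
    gpus = str(gpu_count)
    # GPUs are specified in limits; requests are set to match explicitly.
    return {"limits": {"nvidia.com/gpu": gpus},
            "requests": {"nvidia.com/gpu": gpus}}
```

Alternatively, the deployment layer could select a GPU-enabled pod template when any GPU field is set.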
Why This Matters
Without first-class support, local patches can make GPU sandboxes work in one deployment, but not in a portable or upstreamable way. A supported API would let ROLL and other ROCK users request GPU-capable sandboxes predictably and safely.
Current Workaround
A server-side workaround can be implemented by extending ROCK runtime config and Docker launch logic, but that still leaves the SDK/API and scheduler layers unaware of GPU requirements. That is useful as an interim step, but not a complete solution.