Note
This roadmap is not final and still needs feedback, clearer requirements, and prioritization.
Legend:
- TBA — needs better requirements
- WIP — in progress
Higher priority
Core
- In-place update for backend fleets
- In-place update for SSH fleets
- Sharing fleets across projects with usage quotas (TBA)
- GPU sharing (e.g., MIG)
Kubernetes
- Improve K8s offers discovery (e.g., avoid strict node requirements, support GPU sharing if enabled)
Experimental
- Support backend fleets that provision clusters/instances using cloud providers' managed K8s APIs (TBA)
Clusters
- Better cluster support across backends (TBA)
Inference
- Replica groups (WIP)
- Prefill-decode disaggregation (WIP)
- Prefill-decode auto-scaling (WIP)
Misc
- (Reserved for small but important tasks) (TBA)
Lower priority / backlog
Volumes
- Better volumes support across backends (TBA)
Core
- Pydantic v2 migration (TBA)
Security
- Managed SSH proxies for AWS/GCP (TBA)
Inference
- vLLM/Dynamo routers (TBA)
- (Reserved for additional inference tasks)
TPU
- Multi-host TPU support (TBA)
Kubernetes
- Consider moving the kubeconfig property to the fleet configuration
Experimental
- Topology-aware scheduling (TBA)
- Multi-tenancy / host isolation (e.g., via reverse SSH proxying) (TBA)
- Allow the server/CLI to run outside private subnets by enabling the compute plane (shim and runner) to connect to the control plane (TBA)