
[Roadmap] Q1 2026 #3452

@peterschmidt85

Note

This issue is not final and still needs feedback, clearer requirements, and prioritization.

Legend:

  • TBA — needs better requirements
  • WIP — in progress

Higher priority

Core

  • In-place update for backend fleets
  • In-place update for SSH fleets
  • Sharing fleets across projects with usage quotas (TBA)
  • GPU sharing (e.g., MIG)

Kubernetes

  • Improve K8s offer discovery (e.g., avoid strict node requirements, support GPU sharing if enabled)

Experimental

  • Support backend fleets that provision clusters/instances using the clouds' managed K8s APIs (TBA)

Clusters

  • Better cluster support across backends (TBA)

Inference

  • Replica groups (WIP)
  • Prefill-decode disaggregation (WIP)
  • Prefill-decode auto-scaling (WIP)

Misc

  • (Reserved for small but important tasks) (TBA)

Lower priority / backlog

Volumes

  • Better volumes support across backends (TBA)

Core

  • Pydantic v2 migration (TBA)

Security

  • Managed SSH proxies for AWS/GCP (TBA)

Inference

  • vLLM/Dynamo routers (TBA)
  • (Reserved for additional inference tasks)

TPU

  • Multi-host TPU support (TBA)

Kubernetes

  • Consider moving the kubeconfig property to the fleet configuration

Experimental

  • Topology-aware scheduling (TBA)
  • Multi-tenancy / host isolation (e.g., via reverse SSH proxying) (TBA)
  • Allow the server/CLI to run outside private subnets by enabling the compute plane (shim and runner) to connect to the control plane (TBA)
