
Implement dmrlet orchestrator for Docker Model Runner #83

Draft

Copilot wants to merge 3 commits into main from copilot/implement-dmrlet-orchestrator

Conversation

Copilot AI commented Feb 3, 2026

Adds dmrlet, a Kubernetes-like orchestrator for Docker Model Runner, written in Rust. It enables clients to communicate directly with inference servers, bypassing the central daemon for data-plane operations.

Architecture

  • Control Plane: Daemon manages deployments, scheduling, health monitoring
  • Data Plane: Workers run inference servers (llama.cpp, vLLM, MLX) on assigned ports
  • Direct Access: Clients can connect directly to workers or go through load balancer

Crates

  • dmrlet-core — Types: DeploymentSpec, Worker, Endpoint, Config, GPU detection
  • dmrlet-scheduler — GPU-aware placement, resource allocation, port management
  • dmrlet-runtime — Process-based worker lifecycle (macOS/Windows; containerd placeholder for Linux)
  • dmrlet-network — Health checking, round-robin load balancing, service discovery
  • dmrlet-store — Model cache with LRU eviction, OCI store placeholder
  • dmrlet-api — REST API via Axum
  • dmrlet-daemon — dmrletd binary
  • dmrlet-cli — dmrlet CLI
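
The crate list above implies a handful of core types in dmrlet-core. The sketch below shows one plausible shape for them; the field names and layout here are guesses for illustration, not the crate's actual definitions.

```rust
// Hypothetical sketch of dmrlet-core types; field names are
// illustrative guesses, not the crate's actual definitions.

#[derive(Debug, Clone, PartialEq)]
enum WorkerState {
    Pending,
    Running,
    Failed,
}

#[derive(Debug, Clone)]
struct DeploymentSpec {
    model: String,        // e.g. "ai/llama3:8b"
    replicas: u32,        // desired worker count
    gpus_per_worker: u32, // GPUs to allocate per worker
}

#[derive(Debug, Clone)]
struct Worker {
    id: u32,
    port: u16,        // inference server port, e.g. 30000
    gpu: Option<u32>, // assigned GPU index, if any
    state: WorkerState,
}

impl Worker {
    /// Direct-access endpoint URL for this worker.
    fn endpoint(&self) -> String {
        format!("http://127.0.0.1:{}", self.port)
    }
}

fn main() {
    let spec = DeploymentSpec {
        model: "ai/llama3:8b".into(),
        replicas: 2,
        gpus_per_worker: 1,
    };
    let w = Worker { id: 0, port: 30000, gpu: Some(0), state: WorkerState::Running };
    println!("{} replica 0 at {}", spec.model, w.endpoint());
}
```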

CLI

dmrlet deploy ai/llama3:8b --replicas=2 --gpu=1
dmrlet scale llama3 4
dmrlet status
dmrlet endpoints  # Get direct worker URLs
dmrlet gpus

REST API

POST   /api/v1/deployments              # Create
GET    /api/v1/deployments              # List
DELETE /api/v1/deployments/{id}         # Delete
POST   /api/v1/deployments/{id}/scale   # Scale
GET    /api/v1/endpoints                # Direct access URLs
GET    /api/v1/gpus                     # GPU info
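
The scale endpoint reduces to a small planning step: compare the current replica count against the target and decide how many workers to start or stop. This is a sketch of that assumed behavior, not the actual handler behind `POST /api/v1/deployments/{id}/scale`.

```rust
// Hypothetical sketch of the planning logic behind the scale
// endpoint; not the actual dmrlet-api handler.

#[derive(Debug, PartialEq)]
enum ScaleAction {
    Start(u32), // spawn N new workers
    Stop(u32),  // stop N existing workers
    NoOp,       // already at the target replica count
}

fn plan_scale(current: u32, target: u32) -> ScaleAction {
    use std::cmp::Ordering::*;
    match target.cmp(&current) {
        Greater => ScaleAction::Start(target - current),
        Less => ScaleAction::Stop(current - target),
        Equal => ScaleAction::NoOp,
    }
}

fn main() {
    // Scaling a 2-replica deployment to 4 starts two more workers.
    println!("{:?}", plan_scale(2, 4)); // Start(2)
}
```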

Tests

35 unit tests covering scheduling, networking, storage, and core types.

Original prompt

dmrlet Implementation Plan

Overview

dmrlet is a Kubernetes-like orchestrator for Docker Model Runner, written in Rust. It enables clients to
communicate directly with inference servers, bypassing the central daemon for data-plane operations.

Architecture

CONTROL PLANE (dmrlet daemon)
+------------------------------------------------------------------+
|  +-------------------+  +------------------+  +----------------+  |
|  |   API Server      |  |    Scheduler     |  |  Model Store   |  |
|  |  (REST + gRPC)    |  |  (GPU-aware)     |  |  (OCI-based)   |  |
|  +-------------------+  +------------------+  +----------------+  |
|           |                     |                    |            |
|  +-------------------+  +------------------+  +----------------+  |
|  |  Worker Manager   |  |  Health Monitor  |  |  Load Balancer |  |
|  |  (containerd/proc)|  |  (auto-restart)  |  |  (L7 proxy)    |  |
|  +-------------------+  +------------------+  +----------------+  |
+------------------------------------------------------------------+
         |                        |                     |
         v                        v                     v
+------------------------------------------------------------------+
|                          DATA PLANE                               |
|  +-------------+   +-------------+   +-------------+             |
|  |  Worker 0   |   |  Worker 1   |   |  Worker N   |             |
|  | llama.cpp   |   |   vLLM      |   |    MLX      |             |
|  | :30000      |   |   :30001    |   |   :30002    |             |
|  | GPU 0       |   |   GPU 1     |   |   GPU 2     |             |
|  +-------------+   +-------------+   +-------------+             |
|        ^                 ^                 ^                     |
+--------|-----------------|-----------------|---------------------+
         |                 |                 |
         +--------+--------+--------+--------+
                  |                 |
         Direct Connection     Load Balanced
                  |                 |
                  v                 v
               CLIENTS           CLIENTS
Key Advantages Over Kubernetes

  1. Simple Configuration: TOML/CLI flags vs complex YAML manifests
  2. Native GPU Scheduling: Built-in GPU detection and allocation (no device plugins)
  3. Lower Overhead: Single daemon vs etcd + API server + controller-manager
  4. Automatic Model Management: OCI-based model pulling integrated
  5. Direct Worker Access: Clients can bypass load balancer for lowest latency
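
Advantage 2 (native GPU scheduling) amounts to the daemon tracking free devices itself instead of delegating to device plugins. A first-fit allocator sketch, under the assumption that dmrlet tracks GPUs as simple device indices:

```rust
// Hypothetical first-fit GPU allocator; dmrlet's actual scheduler
// may use a different placement policy.
struct GpuAllocator {
    in_use: Vec<bool>, // one slot per detected GPU
}

impl GpuAllocator {
    fn new(gpu_count: usize) -> Self {
        GpuAllocator { in_use: vec![false; gpu_count] }
    }

    /// Claim the lowest-indexed free GPU, or None if all are busy.
    fn allocate(&mut self) -> Option<usize> {
        let idx = self.in_use.iter().position(|&busy| !busy)?;
        self.in_use[idx] = true;
        Some(idx)
    }

    /// Return a GPU to the free pool, e.g. when its worker exits.
    fn release(&mut self, idx: usize) {
        self.in_use[idx] = false;
    }
}

fn main() {
    let mut alloc = GpuAllocator::new(2);
    println!("{:?}", alloc.allocate()); // Some(0)
    println!("{:?}", alloc.allocate()); // Some(1)
    println!("{:?}", alloc.allocate()); // None: both GPUs busy
    alloc.release(0);
    println!("{:?}", alloc.allocate()); // Some(0) again
}
```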

Project Structure

dmrlet/
├── Cargo.toml                    # Workspace root
├── crates/
│   ├── dmrlet-core/             # Core types and traits
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── model.rs         # Model, Worker, Endpoint types
│   │       ├── config.rs        # Configuration types
│   │       ├── error.rs         # Error types
│   │       └── gpu.rs           # GPU detection and allocation
│   │
│   ├── dmrlet-runtime/          # Runtime abstraction layer
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── traits.rs        # Runtime trait definitions
│   │       ├── containerd.rs    # Linux containerd implementation
│   │       ├── process.rs       # macOS/Windows process implementation
│   │       └── sandbox.rs       # Platform-specific sandboxing
│   │
│   ├── dmrlet-scheduler/        # GPU-aware scheduler
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── scheduler.rs     # Main scheduler logic
│   │       ├── gpu_allocator.rs # GPU device assignment
│   │       └── placement.rs     # Worker placement decisions
│   │
│   ├── dmrlet-network/          # Networking and load balancing
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── discovery.rs     # Service discovery
│   │       ├── proxy.rs         # L7 reverse proxy
│   │       ├── balancer.rs      # Load balancing strategies
│   │       └── health.rs        # Health checking
│   │
│   ├── dmrlet-store/            # Model storage
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── oci.rs           # OCI image handling
│   │       ├── pull.rs          # Model pulling
│   │       └── cache.rs         # Local caching
│   │
│   ├── dmrlet-api/              # API server
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── rest.rs          # REST API handlers
│   │       └── grpc.rs          # gRPC service
│   │
│   ├── dmrlet-daemon/           # Main daemon binary
│   │   └── src/main.rs
│   │
│   └── dmrlet-cli/              # CLI binary
│       └── src/
│           ├── main.rs
│           └── commands/
│               ├── mod.rs
│               ├── deploy.rs
│               ├── scale.rs
│               ├── status.rs
│               └── logs.rs
│
└── proto/
└── dmrlet.proto             # gRPC definitions
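
The "model cache with LRU eviction" in dmrlet-store can be sketched with a recency queue over std collections. This is illustrative only; the real cache would also track file sizes and OCI digests.

```rust
use std::collections::{HashMap, VecDeque};

// Minimal LRU model-cache sketch keyed by model reference;
// not the actual dmrlet-store implementation.
struct ModelCache {
    capacity: usize,
    recency: VecDeque<String>,        // front = least recently used
    entries: HashMap<String, String>, // model ref -> local path
}

impl ModelCache {
    fn new(capacity: usize) -> Self {
        ModelCache { capacity, recency: VecDeque::new(), entries: HashMap::new() }
    }

    /// Insert a model, evicting the least recently used entry when full.
    /// Returns the evicted model reference, if any.
    fn insert(&mut self, model: &str, path: &str) -> Option<String> {
        let mut evicted = None;
        if !self.entries.contains_key(model) && self.entries.len() == self.capacity {
            if let Some(old) = self.recency.pop_front() {
                self.entries.remove(&old);
                evicted = Some(old);
            }
        }
        self.touch(model);
        self.entries.insert(model.to_string(), path.to_string());
        evicted
    }

    /// Look up a model's local path and mark it most recently used.
    fn get(&mut self, model: &str) -> Option<String> {
        let path = self.entries.get(model).cloned()?;
        self.touch(model);
        Some(path)
    }

    fn touch(&mut self, model: &str) {
        self.recency.retain(|m| m != model);
        self.recency.push_back(model.to_string());
    }
}

fn main() {
    let mut cache = ModelCache::new(2);
    cache.insert("ai/llama3:8b", "/cache/a");
    cache.insert("ai/mistral:7b", "/cache/b");
    cache.get("ai/llama3:8b"); // llama3 is now most recently used
    // Inserting a third model evicts mistral, the LRU entry.
    println!("{:?}", cache.insert("ai/gemma:2b", "/cache/c"));
}
```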

CLI Commands

# Deployment
dmrlet deploy ...





Copilot AI and others added 2 commits February 3, 2026 21:23
- Add Cargo workspace with 8 crates
- dmrlet-core: Core types (Model, Worker, Endpoint, Config, Error, GPU)
- dmrlet-runtime: Process-based runtime for macOS/Windows
- dmrlet-scheduler: GPU-aware scheduler with placement logic
- dmrlet-network: Health checking, load balancing, service discovery
- dmrlet-store: Model cache with LRU eviction
- dmrlet-api: REST API server with Axum
- dmrlet-daemon: Main daemon binary (dmrletd)
- dmrlet-cli: CLI tool (dmrlet) with deploy, scale, delete, status commands
- Add proto file for future gRPC support
- Update .gitignore for Rust build artifacts

Co-authored-by: ericcurtin <1694275+ericcurtin@users.noreply.github.com>
- Fix OCI model path to use hash-based filenames for uniqueness
- Improve HTTP client error message with actionable guidance
- Update tests for new model path format

Co-authored-by: ericcurtin <1694275+ericcurtin@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement dmrlet orchestrator for Docker Model Runner Implement dmrlet orchestrator for Docker Model Runner Feb 3, 2026
Copilot AI requested a review from ericcurtin February 3, 2026 21:30
