
Implement dmrlet orchestrator for Docker Model Runner #83

Draft

Copilot wants to merge 3 commits into main from copilot/implement-dmrlet-orchestrator

Conversation

Copilot AI commented Feb 3, 2026

Adds dmrlet, a Kubernetes-like orchestrator for Docker Model Runner, written in Rust. It enables clients to communicate directly with inference servers, bypassing the central daemon for data-plane operations.

Architecture

  • Control Plane: Daemon manages deployments, scheduling, health monitoring
  • Data Plane: Workers run inference servers (llama.cpp, vLLM, MLX) on assigned ports
  • Direct Access: Clients can connect directly to workers or go through load balancer

Crates

  • dmrlet-core — Types: DeploymentSpec, Worker, Endpoint, Config, GPU detection
  • dmrlet-scheduler — GPU-aware placement, resource allocation, port management
  • dmrlet-runtime — Process-based worker lifecycle (macOS/Windows; containerd placeholder for Linux)
  • dmrlet-network — Health checking, round-robin load balancing, service discovery
  • dmrlet-store — Model cache with LRU eviction, OCI store placeholder
  • dmrlet-api — REST API via Axum
  • dmrlet-daemon — dmrletd binary
  • dmrlet-cli — dmrlet CLI
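
The crate list above implies a handful of core types in dmrlet-core. The sketch below shows one plausible shape for them; the field names and layout here are guesses for illustration, not the crate's actual definitions.

```rust
// Hypothetical sketch of dmrlet-core types; field names are
// illustrative guesses, not the crate's actual definitions.

#[derive(Debug, Clone, PartialEq)]
enum WorkerState {
    Pending,
    Running,
    Failed,
}

#[derive(Debug, Clone)]
struct DeploymentSpec {
    model: String,        // e.g. "ai/llama3:8b"
    replicas: u32,        // desired worker count
    gpus_per_worker: u32, // GPUs to allocate per worker
}

#[derive(Debug, Clone)]
struct Worker {
    id: u32,
    port: u16,        // inference server port, e.g. 30000
    gpu: Option<u32>, // assigned GPU index, if any
    state: WorkerState,
}

impl Worker {
    /// Direct-access endpoint URL for this worker.
    fn endpoint(&self) -> String {
        format!("http://127.0.0.1:{}", self.port)
    }
}

fn main() {
    let spec = DeploymentSpec {
        model: "ai/llama3:8b".into(),
        replicas: 2,
        gpus_per_worker: 1,
    };
    let w = Worker { id: 0, port: 30000, gpu: Some(0), state: WorkerState::Running };
    println!("{} replica 0 at {}", spec.model, w.endpoint());
}
```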

CLI

dmrlet deploy ai/llama3:8b --replicas=2 --gpu=1
dmrlet scale llama3 4
dmrlet status
dmrlet endpoints  # Get direct worker URLs
dmrlet gpus

REST API

POST   /api/v1/deployments              # Create
GET    /api/v1/deployments              # List
DELETE /api/v1/deployments/{id}         # Delete
POST   /api/v1/deployments/{id}/scale   # Scale
GET    /api/v1/endpoints                # Direct access URLs
GET    /api/v1/gpus                     # GPU info
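
The scale endpoint reduces to a small planning step: compare the current replica count against the target and decide how many workers to start or stop. This is a sketch of that assumed behavior, not the actual handler behind `POST /api/v1/deployments/{id}/scale`.

```rust
// Hypothetical sketch of the planning logic behind the scale
// endpoint; not the actual dmrlet-api handler.

#[derive(Debug, PartialEq)]
enum ScaleAction {
    Start(u32), // spawn N new workers
    Stop(u32),  // stop N existing workers
    NoOp,       // already at the target replica count
}

fn plan_scale(current: u32, target: u32) -> ScaleAction {
    use std::cmp::Ordering::*;
    match target.cmp(&current) {
        Greater => ScaleAction::Start(target - current),
        Less => ScaleAction::Stop(current - target),
        Equal => ScaleAction::NoOp,
    }
}

fn main() {
    // Scaling a 2-replica deployment to 4 starts two more workers.
    println!("{:?}", plan_scale(2, 4)); // Start(2)
}
```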

Tests

35 unit tests covering scheduling, networking, storage, and core types.

Original prompt

dmrlet Implementation Plan

Overview

dmrlet is a Kubernetes-like orchestrator for Docker Model Runner, written in Rust. It enables clients to
communicate directly with inference servers, bypassing the central daemon for data-plane operations.

Architecture

CONTROL PLANE (dmrlet daemon)
+------------------------------------------------------------------+
|  +-------------------+  +------------------+  +----------------+  |
|  |   API Server      |  |    Scheduler     |  |  Model Store   |  |
|  |  (REST + gRPC)    |  |  (GPU-aware)     |  |  (OCI-based)   |  |
|  +-------------------+  +------------------+  +----------------+  |
|           |                     |                    |            |
|  +-------------------+  +------------------+  +----------------+  |
|  |  Worker Manager   |  |  Health Monitor  |  |  Load Balancer |  |
|  |  (containerd/proc)|  |  (auto-restart)  |  |  (L7 proxy)    |  |
|  +-------------------+  +------------------+  +----------------+  |
+------------------------------------------------------------------+
         |                        |                     |
         v                        v                     v
+------------------------------------------------------------------+
|                          DATA PLANE                               |
|  +-------------+   +-------------+   +-------------+             |
|  |  Worker 0   |   |  Worker 1   |   |  Worker N   |             |
|  | llama.cpp   |   |   vLLM      |   |    MLX      |             |
|  | :30000      |   |   :30001    |   |   :30002    |             |
|  | GPU 0       |   |   GPU 1     |   |   GPU 2     |             |
|  +-------------+   +-------------+   +-------------+             |
|        ^                 ^                 ^                     |
+--------|-----------------|-----------------|---------------------+
         |                 |                 |
         +--------+--------+--------+--------+
                  |                 |
         Direct Connection     Load Balanced
                  |                 |
                  v                 v
               CLIENTS           CLIENTS
Key Advantages Over Kubernetes

  1. Simple Configuration: TOML/CLI flags vs complex YAML manifests
  2. Native GPU Scheduling: Built-in GPU detection and allocation (no device plugins)
  3. Lower Overhead: Single daemon vs etcd + API server + controller-manager
  4. Automatic Model Management: OCI-based model pulling integrated
  5. Direct Worker Access: Clients can bypass load balancer for lowest latency
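
Advantage 2 (native GPU scheduling) amounts to the daemon tracking free devices itself instead of delegating to device plugins. A first-fit allocator sketch, under the assumption that dmrlet tracks GPUs as simple device indices:

```rust
// Hypothetical first-fit GPU allocator; dmrlet's actual scheduler
// may use a different placement policy.
struct GpuAllocator {
    in_use: Vec<bool>, // one slot per detected GPU
}

impl GpuAllocator {
    fn new(gpu_count: usize) -> Self {
        GpuAllocator { in_use: vec![false; gpu_count] }
    }

    /// Claim the lowest-indexed free GPU, or None if all are busy.
    fn allocate(&mut self) -> Option<usize> {
        let idx = self.in_use.iter().position(|&busy| !busy)?;
        self.in_use[idx] = true;
        Some(idx)
    }

    /// Return a GPU to the free pool, e.g. when its worker exits.
    fn release(&mut self, idx: usize) {
        self.in_use[idx] = false;
    }
}

fn main() {
    let mut alloc = GpuAllocator::new(2);
    println!("{:?}", alloc.allocate()); // Some(0)
    println!("{:?}", alloc.allocate()); // Some(1)
    println!("{:?}", alloc.allocate()); // None: both GPUs busy
    alloc.release(0);
    println!("{:?}", alloc.allocate()); // Some(0) again
}
```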

Project Structure

dmrlet/
├── Cargo.toml                    # Workspace root
├── crates/
│   ├── dmrlet-core/             # Core types and traits
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── model.rs         # Model, Worker, Endpoint types
│   │       ├── config.rs        # Configuration types
│   │       ├── error.rs         # Error types
│   │       └── gpu.rs           # GPU detection and allocation
│   │
│   ├── dmrlet-runtime/          # Runtime abstraction layer
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── traits.rs        # Runtime trait definitions
│   │       ├── containerd.rs    # Linux containerd implementation
│   │       ├── process.rs       # macOS/Windows process implementation
│   │       └── sandbox.rs       # Platform-specific sandboxing
│   │
│   ├── dmrlet-scheduler/        # GPU-aware scheduler
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── scheduler.rs     # Main scheduler logic
│   │       ├── gpu_allocator.rs # GPU device assignment
│   │       └── placement.rs     # Worker placement decisions
│   │
│   ├── dmrlet-network/          # Networking and load balancing
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── discovery.rs     # Service discovery
│   │       ├── proxy.rs         # L7 reverse proxy
│   │       ├── balancer.rs      # Load balancing strategies
│   │       └── health.rs        # Health checking
│   │
│   ├── dmrlet-store/            # Model storage
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── oci.rs           # OCI image handling
│   │       ├── pull.rs          # Model pulling
│   │       └── cache.rs         # Local caching
│   │
│   ├── dmrlet-api/              # API server
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── rest.rs          # REST API handlers
│   │       └── grpc.rs          # gRPC service
│   │
│   ├── dmrlet-daemon/           # Main daemon binary
│   │   └── src/main.rs
│   │
│   └── dmrlet-cli/              # CLI binary
│       └── src/
│           ├── main.rs
│           └── commands/
│               ├── mod.rs
│               ├── deploy.rs
│               ├── scale.rs
│               ├── status.rs
│               └── logs.rs
│
└── proto/
└── dmrlet.proto             # gRPC definitions
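
The "model cache with LRU eviction" in dmrlet-store can be sketched with a recency queue over std collections. This is illustrative only; the real cache would also track file sizes and OCI digests.

```rust
use std::collections::{HashMap, VecDeque};

// Minimal LRU model-cache sketch keyed by model reference;
// not the actual dmrlet-store implementation.
struct ModelCache {
    capacity: usize,
    recency: VecDeque<String>,        // front = least recently used
    entries: HashMap<String, String>, // model ref -> local path
}

impl ModelCache {
    fn new(capacity: usize) -> Self {
        ModelCache { capacity, recency: VecDeque::new(), entries: HashMap::new() }
    }

    /// Insert a model, evicting the least recently used entry when full.
    /// Returns the evicted model reference, if any.
    fn insert(&mut self, model: &str, path: &str) -> Option<String> {
        let mut evicted = None;
        if !self.entries.contains_key(model) && self.entries.len() == self.capacity {
            if let Some(old) = self.recency.pop_front() {
                self.entries.remove(&old);
                evicted = Some(old);
            }
        }
        self.touch(model);
        self.entries.insert(model.to_string(), path.to_string());
        evicted
    }

    /// Look up a model's local path and mark it most recently used.
    fn get(&mut self, model: &str) -> Option<String> {
        let path = self.entries.get(model).cloned()?;
        self.touch(model);
        Some(path)
    }

    fn touch(&mut self, model: &str) {
        self.recency.retain(|m| m != model);
        self.recency.push_back(model.to_string());
    }
}

fn main() {
    let mut cache = ModelCache::new(2);
    cache.insert("ai/llama3:8b", "/cache/a");
    cache.insert("ai/mistral:7b", "/cache/b");
    cache.get("ai/llama3:8b"); // llama3 is now most recently used
    // Inserting a third model evicts mistral, the LRU entry.
    println!("{:?}", cache.insert("ai/gemma:2b", "/cache/c"));
}
```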

CLI Commands

# Deployment
dmrlet deploy ...





Copilot AI and others added 2 commits February 3, 2026 21:23
- Add Cargo workspace with 8 crates
- dmrlet-core: Core types (Model, Worker, Endpoint, Config, Error, GPU)
- dmrlet-runtime: Process-based runtime for macOS/Windows
- dmrlet-scheduler: GPU-aware scheduler with placement logic
- dmrlet-network: Health checking, load balancing, service discovery
- dmrlet-store: Model cache with LRU eviction
- dmrlet-api: REST API server with Axum
- dmrlet-daemon: Main daemon binary (dmrletd)
- dmrlet-cli: CLI tool (dmrlet) with deploy, scale, delete, status commands
- Add proto file for future gRPC support
- Update .gitignore for Rust build artifacts

Co-authored-by: ericcurtin <1694275+ericcurtin@users.noreply.github.com>
- Fix OCI model path to use hash-based filenames for uniqueness
- Improve HTTP client error message with actionable guidance
- Update tests for new model path format

Co-authored-by: ericcurtin <1694275+ericcurtin@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement dmrlet orchestrator for Docker Model Runner Implement dmrlet orchestrator for Docker Model Runner Feb 3, 2026
Copilot AI requested a review from ericcurtin February 3, 2026 21:30
