8 changes: 8 additions & 0 deletions mcp/.gitignore
@@ -0,0 +1,8 @@
.venv/
__pycache__/
*.pyc
*.egg-info/
dist/
build/
.olive-mcp/
.olive-cache/
8 changes: 8 additions & 0 deletions mcp/.mcp.json.example
@@ -0,0 +1,8 @@
{
"mcpServers": {
"olive": {
"command": "uv",
"args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
}
}
}
131 changes: 131 additions & 0 deletions mcp/README.md
@@ -0,0 +1,131 @@
# Olive MCP Server

MCP server for Microsoft Olive model optimization. Provides tools for model optimization, quantization, fine-tuning, and benchmarking through the [Model Context Protocol](https://modelcontextprotocol.io/).

## Features

| Tool | Description |
|------|-------------|
| `optimize` | End-to-end optimization with automatic pass scheduling |
| `quantize` | Model quantization (RTN, GPTQ, AWQ, HQQ, and more) |
| `finetune` | LoRA / QLoRA fine-tuning |
| `capture_onnx_graph` | Capture ONNX graph via PyTorch Exporter or Model Builder |
| `benchmark` | Model evaluation using lm-eval tasks |
| `diffusion_lora` | Train LoRA adapters for diffusion models (SD 1.5, SDXL, Flux) |
| `recommend` | Preview optimization recommendations instantly without running anything |
| `explore_passes` | Browse available Olive passes and their parameter schemas |
| `run_config` | Validate or run custom Olive workflow configs |
| `detect_hardware` | Auto-detect CPU, RAM, GPU, and disk space for smart defaults |
| `manage_outputs` | List or delete previous optimization outputs |
| `get_job_status` | Check progress of a running job with structured phase detection |
| `cancel_job` | Cancel a running background job |

Each tool runs in an **isolated Python environment** (managed by uv) with the appropriate dependencies, so different onnxruntime variants (CPU, CUDA, DirectML, OpenVINO, etc.) never conflict.

> **Note:** The `olive-config` MCP server has been merged into this server. If you were using `olive-config` separately, you can remove it and use `explore_passes` / `run_config` from this server instead.
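The isolation scheme can be pictured with a short sketch. The `~/.olive-mcp/venvs` layout matches the server's constants, but keying one venv per onnxruntime variant as shown here is an illustrative assumption, not the server's exact logic:

```python
from pathlib import Path

# One venv directory per onnxruntime variant (illustrative keys), so
# e.g. onnxruntime-gpu and onnxruntime-openvino never share a
# site-packages and their native libraries cannot conflict.
VENV_BASE = Path.home() / ".olive-mcp" / "venvs"

def venv_for(variant: str) -> Path:
    # venv_for("cpu") -> ~/.olive-mcp/venvs/cpu
    return VENV_BASE / variant

print(venv_for("cpu"))
```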

## Prerequisites

- Python 3.10+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Installation

```bash
git clone https://github.com/microsoft/Olive.git
cd Olive/mcp
uv sync
```

## Configuration

All MCP clients use the same server config — only the config file location differs.

**Server definition:**

```json
{
"command": "uv",
"args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
}
```

> Replace `/path/to/Olive/mcp` with your actual project path.

| Client | Config file | Key |
|--------|------------|-----|
| **VS Code (Copilot)** | `.vscode/mcp.json` | `servers.olive` |
| **Claude Desktop** | `%APPDATA%\Claude\claude_desktop_config.json` (Win) / `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) | `mcpServers.olive` |
| **Claude Code** | `.mcp.json` in project root | `mcpServers.olive` |
| **Cursor** | `.cursor/mcp.json` | `mcpServers.olive` |
| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` | `mcpServers.olive` |

<details>
<summary>VS Code example (.vscode/mcp.json)</summary>

```json
{
"servers": {
"olive": {
"type": "stdio",
"command": "uv",
"args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
}
}
}
```
</details>

<details>
<summary>Claude Desktop / Claude Code / Cursor / Windsurf example</summary>

```json
{
"mcpServers": {
"olive": {
"command": "uv",
"args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
}
}
}
```
</details>

## Usage with VS Code Copilot

1. Open **Copilot Chat** panel (`Ctrl+Alt+I`) and switch to **Agent** mode
2. Click the **Tools** icon to verify the Olive MCP tools are listed
3. Ask Copilot, for example: *"Optimize microsoft/Phi-3-mini-4k-instruct for CPU with int4"*
4. Copilot will ask for your confirmation before calling each MCP tool

## Example Prompts

```
Optimize microsoft/Phi-3-mini-4k-instruct

Quantize microsoft/Phi-3-mini-4k-instruct

Fine-tune microsoft/Phi-3-mini-4k-instruct on nampdn-ai/tiny-codes

Capture ONNX graph from microsoft/Phi-3-mini-4k-instruct

Benchmark microsoft/Phi-3-mini-4k-instruct

Train a LoRA for runwayml/stable-diffusion-v1-5 with dataset linoyts/Tuxemon

What's the best way to optimize Phi-4-mini for my hardware?

What passes are available for int4 quantization?

Help me write a custom Olive config with OnnxQuantization and GraphSurgeries
```

## Output

All optimization outputs are saved to `~/.olive-mcp/outputs/` in timestamped directories.

Completed jobs include:
- **Pass summary** — which passes ran and how long each took
- **File sizes** — output model size (and input model size when available) for before/after comparison
- **Structured progress** — `get_job_status` returns a `phase` field (e.g. "downloading", "quantizing", "saving") in addition to raw logs
- **Smart error suggestions** — if a job fails, actionable suggestions are attached (e.g. "Out of GPU memory, try int4" or "CPU does not support fp16")
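As a hypothetical illustration of a `get_job_status` result: only the `phase` values are documented above; the surrounding field names in this sketch are assumptions, not the server's actual schema.

```python
# Hypothetical job-status payload. "phase" comes from the feature list
# above; "state" and "log_tail" are illustrative field names only.
status = {
    "state": "running",
    "phase": "quantizing",  # e.g. "downloading", "quantizing", "saving"
    "log_tail": "[pass 2/3] OnnxQuantization ...",
}
print(status["phase"])  # quantizing
```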
18 changes: 18 additions & 0 deletions mcp/pyproject.toml
@@ -0,0 +1,18 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "olive-mcp"
version = "0.1.0"
description = "MCP server for Microsoft Olive model optimization"
requires-python = ">=3.10"
dependencies = [
"mcp[cli]"
]

[project.scripts]
olive-mcp = "olive_mcp:main"

[tool.hatch.build.targets.wheel]
packages = ["src/olive_mcp"]
10 changes: 10 additions & 0 deletions mcp/src/olive_mcp/__init__.py
@@ -0,0 +1,10 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import olive_mcp.tools # noqa: F401 — registers @mcp.tool() and @mcp.prompt() on import
from olive_mcp.server import mcp


def main():
mcp.run()
7 changes: 7 additions & 0 deletions mcp/src/olive_mcp/__main__.py
@@ -0,0 +1,7 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive_mcp import main

main()
81 changes: 81 additions & 0 deletions mcp/src/olive_mcp/constants.py
@@ -0,0 +1,81 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from pathlib import Path

# ---------------------------------------------------------------------------
# Paths
# ---------------------------------------------------------------------------

VENV_BASE = Path.home() / ".olive-mcp" / "venvs"
OUTPUT_BASE = Path.home() / ".olive-mcp" / "outputs"
WORKER_PATH = Path(__file__).parent / "worker.py"

# Auto-purge venvs not used within this many days.
_VENV_MAX_AGE_DAYS = 14

# ---------------------------------------------------------------------------
# Command names
# ---------------------------------------------------------------------------

CMD_OPTIMIZE = "optimize"
CMD_QUANTIZE = "quantize"
CMD_FINETUNE = "finetune"
CMD_CAPTURE_ONNX_GRAPH = "capture_onnx_graph"
CMD_BENCHMARK = "benchmark"
CMD_DIFFUSION_LORA = "diffusion_lora"
CMD_EXPLORE_PASSES = "explore_passes"
CMD_VALIDATE_CONFIG = "validate_config"
CMD_RUN_CONFIG = "run_config"
Review comment on lines +22 to +30 (Collaborator): Combine into a named StrEnum?
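The reviewer's suggestion could be sketched as follows. Note that `enum.StrEnum` was added in Python 3.11 while the project targets `>=3.10`, so a `(str, Enum)` subclass is used to get the same string-comparison behavior; the class and member names here are illustrative, not part of this PR.

```python
from enum import Enum

class Command(str, Enum):
    """Illustrative replacement for the CMD_* string constants."""
    OPTIMIZE = "optimize"
    QUANTIZE = "quantize"
    FINETUNE = "finetune"
    CAPTURE_ONNX_GRAPH = "capture_onnx_graph"
    BENCHMARK = "benchmark"
    DIFFUSION_LORA = "diffusion_lora"
    EXPLORE_PASSES = "explore_passes"
    VALIDATE_CONFIG = "validate_config"
    RUN_CONFIG = "run_config"

# Because members are str subclasses, call sites that pass plain
# strings like "optimize" keep working unchanged.
print(Command.OPTIMIZE == "optimize")  # True
```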


# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

SUPPORTED_PROVIDERS = [
Review comment (Collaborator): Use StrEnum?
"CPUExecutionProvider",
"CUDAExecutionProvider",
"DmlExecutionProvider",
"OpenVINOExecutionProvider",
"TensorrtExecutionProvider",
"ROCMExecutionProvider",
"QNNExecutionProvider",
"VitisAIExecutionProvider",
"WebGpuExecutionProvider",
"NvTensorRTRTXExecutionProvider",
]

SUPPORTED_PRECISIONS = [
Review comment (Collaborator): Use StrEnum?
"fp32",
"fp16",
"bf16",
"int4",
"int8",
"int16",
"int32",
"uint4",
"uint8",
"uint16",
"uint32",
]

SUPPORTED_QUANT_ALGORITHMS = ["rtn", "gptq", "awq", "hqq"]

# Maps provider → olive-ai extras key for onnxruntime variant
PROVIDER_TO_EXTRAS = {
"CPUExecutionProvider": "cpu",
"CUDAExecutionProvider": "gpu",
"TensorrtExecutionProvider": "gpu",
"ROCMExecutionProvider": "gpu",
"OpenVINOExecutionProvider": "openvino",
"DmlExecutionProvider": "directml",
"QNNExecutionProvider": "qnn",
}

# Maps provider → onnxruntime-genai variant (for ModelBuilder pass)
PROVIDER_TO_GENAI = {
"CPUExecutionProvider": "onnxruntime-genai",
"CUDAExecutionProvider": "onnxruntime-genai-cuda",
"DmlExecutionProvider": "onnxruntime-genai-directml",
}
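The two mappings above can drive dependency selection for the isolated environments. A hypothetical helper (the `extras_for` name and the CPU fallback are assumptions for this sketch, not code from this PR):

```python
# Mirror of PROVIDER_TO_EXTRAS from constants.py above.
PROVIDER_TO_EXTRAS = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "gpu",
    "TensorrtExecutionProvider": "gpu",
    "ROCMExecutionProvider": "gpu",
    "OpenVINOExecutionProvider": "openvino",
    "DmlExecutionProvider": "directml",
    "QNNExecutionProvider": "qnn",
}

def extras_for(provider: str) -> str:
    """Build an olive-ai extras requirement for a provider.

    Unmapped providers fall back to the CPU variant; that fallback is
    an assumption for this sketch, not the server's documented behavior.
    """
    return f"olive-ai[{PROVIDER_TO_EXTRAS.get(provider, 'cpu')}]"

print(extras_for("CUDAExecutionProvider"))  # olive-ai[gpu]
```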