Add olive mcp server #2353
base: main
@@ -0,0 +1,8 @@
```
.venv/
__pycache__/
*.pyc
*.egg-info/
dist/
build/
.olive-mcp/
.olive-cache/
```
@@ -0,0 +1,8 @@
```json
{
  "mcpServers": {
    "olive": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
    }
  }
}
```
@@ -0,0 +1,131 @@
# Olive MCP Server

MCP server for Microsoft Olive model optimization. Provides tools for model optimization, quantization, fine-tuning, and benchmarking through the [Model Context Protocol](https://modelcontextprotocol.io/).

## Features

| Tool | Description |
|------|-------------|
| `optimize` | End-to-end optimization with automatic pass scheduling |
| `quantize` | Model quantization (RTN, GPTQ, AWQ, HQQ, and more) |
| `finetune` | LoRA / QLoRA fine-tuning |
| `capture_onnx_graph` | Capture ONNX graph via PyTorch Exporter or Model Builder |
| `benchmark` | Model evaluation using lm-eval tasks |
| `diffusion_lora` | Train LoRA adapters for diffusion models (SD 1.5, SDXL, Flux) |
| `recommend` | Preview optimization recommendations instantly without running anything |
| `explore_passes` | Browse available Olive passes and their parameter schemas |
| `run_config` | Validate or run custom Olive workflow configs |
| `detect_hardware` | Auto-detect CPU, RAM, GPU, and disk space for smart defaults |
| `manage_outputs` | List or delete previous optimization outputs |
| `get_job_status` | Check progress of a running job with structured phase detection |
| `cancel_job` | Cancel a running background job |

Each tool runs in an **isolated Python environment** (managed by uv) with the appropriate dependencies, so different onnxruntime variants (CPU, CUDA, DirectML, OpenVINO, etc.) never conflict.
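The isolation described above could be keyed per onnxruntime variant. A minimal sketch: `PROVIDER_TO_EXTRAS` and the `~/.olive-mcp/venvs` base path come from this PR's `constants.py`, while the `venv_dir_for` helper and its directory-naming scheme are illustrative assumptions, not the PR's actual implementation.

```python
# Sketch: one venv per onnxruntime variant, so CPU/CUDA/DirectML installs
# never collide. The venv_dir_for helper and naming scheme are hypothetical.
from pathlib import Path

# From src/olive_mcp/constants.py in this PR (subset).
VENV_BASE = Path.home() / ".olive-mcp" / "venvs"
PROVIDER_TO_EXTRAS = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "gpu",
    "DmlExecutionProvider": "directml",
    "OpenVINOExecutionProvider": "openvino",
}


def venv_dir_for(provider: str) -> Path:
    """Map an execution provider to its dedicated venv directory."""
    extras = PROVIDER_TO_EXTRAS.get(provider, "cpu")
    return VENV_BASE / f"olive-{extras}"


print(venv_dir_for("CUDAExecutionProvider").name)  # olive-gpu
```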
> **Note:** The `olive-config` MCP server has been merged into this server. If you were using `olive-config` separately, you can remove it and use `explore_passes` / `run_config` from this server instead.

## Prerequisites

- Python 3.10+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Installation

```bash
git clone https://github.com/microsoft/Olive.git
cd Olive/mcp
uv sync
```
## Configuration

All MCP clients use the same server config — only the config file location differs.

**Server definition:**

```json
{
  "command": "uv",
  "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
}
```

> Replace `/path/to/Olive/mcp` with your actual project path.

| Client | Config file | Key |
|--------|------------|-----|
| **VS Code (Copilot)** | `.vscode/mcp.json` | `servers.olive` |
| **Claude Desktop** | `%APPDATA%\Claude\claude_desktop_config.json` (Win) / `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) | `mcpServers.olive` |
| **Claude Code** | `.mcp.json` in project root | `mcpServers.olive` |
| **Cursor** | `.cursor/mcp.json` | `mcpServers.olive` |
| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` | `mcpServers.olive` |
<details>
<summary>VS Code example (.vscode/mcp.json)</summary>

```json
{
  "servers": {
    "olive": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
    }
  }
}
```

</details>

<details>
<summary>Claude Desktop / Claude Code / Cursor / Windsurf example</summary>

```json
{
  "mcpServers": {
    "olive": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Olive/mcp", "python", "-m", "olive_mcp"]
    }
  }
}
```

</details>
## Usage with VS Code Copilot

1. Open the **Copilot Chat** panel (`Ctrl+Alt+I`) and switch to **Agent** mode
2. Click the **Tools** icon to verify the Olive MCP tools are listed
3. Ask Copilot, for example: *"Optimize microsoft/Phi-3-mini-4k-instruct for CPU with int4"*
4. Copilot will ask for your confirmation before calling each MCP tool
## Example Prompts

```
Optimize microsoft/Phi-3-mini-4k-instruct

Quantize microsoft/Phi-3-mini-4k-instruct

Fine-tune microsoft/Phi-3-mini-4k-instruct on nampdn-ai/tiny-codes

Capture ONNX graph from microsoft/Phi-3-mini-4k-instruct

Benchmark microsoft/Phi-3-mini-4k-instruct

Train a LoRA for runwayml/stable-diffusion-v1-5 with dataset linoyts/Tuxemon

What's the best way to optimize Phi-4-mini for my hardware?

What passes are available for int4 quantization?

Help me write a custom Olive config with OnnxQuantization and GraphSurgeries
```
## Output

All optimization outputs are saved to `~/.olive-mcp/outputs/` with timestamped directories.

Completed jobs include:
- **Pass summary** — which passes ran and how long each took
- **File sizes** — output model size (and input model size when available) for before/after comparison
- **Structured progress** — `get_job_status` returns a `phase` field (e.g. "downloading", "quantizing", "saving") in addition to raw logs
- **Smart error suggestions** — if a job fails, actionable suggestions are attached (e.g. "Out of GPU memory, try int4" or "CPU does not support fp16")
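For illustration, a `phase` field like the one described above could be derived by scanning the log tail for known markers. This is a hypothetical sketch; the marker strings and the `detect_phase` helper are assumptions, not this PR's actual implementation.

```python
# Hypothetical log-based phase detection. The markers below are illustrative;
# the real server's phase detection may differ.
PHASE_MARKERS = {
    "Downloading": "downloading",
    "Running pass": "quantizing",
    "Saving model": "saving",
}


def detect_phase(log_text: str) -> str:
    """Return the phase of the most recent matching log line, else 'running'."""
    for line in reversed(log_text.splitlines()):
        for marker, phase in PHASE_MARKERS.items():
            if marker in line:
                return phase
    return "running"


log = "Downloading model...\nRunning pass OnnxQuantization\n"
print(detect_phase(log))  # quantizing
```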
@@ -0,0 +1,18 @@
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "olive-mcp"
version = "0.1.0"
description = "MCP server for Microsoft Olive model optimization"
requires-python = ">=3.10"
dependencies = [
    "mcp[cli]"
]

[project.scripts]
olive-mcp = "olive_mcp:main"

[tool.hatch.build.targets.wheel]
packages = ["src/olive_mcp"]
```
@@ -0,0 +1,10 @@
```python
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
import olive_mcp.tools  # noqa: F401 — registers @mcp.tool() and @mcp.prompt() on import
from olive_mcp.server import mcp


def main():
    mcp.run()
```
@@ -0,0 +1,7 @@
```python
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from olive_mcp import main

main()
```
@@ -0,0 +1,81 @@
```python
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------
from pathlib import Path

# ---------------------------------------------------------------------------
# Paths
# ---------------------------------------------------------------------------

VENV_BASE = Path.home() / ".olive-mcp" / "venvs"
OUTPUT_BASE = Path.home() / ".olive-mcp" / "outputs"
WORKER_PATH = Path(__file__).parent / "worker.py"

# Auto-purge venvs not used within this many days.
_VENV_MAX_AGE_DAYS = 14

# ---------------------------------------------------------------------------
# Command names
# ---------------------------------------------------------------------------

CMD_OPTIMIZE = "optimize"
CMD_QUANTIZE = "quantize"
CMD_FINETUNE = "finetune"
CMD_CAPTURE_ONNX_GRAPH = "capture_onnx_graph"
CMD_BENCHMARK = "benchmark"
CMD_DIFFUSION_LORA = "diffusion_lora"
CMD_EXPLORE_PASSES = "explore_passes"
CMD_VALIDATE_CONFIG = "validate_config"
CMD_RUN_CONFIG = "run_config"
```

> **Review comment on lines +22 to +30 (Collaborator):** Combine into a named StrEnum?

```python
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

SUPPORTED_PROVIDERS = [
    "CPUExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "OpenVINOExecutionProvider",
    "TensorrtExecutionProvider",
    "ROCMExecutionProvider",
    "QNNExecutionProvider",
    "VitisAIExecutionProvider",
    "WebGpuExecutionProvider",
    "NvTensorRTRTXExecutionProvider",
]
```

> **Review comment (Collaborator), on `SUPPORTED_PROVIDERS`:** Use StrEnum?

```python
SUPPORTED_PRECISIONS = [
    "fp32",
    "fp16",
    "bf16",
    "int4",
    "int8",
    "int16",
    "int32",
    "uint4",
    "uint8",
    "uint16",
    "uint32",
]
```

> **Review comment (Collaborator), on `SUPPORTED_PRECISIONS`:** Use StrEnum?

```python
SUPPORTED_QUANT_ALGORITHMS = ["rtn", "gptq", "awq", "hqq"]

# Maps provider → olive-ai extras key for onnxruntime variant
PROVIDER_TO_EXTRAS = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "gpu",
    "TensorrtExecutionProvider": "gpu",
    "ROCMExecutionProvider": "gpu",
    "OpenVINOExecutionProvider": "openvino",
    "DmlExecutionProvider": "directml",
    "QNNExecutionProvider": "qnn",
}

# Maps provider → onnxruntime-genai variant (for ModelBuilder pass)
PROVIDER_TO_GENAI = {
    "CPUExecutionProvider": "onnxruntime-genai",
    "CUDAExecutionProvider": "onnxruntime-genai-cuda",
    "DmlExecutionProvider": "onnxruntime-genai-directml",
}
```
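As a usage sketch, the extras map in `constants.py` above might be consumed when building the pip requirement for a job's environment. The `olive_requirement` helper below is a hypothetical illustration, not part of this PR.

```python
# Hypothetical consumer of PROVIDER_TO_EXTRAS: build the olive-ai requirement
# string for a requested execution provider, defaulting to the CPU extra.
PROVIDER_TO_EXTRAS = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "gpu",
    "TensorrtExecutionProvider": "gpu",
    "ROCMExecutionProvider": "gpu",
    "OpenVINOExecutionProvider": "openvino",
    "DmlExecutionProvider": "directml",
    "QNNExecutionProvider": "qnn",
}


def olive_requirement(provider: str) -> str:
    """Return the pip requirement selecting the matching onnxruntime variant."""
    extras = PROVIDER_TO_EXTRAS.get(provider, "cpu")
    return f"olive-ai[{extras}]"


print(olive_requirement("CUDAExecutionProvider"))  # olive-ai[gpu]
```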