stabgan/openrouter-mcp-multimodal

OpenRouter MCP Multimodal Server

The all-in-one MCP server for 300+ LLMs — text, vision, audio, and video in a single package.

3,800+ installs across npm + Docker Hub · ~950 npm installs/month and accelerating

Install · Tools · Quick Start · Config · Examples · Architecture · Changelog


Verified on MseeP

Access 300+ LLMs through OpenRouter via the Model Context Protocol. Analyze images, audio, and video. Generate images, audio, and video. Chat with any model. Every tool returns structured _meta.code errors so MCP clients can switch on failure modes without parsing strings.

One-Click Install

| Client | Install |
| --- | --- |
| Kiro | Add to Kiro |
| Cursor | Add to Cursor |
| VS Code | Add to VS Code |
| VS Code Insiders | Add to VS Code Insiders |
| Claude Desktop | Install Guide: add to claude_desktop_config.json |
| Windsurf | Install Guide: add to ~/.codeium/windsurf/mcp_config.json |
| Cline | Install Guide: add via Cline MCP settings |
| Smithery | npx -y @smithery/cli install @stabgan/openrouter-mcp-multimodal --client claude |

After clicking, the target client opens a confirmation prompt. You'll need to paste your OPENROUTER_API_KEY — the deeplink ships a placeholder so no secrets end up in shared links.

Why This One?

| Feature | Status |
| --- | --- |
| Text chat with 300+ models | |
| Image analysis (vision) | ✅ Native with sharp optimization |
| Audio analysis | ✅ Transcription + analysis, base64 auto-encoded |
| Audio generation | ✅ Conversational, speech, and music with format auto-detection |
| Image generation | ✅ Path-sandboxed disk output |
| Video understanding | v3: mp4, mpeg, mov, webm from files, URLs, or data URLs |
| Video generation | v3: Veo 3.1 / Sora 2 Pro / Seedance / Wan via async API with progress notifications |
| Auto image resize + compress | ✅ Configurable (defaults 800px max, JPEG 80%) |
| Model search + validation | ✅ Filter by vision / audio / video modality |
| Free model support | ✅ Default: free Nemotron VL |
| Docker support | ✅ Multi-arch (amd64 + arm64), ~345 MB Alpine |
| Retry-After + jitter | ✅ Honors Retry-After header, avoids thundering herd |
| IPv4 + IPv6 SSRF blocklist | ✅ Covers mapped, compat, multicast, 6to4, Teredo, ORCHID |
| Structured error taxonomy | ✅ Closed _meta.code so clients can switch on failure modes |
| Reasoning-model awareness | ✅ Detects max_tokens cutoff during CoT, guides the caller |
| MCP 2025 tool annotations | readOnlyHint / destructiveHint / idempotentHint on every tool |
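
The "Retry-After + jitter" behavior can be pictured with a small sketch. This is not the package's implementation: the function names, the 20% jitter on top of the server hint, and the base/cap values are all assumptions chosen for illustration.

```typescript
// Illustrative only: names, jitter ratio, base and cap are assumptions,
// not values taken from this package.

/** Parse a Retry-After header (delta-seconds or HTTP-date) into milliseconds. */
function parseRetryAfterMs(header: string | null, nowMs: number = Date.now()): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP-date; wait until that instant (never a negative delay).
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

/** Delay before retry number `attempt` (0-based). */
function retryDelayMs(
  attempt: number,
  retryAfterHeader: string | null,
  random: () => number = Math.random,
  baseMs: number = 500,
  capMs: number = 30000,
): number {
  const serverHint = parseRetryAfterMs(retryAfterHeader);
  if (serverHint !== null) {
    // Never retry earlier than the server asked; add up to 20% on top so
    // many clients given the same hint do not all wake at the same moment.
    return Math.round(serverHint + serverHint * 0.2 * random());
  }
  // No hint: exponential backoff with full jitter, uniform in [0, ceiling).
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.round(ceiling * random());
}
```

Passing `random` as a parameter keeps the function deterministic under test; production code would rely on the `Math.random` default.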

Tools

| Tool | Description |
| --- | --- |
| chat_completion | Send messages to any OpenRouter model. Detects reasoning-model cutoffs. |
| analyze_image | Analyze images from local files, URLs, or data URIs. Auto-optimized with sharp. |
| analyze_audio | Analyze/transcribe audio (WAV, MP3, FLAC, OGG, etc.) from files, URLs, or data URIs. |
| analyze_video | Analyze/transcribe video (mp4, mpeg, mov, webm) from files, URLs, or data URIs. |
| generate_image | Generate images from text prompts. Optional path-sandboxed disk save. |
| generate_audio | Generate audio from text. Auto-detects format, wraps raw PCM in WAV. |
| generate_video | Generate video via OpenRouter's async API (Veo 3.1 / Sora 2 Pro / Seedance / Wan). Submits, polls, downloads, saves. |
| get_video_status | Resume polling a generate_video job by id. Download + save when complete. |
| search_models | Search/filter models by name, provider, or capabilities (vision / audio / video). |
| get_model_info | Get pricing, context length, and capabilities for any model. |
| validate_model | Check if a model ID exists on OpenRouter. |

All error responses carry _meta.code from a closed taxonomy: INVALID_INPUT · UNSAFE_PATH · UPSTREAM_HTTP · UPSTREAM_TIMEOUT · UPSTREAM_REFUSED · UNSUPPORTED_FORMAT · RESOURCE_TOO_LARGE · ZDR_INCOMPATIBLE · MODEL_NOT_FOUND · JOB_FAILED · JOB_STILL_RUNNING · INTERNAL
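
Because the taxonomy is closed, a client can branch on it exhaustively. The sketch below shows one way that might look. The code list is copied from the line above, but the `ToolResult` shape, the `Action` labels, and the choice of which codes are worth retrying are assumptions for illustration, not behavior documented by this package.

```typescript
// Codes copied from the taxonomy above; everything else is hypothetical.
const ERROR_CODES = [
  "INVALID_INPUT", "UNSAFE_PATH", "UPSTREAM_HTTP", "UPSTREAM_TIMEOUT",
  "UPSTREAM_REFUSED", "UNSUPPORTED_FORMAT", "RESOURCE_TOO_LARGE",
  "ZDR_INCOMPATIBLE", "MODEL_NOT_FOUND", "JOB_FAILED", "JOB_STILL_RUNNING",
  "INTERNAL",
] as const;
type ErrorCode = (typeof ERROR_CODES)[number];

// Assumed minimal view of a tool result as seen by an MCP client.
interface ToolResult {
  isError?: boolean;
  _meta?: { code?: string };
}

type Action = "ok" | "retry" | "resume" | "fix_request" | "give_up";

function isErrorCode(value: unknown): value is ErrorCode {
  return typeof value === "string" && (ERROR_CODES as readonly string[]).indexOf(value) !== -1;
}

// The mapping from code to action is an example policy, not a recommendation
// from the package authors.
function nextAction(result: ToolResult): Action {
  if (!result.isError) return "ok";
  const code = result._meta?.code;
  if (!isErrorCode(code)) return "give_up"; // unknown or missing code
  switch (code) {
    case "UPSTREAM_HTTP":
    case "UPSTREAM_TIMEOUT":
      return "retry";
    case "JOB_STILL_RUNNING":
      return "resume"; // e.g. follow up with get_video_status
    case "INVALID_INPUT":
    case "UNSAFE_PATH":
    case "UNSUPPORTED_FORMAT":
    case "RESOURCE_TOO_LARGE":
    case "MODEL_NOT_FOUND":
    case "ZDR_INCOMPATIBLE":
      return "fix_request";
    case "UPSTREAM_REFUSED":
    case "JOB_FAILED":
    case "INTERNAL":
      return "give_up";
  }
}
```

If a new code were ever added to `ERROR_CODES`, the compiler would flag the `switch` as no longer covering every case, which is the main benefit of typing the taxonomy as a union.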

Quick Start

Prerequisites

Get a free API key from openrouter.ai/keys.

Option 1: npx (no install)

{
  "mcpServers": {
    "openrouter": {
      "command": "npx",
      "args": ["-y", "@stabgan/openrouter-mcp-multimodal"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-..."
      }
    }
  }
}

Option 2: Docker

{
  "mcpServers": {
    "openrouter": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "OPENROUTER_API_KEY=sk-or-v1-...",
        "stabgan/openrouter-mcp-multimodal:latest"
      ]
    }
  }
}

Option 3: Global install

npm install -g @stabgan/openrouter-mcp-multimodal

Then point your MCP client at the installed binary:

{
  "mcpServers": {
    "openrouter": {
      "command": "openrouter-multimodal",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-..." }
    }
  }
}

Option 4: Smithery

npx -y @smithery/cli install @stabgan/openrouter-mcp-multimodal --client claude

Configuration

Environment variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| OPENROUTER_API_KEY | Yes | | Your OpenRouter API key |
| OPENROUTER_DEFAULT_MODEL | No | nvidia/nemotron-nano-12b-v2-vl:free | Default model for chat + analyze tools |
| DEFAULT_MODEL | No | | Alias for OPENROUTER_DEFAULT_MODEL |
| OPENROUTER_MODEL_CACHE_TTL_MS | No | 3600000 | Model cache TTL (ms) |
| OPENROUTER_IMAGE_MAX_DIMENSION | No | 800 | Longest edge for resize (px) |
| OPENROUTER_IMAGE_JPEG_QUALITY | No | 80 | JPEG quality (1–100) |
| OPENROUTER_IMAGE_FETCH_TIMEOUT_MS | No | 30000 | Image URL timeout (ms) |
| OPENROUTER_IMAGE_MAX_DOWNLOAD_BYTES | No | 26214400 | Image URL size cap (~25 MB) |
| OPENROUTER_IMAGE_MAX_REDIRECTS | No | 8 | Image URL redirect cap |
| OPENROUTER_IMAGE_MAX_DATA_URL_BYTES | No | 20971520 | Image data URL size cap (~20 MB) |
| OPENROUTER_AUDIO_FETCH_TIMEOUT_MS | No | 30000 | Audio URL timeout (ms) |
| OPENROUTER_AUDIO_MAX_DOWNLOAD_BYTES | No | 26214400 | Audio URL size cap (~25 MB) |
| OPENROUTER_AUDIO_MAX_REDIRECTS | No | 8 | Audio URL redirect cap |
| OPENROUTER_AUDIO_MAX_DATA_URL_BYTES | No | 20971520 | Audio data URL size cap (~20 MB) |
| OPENROUTER_DEFAULT_VIDEO_MODEL | No | google/gemini-2.5-flash | Default for analyze_video |
| OPENROUTER_DEFAULT_VIDEO_GEN_MODEL | No | google/veo-3.1 | Default for generate_video |
| OPENROUTER_VIDEO_FETCH_TIMEOUT_MS | No | 60000 | Video URL timeout (ms) |
| OPENROUTER_VIDEO_MAX_DOWNLOAD_BYTES | No | 104857600 | Video URL size cap (~100 MB) |
| OPENROUTER_VIDEO_MAX_REDIRECTS | No | 8 | Video URL redirect cap |
| OPENROUTER_VIDEO_MAX_DATA_URL_BYTES | No | 104857600 | Video data URL size cap (~100 MB) |
| OPENROUTER_VIDEO_POLL_INTERVAL_MS | No | 15000 | Async video poll cadence (ms) |
| OPENROUTER_VIDEO_MAX_WAIT_MS | No | 600000 | Max wait (ms) before returning a resumable handle |
| OPENROUTER_VIDEO_GEN_MAX_BYTES | No | 268435456 | Generated video download cap (~256 MB) |
| OPENROUTER_VIDEO_INLINE_MAX_BYTES | No | 10485760 | Inline video ceiling (~10 MB) |
| OPENROUTER_OUTPUT_DIR | No | process.cwd() | Sandbox root for save_path |
| OPENROUTER_ALLOW_UNSAFE_PATHS | No | | Set to 1 to disable the save_path sandbox |
| OPENROUTER_LOG_LEVEL | No | info | One of error / warn / info / debug |
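
To make the interplay between OPENROUTER_VIDEO_POLL_INTERVAL_MS and OPENROUTER_VIDEO_MAX_WAIT_MS concrete, here is a rough sketch of a poll loop that gives up after the wait budget and hands back an id the caller can resume with. The job shape, status names, and function names are hypothetical; only the two variable names and their defaults come from the table above.

```typescript
// Hypothetical job shape; the real API's fields and status values may differ.
type JobStatus = "pending" | "completed" | "failed";
interface VideoJob { id: string; status: JobStatus; url?: string }

type PollOutcome =
  | { kind: "done"; job: VideoJob }
  | { kind: "failed"; job: VideoJob }
  | { kind: "still_running"; videoId: string };

/** Read a positive integer from an env-like map, falling back on bad input. */
function envInt(name: string, fallback: number, env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

async function pollVideoJob(
  videoId: string,
  fetchStatus: (id: string) => Promise<VideoJob>,
  opts: { intervalMs: number; maxWaitMs: number; sleep?: (ms: number) => Promise<void> },
): Promise<PollOutcome> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let waitedMs = 0;
  while (true) {
    const job = await fetchStatus(videoId);
    if (job.status === "completed") return { kind: "done", job };
    if (job.status === "failed") return { kind: "failed", job };
    // Stop before a sleep that would exceed the budget and return a handle
    // the caller can pass to a later status check.
    if (waitedMs + opts.intervalMs > opts.maxWaitMs) {
      return { kind: "still_running", videoId };
    }
    await sleep(opts.intervalMs);
    waitedMs += opts.intervalMs;
  }
}
```

In a Node process the options might be built as `envInt("OPENROUTER_VIDEO_POLL_INTERVAL_MS", 15000, process.env)` and `envInt("OPENROUTER_VIDEO_MAX_WAIT_MS", 600000, process.env)`. A `still_running` outcome corresponds to the case where a caller would follow up with get_video_status.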

Security notes

  • Analyze tools can read local files and fetch HTTP(S) URLs. URL fetches block private/link-local/reserved IPv4 and IPv6 targets (SSRF mitigation) and cap response size.
  • Generate tools write to disk through a path sandbox: save_path is resolved against OPENROUTER_OUTPUT_DIR and any traversal attempt is rejected. Override with OPENROUTER_ALLOW_UNSAFE_PATHS=1.
  • IPv6 SSRF blocklist covers loopback, unspecified, IPv4-mapped, IPv4-compatible, link-local, site-local, ULA, multicast, documentation, Teredo, ORCHID, and 6to4 of private IPv4.
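
The path-sandbox rule (resolve save_path against the output directory, reject anything that escapes it) can be sketched in a few lines. This is a simplified illustration with a hypothetical function name, not the contents of path-safety.ts; in particular it does not resolve symlinks, which a hardened implementation would probably need to consider.

```typescript
import * as path from "node:path";

// Hypothetical helper; the error text borrows the UNSAFE_PATH code name only
// for readability.
function resolveSavePath(savePath: string, outputDir: string, allowUnsafe: boolean = false): string {
  const root = path.resolve(outputDir);
  // An absolute savePath replaces root here, which the check below catches.
  const target = path.resolve(root, savePath);
  if (allowUnsafe) return target;

  // If the relative path from root to target climbs upward (or is absolute,
  // as on a different Windows drive), the target lies outside the sandbox.
  const rel = path.relative(root, target);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`UNSAFE_PATH: ${savePath} resolves outside ${root}`);
  }
  return target;
}
```

Comparing via `path.relative` rather than a string prefix check avoids treating a sibling such as `/srv/out-other` as being inside `/srv/out`.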

Usage Examples

# Chat
Use chat_completion to explain quantum computing in simple terms.

# Vision
Use analyze_image on /path/to/photo.jpg and tell me what you see.

# Audio transcription
Use analyze_audio on /path/to/recording.mp3 to transcribe it.

# Video understanding
Use analyze_video on /path/to/clip.mp4 — what happens at 00:15?

# Generate audio
Use generate_audio with prompt "Explain neural networks" and voice "alloy", save to ./response.wav

# Generate music
Use generate_audio with model "google/lyria-3-clip-preview" and prompt "upbeat jazz piano trio"

# Generate image
Use generate_image with prompt "a cat astronaut on mars" and save to ./cat.png

# Generate video
Use generate_video with model "google/veo-3.1", prompt "a calm river at sunrise",
resolution 720p, duration 4, save to ./river.mp4

# Resume a video job
Use get_video_status with video_id "vid_abc123" and save_path "./river.mp4"

Architecture

src/
├── index.ts                    # Entry, env validation, graceful shutdown
├── tool-handlers.ts            # 11 tools (annotated) + dispatch
├── model-cache.ts              # TTL + in-flight coalescing
├── openrouter-api.ts           # REST client (chat + /videos)
├── errors.ts                   # Closed ErrorCode enum
├── logger.ts                   # JSON-line structured logger
└── tool-handlers/
    ├── fetch-utils.ts          # SSRF, bounded fetch, data-URL parser
    ├── openrouter-errors.ts    # SDK/HTTP → ErrorCode classifier
    ├── completion-utils.ts     # Reasoning-model cutoff detection
    ├── path-safety.ts          # save_path sandbox
    ├── chat-completion.ts      # Text + multimodal chat
    ├── analyze-image.ts        # Vision analysis
    ├── analyze-audio.ts        # Audio transcription
    ├── analyze-video.ts        # Video understanding
    ├── generate-image.ts       # Image generation
    ├── generate-audio.ts       # Audio generation + streaming
    ├── generate-video.ts       # Video generation (async)
    ├── image-utils.ts          # Sharp optimization, MIME sniffing
    ├── audio-utils.ts          # Audio format detection
    ├── video-utils.ts          # Video format detection
    ├── search-models.ts        # Model search
    ├── get-model-info.ts       # Model detail lookup
    └── validate-model.ts       # Model existence check
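
The "TTL + in-flight coalescing" note on model-cache.ts describes a common pattern: serve a cached value until it expires, and when several callers miss at once, let them share a single upstream request. A minimal sketch of that pattern follows. The class and its members are hypothetical and are not taken from the actual file.

```typescript
// Hypothetical single-value cache illustrating the pattern.
class TtlCache<T> {
  private hasValue = false;
  private value: T | undefined = undefined;
  private expiresAt = 0;
  private inFlight: Promise<T> | null = null;
  private loader: () => Promise<T>;
  private ttlMs: number;
  private now: () => number;

  constructor(loader: () => Promise<T>, ttlMs: number, now: () => number = () => Date.now()) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    // Fresh hit: answer from memory.
    if (this.hasValue && this.now() < this.expiresAt) {
      return Promise.resolve(this.value as T);
    }
    // A load is already running: join it instead of starting another.
    if (this.inFlight !== null) return this.inFlight;

    const pending = this.loader().then(
      (fresh) => {
        this.value = fresh;
        this.hasValue = true;
        this.expiresAt = this.now() + this.ttlMs;
        this.inFlight = null;
        return fresh;
      },
      (err) => {
        // Clear the slot so the next caller can retry after a failure.
        this.inFlight = null;
        throw err;
      },
    );
    this.inFlight = pending;
    return pending;
  }
}
```

With a TTL matching OPENROUTER_MODEL_CACHE_TTL_MS, a burst of concurrent `get()` calls on a cold cache would trigger one loader call rather than one per caller.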

Development

git clone https://github.com/stabgan/openrouter-mcp-multimodal.git
cd openrouter-mcp-multimodal
npm install
cp .env.example .env  # Add your API key
npm run build
npm start
npm test                    # 163 unit tests, <1s
npm run test:integration    # Live API tests
npm run lint
node scripts/live-e2e.mjs  # 16 live E2E scenarios

Upgrading from v2

v3 is additive — no tool schemas or env vars were removed.

  • Three new tools: analyze_video, generate_video, get_video_status
  • Structured _meta.code on every error response (text messages preserved)
  • save_path sandboxed by default — set OPENROUTER_OUTPUT_DIR or OPENROUTER_ALLOW_UNSAFE_PATHS=1
  • Reasoning-model awareness: content: null + finish_reason: length now returns INVALID_INPUT with a preview instead of empty string
  • IPv6 SSRF coverage extended to mapped, compat, multicast, 6to4, Teredo, ORCHID
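
The reasoning-model change in the list above can be illustrated with a short sketch: when a response has no content and stopped because of the token limit, surface an error with a preview rather than an empty string. The `Choice` shape, the `reasoning` field, the preview length, and the message wording are all assumptions. The source states only that the combination of `content: null` and `finish_reason: length` yields INVALID_INPUT with a preview.

```typescript
// Assumed subset of a chat completion choice; field names other than
// content and finish_reason are guesses for illustration.
interface Choice {
  finish_reason: string | null;
  message: { content: string | null; reasoning?: string | null };
}

type Outcome =
  | { ok: true; text: string }
  | { ok: false; code: "INVALID_INPUT"; message: string };

function interpretChoice(choice: Choice, previewChars: number = 200): Outcome {
  const { content, reasoning } = choice.message;
  if (content !== null && content !== "") return { ok: true, text: content };

  if (choice.finish_reason === "length") {
    // The token budget ran out before any answer text was produced.
    const preview = (reasoning ?? "").slice(0, previewChars);
    return {
      ok: false,
      code: "INVALID_INPUT",
      message: "Response was cut off by max_tokens before an answer was produced. " +
        "Consider raising max_tokens. Reasoning preview: " + preview,
    };
  }
  // Empty content for any other reason is passed through unchanged.
  return { ok: true, text: "" };
}
```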

Compatibility

Works with any MCP-compatible client, including Kiro, Claude Desktop, Cursor, Windsurf, and Cline.

License

MIT

Contributing

Issues and PRs welcome. Please open an issue first for major changes.