Stop guessing which LLM to use. Get data-driven model recommendations based on your task requirements, budget, and performance needs.
With 100+ LLMs available—each with different strengths, pricing, and capabilities—choosing the right one is overwhelming. which-llm queries real benchmark data and gives you actionable recommendations.
Note: This tool provides best-effort suggestions based on benchmark scores and capability metadata. It is not a substitute for proper evaluation on your specific use case: benchmarks have known limitations and may not reflect real-world performance in your domain.
The easiest way to use which-llm is through the agent skill—your AI coding assistant (Cursor, Claude Code, Copilot, etc.) learns how to recommend models for you automatically.
```shell
# macOS / Linux
brew tap richard-gyiko/tap
brew install which-llm

# Windows
scoop bucket add richard-gyiko https://github.com/richard-gyiko/scoop-bucket
scoop install which-llm
```

Other installation methods:
Manual download from GitHub Releases:
```shell
# macOS (Apple Silicon)
curl -LO https://github.com/richard-gyiko/which-llm/releases/latest/download/which-llm-aarch64-apple-darwin.tar.gz
tar -xzf which-llm-aarch64-apple-darwin.tar.gz
sudo mv which-llm /usr/local/bin/

# macOS (Intel)
curl -LO https://github.com/richard-gyiko/which-llm/releases/latest/download/which-llm-x86_64-apple-darwin.tar.gz
tar -xzf which-llm-x86_64-apple-darwin.tar.gz
sudo mv which-llm /usr/local/bin/

# Linux
curl -LO https://github.com/richard-gyiko/which-llm/releases/latest/download/which-llm-x86_64-unknown-linux-gnu.tar.gz
tar -xzf which-llm-x86_64-unknown-linux-gnu.tar.gz
sudo mv which-llm /usr/local/bin/
```

From source (requires Rust):

```shell
cargo install --path .
```

No API key required! The CLI fetches pre-built benchmark data from GitHub Releases, updated daily.
```shell
# Refresh data (run once to populate cache)
which-llm refresh

# Query models using SQL
which-llm query "SELECT name, intelligence, coding, output_price FROM benchmarks LIMIT 10"

# List available tables
which-llm tables

# Check data source info
which-llm info
```

Optional: Configure API access for real-time data
For the freshest data (instead of daily snapshots), you can configure direct API access to Artificial Analysis:
- Create an account at artificialanalysis.ai/login
- Generate an API key
- Configure the CLI:
```shell
which-llm profile create default --api-key YOUR_API_KEY
```

Or set the `ARTIFICIAL_ANALYSIS_API_KEY` environment variable.
Then use the `--use-api` flag to fetch directly from the API:
```shell
which-llm refresh --use-api
```

Install the agent skill:

```shell
# Pick your AI coding tool
which-llm skill install cursor      # Cursor
which-llm skill install claude      # Claude Code
which-llm skill install opencode    # OpenCode
which-llm skill install codex       # Codex CLI
which-llm skill install windsurf    # Windsurf
which-llm skill install copilot     # GitHub Copilot
which-llm skill install antigravity # Antigravity

# Or install globally (available in all projects)
which-llm skill install cursor --global
```

Now just ask your AI assistant: "Which LLM should I use for [your task]?"
Tip: Most AI assistants will automatically load the skill when you mention "which llm" in your question—no need to explicitly tell it to load the skill.
Experimental: The skill is under active development. Recommendations and output format may change as we refine the task classification and benchmark interpretation.
The skill follows the Agent Skills open standard. See the full skill documentation for details on how it classifies tasks and selects models.
The skill teaches your AI assistant to:
- Classify your task — Is it transformational (summarize, extract), analytical (compare, justify), tool-using (API calls), or agentic (planning, orchestration)?
- Derive requirements — Map task complexity to minimum benchmark thresholds
- Query real data — Filter models by capability (tool calling, context window, structured output)
- Recommend Primary + Fallback — Suggest a cost-optimized cascade (cheap model first, escalate when needed)
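The classify → derive → filter flow above can be sketched in a few lines. The task categories mirror the list, but the threshold numbers and the candidate rows below are hypothetical stand-ins, not the skill's real values:

```python
# Hypothetical minimum requirements per task class (stand-in numbers).
THRESHOLDS = {
    "transformational": {"intelligence": 20},
    "analytical": {"intelligence": 35},
    "tool_using": {"intelligence": 30, "tool_call": True},
    "agentic": {"intelligence": 45, "tool_call": True},
}

def meets(model, reqs):
    # A model qualifies when every derived requirement is satisfied.
    for key, need in reqs.items():
        have = model.get(key)
        if isinstance(need, bool):
            if have is not need:
                return False
        elif have is None or have < need:
            return False
    return True

# Hypothetical candidate rows shaped like the benchmark data.
candidates = [
    {"name": "cheap-model", "intelligence": 24.7, "tool_call": True, "output_price": 0.10},
    {"name": "strong-model", "intelligence": 38.0, "tool_call": True, "output_price": 0.15},
]

reqs = THRESHOLDS["transformational"]
viable = sorted((m for m in candidates if meets(m, reqs)), key=lambda m: m["output_price"])
primary, fallback = viable[0], viable[-1]  # cheapest first, escalate when needed
print(primary["name"], "->", fallback["name"])
```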
Data sources:
- Artificial Analysis — Benchmark scores (intelligence, coding, math) and performance metrics (price, latency, throughput)
- models.dev — Capability metadata (context window, tool calling, structured output, reasoning)
You: "I'm building a feature that summarizes long Slack threads so users can catch up quickly. Threads can be pretty long, like 200+ messages. Which LLM should I use?"
Assistant (with skill loaded):
- Task Type: Transformational (summarization)
- Key Constraints: Large context window (200+ messages)
- Priority: Balanced (quality + cost + speed matter equally)

Primary:
- Intelligence: 24.7 | Context: 131k tokens | Throughput: 305 tps
- Why: Excellent cost-capability ratio. 131k context easily handles 200+ messages.

Fallback:
- Intelligence: 30-39 | Context: 256k tokens
- Use if: Primary struggles with nuanced summaries or threads exceed 300 messages

Cost:
- Cascade (70/30 split): $0.115/M tokens
- Savings vs always using fallback: 23%
Validation step: Before deploying, test both models on 5-10 representative Slack threads from your workspace.
View full transcript — shows the complete flow including CLI queries and scoring.
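The cascade figures in the transcript come from simple blended-price arithmetic. A minimal sketch, using hypothetical per-million-token prices ($0.10 primary, $0.15 fallback) chosen only so the numbers land on the transcript's $0.115 and 23%:

```python
# Hypothetical per-1M-token prices for the two models in the cascade.
PRIMARY_PRICE = 0.10
FALLBACK_PRICE = 0.15
PRIMARY_SHARE = 0.70  # 70/30 split: share of traffic the cheap model handles

# Blended price: traffic-weighted average of the two models.
blended = PRIMARY_SHARE * PRIMARY_PRICE + (1 - PRIMARY_SHARE) * FALLBACK_PRICE

# Savings relative to sending everything to the fallback.
savings = 1 - blended / FALLBACK_PRICE

print(f"Cascade (70/30 split): ${blended:.3f}/M tokens")          # $0.115/M tokens
print(f"Savings vs always using fallback: {savings:.0%}")         # 23%
```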
For power users, scripting, or debugging, you can query the data directly.
Use full SQL expressiveness on the cached benchmark data:
```shell
# Best coding models under $5/M (benchmarks table)
which-llm query "SELECT name, creator, coding, output_price FROM benchmarks WHERE coding > 40 AND output_price < 5 ORDER BY coding DESC"

# Models with tool calling and large context (models table)
which-llm query "SELECT model_name, provider_name, context_window, tool_call FROM models WHERE tool_call = true AND context_window > 100000"

# List available tables
which-llm tables

# Show schema for a specific table
which-llm tables benchmarks
```

Available tables and columns:
| Table | Description | Source |
|---|---|---|
| `benchmarks` | LLM benchmark scores and pricing | Artificial Analysis |
| `models` | Capability metadata and provider info | models.dev |
| `text_to_image` | Text-to-image models | Artificial Analysis |
| `image_editing` | Image editing models | Artificial Analysis |
| `text_to_speech` | Text-to-speech models | Artificial Analysis |
| `text_to_video` | Text-to-video models | Artificial Analysis |
| `image_to_video` | Image-to-video models | Artificial Analysis |
`benchmarks` columns:

| Column | Type | Description |
|---|---|---|
| `name` | VARCHAR | Model name |
| `creator` | VARCHAR | Creator (OpenAI, Anthropic, etc.) |
| `intelligence` | DOUBLE | Intelligence index |
| `coding` | DOUBLE | Coding index |
| `math` | DOUBLE | Math index |
| `input_price` | DOUBLE | Price per 1M input tokens |
| `output_price` | DOUBLE | Price per 1M output tokens |
| `tps` | DOUBLE | Tokens per second |
| `latency` | DOUBLE | Time to first token (seconds) |
`models` columns:

| Column | Type | Description |
|---|---|---|
| `model_name` | VARCHAR | Model name |
| `provider_name` | VARCHAR | Provider (OpenAI, Anthropic, etc.) |
| `context_window` | BIGINT | Maximum context window |
| `tool_call` | BOOLEAN | Supports function calling |
| `structured_output` | BOOLEAN | Supports JSON mode |
| `reasoning` | BOOLEAN | Chain-of-thought model |
| `open_weights` | BOOLEAN | Weights publicly available |
Note: The `benchmarks` and `models` tables are independent. Use SQL to join or correlate data between them based on model/provider names.
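Outside SQL, the same name-keyed correlation looks like this; the rows below are made up for illustration, and the normalization step reflects that the two sources rarely spell model names identically:

```python
# Hypothetical rows standing in for the two tables; column names follow the
# schemas above, values are invented.
benchmarks = [
    {"name": "ExampleModel Mini", "creator": "ExampleAI", "coding": 42.0, "output_price": 1.2},
    {"name": "Other Model", "creator": "OtherLab", "coding": 55.0, "output_price": 8.0},
]
models = [
    {"model_name": "examplemodel-mini", "provider_name": "ExampleAI",
     "context_window": 200_000, "tool_call": True},
]

def key(name):
    # Normalize: lowercase and drop punctuation/spaces before comparing.
    return "".join(ch for ch in name.lower() if ch.isalnum())

joined = [
    {**b, **m}
    for b in benchmarks
    for m in models
    if key(b["name"]) == key(m["model_name"])
]
print(len(joined), joined[0]["coding"], joined[0]["context_window"])
```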
Compare models side-by-side with highlighted winners:
```shell
# Compare two or more models
which-llm compare "gpt-5 (high)" "claude 4.5 sonnet" "gemini 2.5 pro"

# Show additional fields
which-llm compare "gpt-5" "claude-4.5" --verbose

# Output formats: --json, --csv, --table, --plain
which-llm compare "gpt-5" "claude-4.5" --json
```

The compare command uses fuzzy matching on model names and displays a transposed table with models as columns and metrics as rows. The winner for each metric is marked with `*`.
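The exact matching algorithm isn't documented here; as one illustration of the idea, normalized edit-similarity (Python's `difflib`) resolves loose names like those in the examples above:

```python
import difflib

def normalize(name):
    # Collapse case and punctuation so "claude-4.5" can match "claude 4.5 sonnet".
    return "".join(ch for ch in name.lower() if ch.isalnum())

def fuzzy_pick(query, candidates):
    # Map normalized forms back to the original display names.
    by_norm = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize(query), list(by_norm), n=1, cutoff=0.5)
    return by_norm[hits[0]] if hits else None

names = ["gpt-5 (high)", "claude 4.5 sonnet", "gemini 2.5 pro"]
print(fuzzy_pick("gpt-5", names))       # gpt-5 (high)
print(fuzzy_pick("claude-4.5", names))  # claude 4.5 sonnet
```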
Estimate token costs with projections:
```shell
# Single model cost calculation
which-llm cost "gpt-5 (high)" --input 10k --output 5k

# Compare costs across models
which-llm cost "gpt-5" "claude 4.5" --input 1M --output 500k

# Daily/monthly projections with request volume
which-llm cost "gpt-5 (high)" --input 2k --output 1k --requests 1000 --period daily

# Supports token units: k (thousands), M (millions), B (billions)
which-llm cost "claude-4.5" --input 1.5M --output 750k
```

Other commands:

```shell
# Refresh data from sources
which-llm refresh

# View data source and attribution info
which-llm info

# Manage cache
which-llm cache status
which-llm cache clear

# Manage profiles (for API access)
which-llm profile list
which-llm profile create work --api-key KEY
which-llm profile default work

# Skill management
which-llm skill list
which-llm skill uninstall cursor
```

- Benchmark data provided by Artificial Analysis
- Capability metadata provided by models.dev
This tool uses data from the Artificial Analysis API. Per the API terms, attribution is required for all use of the data.
The CLI uses pre-built benchmark data hosted on GitHub Releases, updated daily via automated workflows. This means:
- No API key required for basic usage
- Data is typically less than 24 hours old
- Use `which-llm info` to see when data was last updated
- Use `which-llm refresh` to fetch fresh data from sources
- Use `which-llm refresh --use-api` with an API key for real-time data
MIT