which-llm

Stop guessing which LLM to use. Get data-driven model recommendations based on your task requirements, budget, and performance needs.

With 100+ LLMs available—each with different strengths, pricing, and capabilities—choosing the right one is overwhelming. which-llm queries real benchmark data and gives you actionable recommendations.

Note: This tool provides best-effort suggestions based on benchmark scores and capability metadata. It is not a substitute for proper evaluation on your specific use case. Benchmarks have known limitations and may not reflect real-world performance for your domain.

Quick Start

The easiest way to use which-llm is through the agent skill—your AI coding assistant (Cursor, Claude Code, Copilot, etc.) learns how to recommend models for you automatically.

1. Install the CLI

# macOS / Linux
brew tap richard-gyiko/tap
brew install which-llm

# Windows
scoop bucket add richard-gyiko https://github.com/richard-gyiko/scoop-bucket
scoop install which-llm

Other installation methods

Manual download from GitHub Releases:

# macOS (Apple Silicon)
curl -LO https://github.com/richard-gyiko/which-llm/releases/latest/download/which-llm-aarch64-apple-darwin.tar.gz
tar -xzf which-llm-aarch64-apple-darwin.tar.gz
sudo mv which-llm /usr/local/bin/

# macOS (Intel)
curl -LO https://github.com/richard-gyiko/which-llm/releases/latest/download/which-llm-x86_64-apple-darwin.tar.gz
tar -xzf which-llm-x86_64-apple-darwin.tar.gz
sudo mv which-llm /usr/local/bin/

# Linux
curl -LO https://github.com/richard-gyiko/which-llm/releases/latest/download/which-llm-x86_64-unknown-linux-gnu.tar.gz
tar -xzf which-llm-x86_64-unknown-linux-gnu.tar.gz
sudo mv which-llm /usr/local/bin/

From source (requires Rust):

cargo install --path .

2. Start Using It

No API key required! The CLI fetches pre-built benchmark data from GitHub Releases, updated daily.

# Refresh data (run once to populate cache)
which-llm refresh

# Query models using SQL
which-llm query "SELECT name, intelligence, coding, price FROM benchmarks LIMIT 10"

# List available tables
which-llm tables

# Check data source info
which-llm info

Optional: Configure API access for real-time data

For the freshest data (instead of daily snapshots), you can configure direct API access to Artificial Analysis:

  1. Create an account at artificialanalysis.ai/login
  2. Generate an API key
  3. Configure the CLI:
which-llm profile create default --api-key YOUR_API_KEY

Or set the ARTIFICIAL_ANALYSIS_API_KEY environment variable.

Then use the --use-api flag to fetch directly from the API:

which-llm refresh --use-api

3. Install the Skill

# Pick your AI coding tool
which-llm skill install cursor      # Cursor
which-llm skill install claude      # Claude Code
which-llm skill install opencode    # OpenCode
which-llm skill install codex       # Codex CLI
which-llm skill install windsurf    # Windsurf
which-llm skill install copilot     # GitHub Copilot
which-llm skill install antigravity # Antigravity

# Or install globally (available in all projects)
which-llm skill install cursor --global

Now just ask your AI assistant: "Which LLM should I use for [your task]?"

Tip: Most AI assistants will automatically load the skill when you mention "which llm" in your question—no need to explicitly tell it to load the skill.

Experimental: The skill is under active development. Recommendations and output format may change as we refine the task classification and benchmark interpretation.

The skill follows the Agent Skills open standard. See the full skill documentation for details on how it classifies tasks and selects models.

How It Works

The skill teaches your AI assistant to:

  1. Classify your task — Is it transformational (summarize, extract), analytical (compare, justify), tool-using (API calls), or agentic (planning, orchestration)?
  2. Derive requirements — Map task complexity to minimum benchmark thresholds
  3. Query real data — Filter models by capability (tool calling, context window, structured output)
  4. Recommend Primary + Fallback — Suggest a cost-optimized cascade (cheap model first, escalate when needed)
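Step 4's primary-plus-fallback pattern can be sketched as a tiny shell fragment. This is an illustration only: `call_primary` and `call_fallback` are hypothetical stand-ins for whatever API calls your application makes, not part of which-llm.

```shell
# Cost-optimized cascade sketch: try the cheap primary model first,
# escalate to the stronger fallback only when the primary fails.
# call_primary/call_fallback are hypothetical stand-ins for real API calls.
call_primary()  { return 1; }                 # pretend the cheap model came up empty
call_fallback() { echo "fallback answer"; }

answer=$(call_primary) || answer=$(call_fallback)
echo "$answer"
```

In a real system the escalation trigger would be an error, a confidence check, or a validation failure rather than a bare exit code.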

Data sources:

  • Artificial Analysis — Benchmark scores (intelligence, coding, math) and performance metrics (price, latency, throughput)
  • models.dev — Capability metadata (context window, tool calling, structured output, reasoning)

Example

You: "I'm building a feature that summarizes long Slack threads so users can catch up quickly. Threads can be pretty long, like 200+ messages. Which LLM should I use?"

Assistant (with skill loaded):

Task Classification

  • Skill Type: Transformational (summarization)
  • Key Constraints: Large context window (200+ messages)
  • Priority: Balanced (quality + cost + speed matter equally)

Recommendations

Primary: gpt-oss-20B ($0.10/M)

  • Intelligence: 24.7 | Context: 131k tokens | Throughput: 305 tps
  • Why: Excellent cost-capability ratio. 131k context easily handles 200+ messages.

Fallback: MiMo-V2-Flash ($0.15/M)

  • Intelligence: 30-39 | Context: 256k tokens
  • Use if: Primary struggles with nuanced summaries or threads exceed 300 messages

Cost Estimate

  • Cascade (70/30 split): $0.115/M tokens
  • Savings vs always using fallback: 23%
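The blended cascade price is plain weighted arithmetic; a quick sanity check of the numbers above (70% of traffic at the primary's $0.10/M, 30% at the fallback's $0.15/M):

```shell
# Blended cascade price: 70% primary + 30% fallback
awk 'BEGIN { printf "%.3f\n", 0.7*0.10 + 0.3*0.15 }'      # 0.115
# Savings vs. routing everything to the fallback
awk 'BEGIN { printf "%.0f%%\n", (1 - 0.115/0.15) * 100 }' # 23%
```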

Validation step: Before deploying, test both models on 5-10 representative Slack threads from your workspace.

View full transcript — shows the complete flow including CLI queries and scoring.

CLI Reference

For power users, scripting, or debugging, you can query the data directly.

SQL Queries (Primary Interface)

Use full SQL expressiveness on the cached benchmark data:

# Best coding models under $5/M (benchmarks table)
which-llm query "SELECT name, creator, coding, output_price FROM benchmarks WHERE coding > 40 AND output_price < 5 ORDER BY coding DESC"

# Models with tool calling and large context (models table)
which-llm query "SELECT model_name, provider_name, context_window, tool_call FROM models WHERE tool_call = true AND context_window > 100000"

# List available tables
which-llm tables

# Show schema for a specific table
which-llm tables benchmarks

Available tables and columns

Tables

| Table | Description | Source |
|---|---|---|
| benchmarks | LLM benchmark scores and pricing | Artificial Analysis |
| models | Capability metadata and provider info | models.dev |
| text_to_image | Text-to-image models | Artificial Analysis |
| image_editing | Image editing models | Artificial Analysis |
| text_to_speech | Text-to-speech models | Artificial Analysis |
| text_to_video | Text-to-video models | Artificial Analysis |
| image_to_video | Image-to-video models | Artificial Analysis |

Benchmarks Table (Artificial Analysis)

| Column | Type | Description |
|---|---|---|
| name | VARCHAR | Model name |
| creator | VARCHAR | Creator (OpenAI, Anthropic, etc.) |
| intelligence | DOUBLE | Intelligence index |
| coding | DOUBLE | Coding index |
| math | DOUBLE | Math index |
| input_price | DOUBLE | Price per 1M input tokens |
| output_price | DOUBLE | Price per 1M output tokens |
| tps | DOUBLE | Tokens per second |
| latency | DOUBLE | Time to first token (seconds) |

Models Table (models.dev)

| Column | Type | Description |
|---|---|---|
| model_name | VARCHAR | Model name |
| provider_name | VARCHAR | Provider (OpenAI, Anthropic, etc.) |
| context_window | BIGINT | Maximum context window (tokens) |
| tool_call | BOOLEAN | Supports function calling |
| structured_output | BOOLEAN | Supports JSON mode |
| reasoning | BOOLEAN | Chain-of-thought model |
| open_weights | BOOLEAN | Weights publicly available |

Note: The benchmarks and models tables are independent. Use SQL to join or correlate data between them based on model/provider names.
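A join along those lines might look like the sketch below. The exact-name match is optimistic: model names often differ slightly between Artificial Analysis and models.dev, so treat `lower()` equality as a starting point and loosen the join condition (e.g. with LIKE) as needed.

```sql
-- Hypothetical join: correlate benchmark scores with capability metadata.
SELECT b.name, b.coding, b.output_price, m.context_window
FROM benchmarks b
JOIN models m ON lower(m.model_name) = lower(b.name)
WHERE m.tool_call = true
ORDER BY b.coding DESC
LIMIT 5;
```

Run it by passing the statement to `which-llm query "..."` as in the examples above.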

Compare Models

Compare models side-by-side with highlighted winners:

# Compare two or more models
which-llm compare "gpt-5 (high)" "claude 4.5 sonnet" "gemini 2.5 pro"

# Show additional fields
which-llm compare "gpt-5" "claude-4.5" --verbose

# Output formats: --json, --csv, --table, --plain
which-llm compare "gpt-5" "claude-4.5" --json

The compare command uses fuzzy matching on model names and displays a transposed table with models as columns and metrics as rows. Winners for each metric are marked with *.

Calculate Token Costs

Estimate token costs with projections:

# Single model cost calculation
which-llm cost "gpt-5 (high)" --input 10k --output 5k

# Compare costs across models
which-llm cost "gpt-5" "claude 4.5" --input 1M --output 500k

# Daily/monthly projections with request volume
which-llm cost "gpt-5 (high)" --input 2k --output 1k --requests 1000 --period daily

# Supports token units: k (thousands), M (millions), B (billions)
which-llm cost "claude-4.5" --input 1.5M --output 750k

Other Commands

# Refresh data from sources
which-llm refresh

# View data source and attribution info
which-llm info

# Manage cache
which-llm cache status
which-llm cache clear

# Manage profiles (for API access)
which-llm profile list
which-llm profile create work --api-key KEY
which-llm profile default work

# Skill management
which-llm skill list
which-llm skill uninstall cursor

Attribution

This tool uses data from the Artificial Analysis API. Per the API terms, attribution is required for all use of the data.

Data Freshness

The CLI uses pre-built benchmark data hosted on GitHub Releases, updated daily via automated workflows. This means:

  • No API key required for basic usage
  • Data is typically less than 24 hours old
  • Use which-llm info to see when data was last updated
  • Use which-llm refresh to fetch fresh data from sources
  • Use which-llm refresh --use-api with an API key for real-time data

License

MIT
