🏎️ Dead-simple LLM benchmarking CLI - latency, cost, and quality metrics
🏎️ bench-my-llm

New here? Start with the Getting Started Guide.

PyPI · Python 3.10+ · License: MIT · CI

Stop guessing which model is faster. Measure it.

Point bench-my-llm at any OpenAI-compatible API and get latency, throughput, cost, and quality metrics in seconds. Compare models side by side. Get a beautiful terminal report. Ship with confidence.

✨ Features

  • 🔥 TTFT Measurement - Time to first token via streaming
  • ⚡ Tokens per Second - Real throughput numbers
  • 📊 p50 / p95 / p99 Latencies - Production-grade percentiles
  • 💰 Cost Estimation - Know what you're spending
  • 🎯 Quality Scoring - Compare responses against reference answers
  • 🏁 Model Comparison - Side-by-side with winner highlights
  • 📦 Built-in Prompt Suites - Reasoning, coding, creative, factual
  • 🔌 Any OpenAI-compatible API - OpenAI, Anthropic, Ollama, vLLM, Together, and more
  • 💾 Export to JSON - Pipe into CI, dashboards, or your own tools
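For intuition, TTFT and tokens-per-second boil down to timestamping a streamed response. The sketch below is illustrative only: it uses a simulated token stream instead of a real API call, and `measure_ttft_and_tps` / `fake_stream` are hypothetical names, not part of bench-my-llm.

```python
import time

def measure_ttft_and_tps(token_stream):
    """Time-to-first-token (seconds) and throughput for any token iterator."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        if first is None:
            first = time.perf_counter()  # first token arrived
        count += 1
    end = time.perf_counter()
    ttft = (first - start) if first is not None else float("nan")
    elapsed = end - start
    tps = count / elapsed if elapsed > 0 else 0.0
    return ttft, tps

def fake_stream(n=20, delay=0.005):
    """Stand-in for a streaming API response: one token every `delay` seconds."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tps = measure_ttft_and_tps(fake_stream())
```

The same loop works unchanged over a real streaming response, since it only needs an iterator of tokens.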

🚀 Quick Start

pip install bench-my-llm

Single Model Benchmark

bench-my-llm run --model gpt-4o --suite reasoning
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  🏎️  Benchmark Report                                    β”‚
β”‚  bench-my-llm results for gpt-4o                         β”‚
β”‚  Suite: reasoning | Prompts: 5 | Cost: $0.0043           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

          Latency Summary
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Metric β”‚ TTFT (ms)  β”‚ Total Latency (ms) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ p50    β”‚ 234.1      β”‚ 1,523.4            β”‚
β”‚ p95    β”‚ 312.7      β”‚ 2,187.9            β”‚
β”‚ p99    β”‚ 348.2      β”‚ 2,401.3            β”‚
β”‚ Mean   β”‚ 251.3      β”‚ 1,687.2            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

       Throughput & Quality
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Metric            β”‚ Value       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Mean TPS          β”‚ 67.3 tok/s  β”‚
β”‚ Median TPS        β”‚ 64.8 tok/s  β”‚
β”‚ Quality Score     β”‚ 82%         β”‚
β”‚ Estimated Cost    β”‚ $0.0043     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
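The p50/p95/p99 rows are plain percentiles over the per-prompt latency samples. A minimal nearest-rank version (an illustrative sketch with made-up samples, not necessarily the tool's exact method) looks like:

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (p in [0, 100])."""
    ordered = sorted(samples)
    # Map p onto an index into the sorted samples.
    k = round(p / 100 * (len(ordered) - 1))
    return ordered[k]

latencies_ms = [1498.0, 1523.4, 1612.5, 2187.9, 2401.3]  # illustrative samples
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

With only five samples p95 and p99 collapse onto the slowest request, which is why percentile reports get more meaningful as the prompt count grows.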

Model Comparison

bench-my-llm compare gpt-4o gpt-4o-mini --suite reasoning
┌──────────────────────────────────────────────────────────┐
│  🏁 Model Comparison                                     │
│  gpt-4o vs gpt-4o-mini                                   │
└──────────────────────────────────────────────────────────┘

              Head-to-Head
┌────────────────────────┬─────────┬─────────────┐
│ Metric                 │ gpt-4o  │ gpt-4o-mini │
├────────────────────────┼─────────┼─────────────┤
│ TTFT p50 (ms)          │ 234.1   │ 142.3  🏆   │
│ TTFT p95 (ms)          │ 312.7   │ 198.4  🏆   │
│ Total Latency p50 (ms) │ 1523.4  │ 876.2  🏆   │
│ Mean TPS               │ 67.3 🏆 │ 54.1        │
│ Cost (USD)             │ $0.0043 │ $0.0008 🏆  │
│ Quality Score          │ 0.82 🏆 │ 0.71        │
└────────────────────────┴─────────┴─────────────┘

πŸ† Winner: gpt-4o-mini (4/6 metrics)

📖 Usage

Custom Prompts

Pass your own prompts file (JSON array):

[
  {"text": "Explain quantum computing", "category": "factual", "reference": "...", "max_tokens": 256}
]
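If you build prompt sets programmatically, the JSON-array schema above is easy to emit. Only the fields shown in the example are known here; which of them are optional is an assumption, and the second prompt is a made-up illustration.

```python
import json

# Emit a prompts file matching the JSON-array schema shown above.
prompts = [
    {"text": "Explain quantum computing", "category": "factual",
     "reference": "...", "max_tokens": 256},
    {"text": "Summarize the CAP theorem", "category": "factual",
     "reference": "...", "max_tokens": 256},
]
with open("prompts.json", "w") as f:
    json.dump(prompts, f, indent=2)
```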

Prompt Suites

Suite      Description                        Prompts
reasoning  Logic, math, step-by-step          5
coding     Code generation and explanation    5
creative   Writing, storytelling, metaphors   5
factual    Knowledge recall, definitions      5
all        Everything combined                20

Export Results

bench-my-llm run --model gpt-4o --suite all --output results.json
bench-my-llm report results.json

Local Models (Ollama)

bench-my-llm run --model llama3 --base-url http://localhost:11434/v1 --api-key ollama

CI Integration

Add to your GitHub Actions workflow:

- name: Benchmark LLM
  run: |
    pip install bench-my-llm
    bench-my-llm run --model gpt-4o-mini --suite reasoning --output benchmark.json
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

- name: Upload results
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-results
    path: benchmark.json
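A common follow-up is to gate the build on the exported numbers. The field names below (`latency` → `p95_ms`) are an assumed schema for illustration only; check them against the actual structure of your results JSON before wiring this into CI.

```python
import json

BUDGET_P95_MS = 2500.0  # fail the build past this latency budget

def within_budget(results: dict, budget_ms: float = BUDGET_P95_MS) -> bool:
    """True if the run's p95 total latency is within budget.

    Assumes results["latency"]["p95_ms"] exists; adjust to the real schema.
    """
    return results["latency"]["p95_ms"] <= budget_ms

# In CI, after the benchmark step:
# results = json.load(open("benchmark.json"))
# if not within_budget(results):
#     raise SystemExit("p95 latency over budget")
```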

πŸ› οΈ Development

git clone https://github.com/manasvardhan/bench-my-llm.git
cd bench-my-llm
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest

📄 License

MIT. See LICENSE.
