Achieve peak inference performance—directly in your editor.
PeakInfer analyzes every LLM inference point in your code to find what's holding back your latency, throughput, and reliability.
Your code says `streaming: true`. At runtime, 0% of responses actually stream. That's drift, and you can't see it until production.
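For instance, here is a minimal sketch of that drift pattern, assuming the official `openai` Node SDK (the wrapper function and model name are illustrative, not PeakInfer output):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Declares streaming, but defeats it: the loop drains the stream
// internally and returns one buffered string, so at runtime 0% of
// callers ever observe incremental tokens.
async function complete(prompt: string): Promise<string> {
  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
    stream: true, // what the code says
  });

  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text; // what the runtime does: a blocking, fully buffered reply
}
```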
Peak Inference Performance means improving latency, throughput, reliability, and cost without changing evaluated behavior.
- Inline Diagnostics: See performance issues highlighted in your code
- Drift Detection: Find mismatches between code declarations and runtime behavior
- Results Panel: Comprehensive analysis view with actionable recommendations
- Benchmark Comparison: Compare to InferenceMAX benchmarks (15+ models)
- Multiple Languages: TypeScript, JavaScript, Python, Go, Rust
- Install from VS Code Marketplace: Search "PeakInfer" or install directly
- Or via command line:
```sh
code --install-extension kalmantic.peakinfer
```
Get your PeakInfer token at peakinfer.com/dashboard: sign in with GitHub and generate a token.
50 free credits included. No credit card.
Option 1: VS Code Settings (Recommended)
- Open Settings (Cmd+, / Ctrl+,)
- Search "PeakInfer"
- Enter your token in "PeakInfer Token"
Option 2: Environment Variable
Set PEAKINFER_TOKEN in your shell:
```sh
export PEAKINFER_TOKEN=pk_your-token-here
```

To analyze the current file:
- Command Palette (Cmd+Shift+P on Mac / Ctrl+Shift+P on Windows/Linux): `PeakInfer: Analyze Current File`
- Right-click in the editor: "PeakInfer: Analyze Current File"

To analyze the whole workspace:
- Command Palette: `PeakInfer: Analyze Workspace`
- Right-click a folder in the Explorer: "PeakInfer: Analyze Workspace"

To view results:
- Command Palette: `PeakInfer: Show Results Panel`
PeakInfer analyzes every inference point across 4 dimensions:
| Dimension | What We Find |
|---|---|
| Latency | Missing streaming, blocking calls, p95 vs benchmark gaps |
| Throughput | Sequential bottlenecks, batch opportunities |
| Reliability | Missing retries, timeouts, fallbacks |
| Cost | Right-sized model selection, token optimization |
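As an illustration (a hedged sketch of flagged patterns, not actual PeakInfer output, again assuming the `openai` Node SDK), here is what a Reliability and a Throughput finding typically look like:

```typescript
import OpenAI from "openai";

// Reliability: no timeout or retry policy means one slow or flaky
// request can hang or fail the whole call path.
const fragile = new OpenAI(); // flagged: missing retries/timeouts

const hardened = new OpenAI({
  timeout: 30_000, // fail fast instead of hanging indefinitely
  maxRetries: 3,   // retry transient errors
});

// Throughput: a sequential loop serializes independent requests.
async function summarizeSequential(docs: string[]) {
  const out: string[] = [];
  for (const doc of docs) {
    const res = await hardened.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: `Summarize: ${doc}` }],
    }); // flagged: sequential bottleneck
    out.push(res.choices[0].message.content ?? "");
  }
  return out;
}

// Batch opportunity: issue the independent requests concurrently.
async function summarizeConcurrent(docs: string[]) {
  return Promise.all(
    docs.map(async (doc) => {
      const res = await hardened.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: `Summarize: ${doc}` }],
      });
      return res.choices[0].message.content ?? "";
    })
  );
}
```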
| Setting | Default | Description |
|---|---|---|
| `peakinfer.token` | `""` | PeakInfer token (or use env var) |
| `peakinfer.analyzeOnSave` | `false` | Auto-analyze on file save |
| `peakinfer.showInlineHints` | `true` | Show inline hints for issues |
| `peakinfer.severityThreshold` | `warning` | Minimum severity to show |
| `peakinfer.includeBenchmarks` | `true` | Include benchmark comparisons |
| `peakinfer.excludePatterns` | `["**/node_modules/**", ...]` | Patterns to exclude |
| Command | Description |
|---|---|
| `PeakInfer: Analyze Current File` | Analyze the active file |
| `PeakInfer: Analyze Workspace` | Analyze the entire workspace |
| `PeakInfer: Show Results Panel` | Open the results panel |
| `PeakInfer: Clear Diagnostics` | Clear all diagnostics |
| `PeakInfer: Set Token` | Configure your PeakInfer token |
- OpenAI (GPT-4o, GPT-4, GPT-3.5, etc.)
- Anthropic (Claude)
- Azure OpenAI
- AWS Bedrock
- Google Vertex AI
- vLLM, TensorRT-LLM (HTTP detection)
- LangChain, LlamaIndex (framework detection)
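For a concrete sense of what counts as an inference point, the sketch below (hypothetical code, assuming the `@anthropic-ai/sdk` package; the model name is illustrative) shows a direct provider SDK call site of the kind the analyzer detects:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// A direct provider SDK call like this is one inference point.
// Calls routed through a framework (e.g., a LangChain chain) are
// detected at the framework layer instead.
async function greet() {
  const reply = await client.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  });
  return reply.content;
}
```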
Pricing is the same as for the PeakInfer GitHub Action:
- Free: 50 credits one-time (6-month expiry)
- Starter: $19 for 200 credits
- Growth: $49 for 600 credits
- Scale: $149 for 2,000 credits
Credits are shared across the VS Code extension and the GitHub Action.
If analysis returns no results:
- Check that a token is configured (Settings or env var)
- Ensure the file contains LLM API calls
- Check the Output panel for errors (View > Output > PeakInfer)

If analysis seems slow: large files may take 10-30 seconds; check the Output panel for progress.

If you run out of credits: check your balance at peakinfer.com/dashboard.
License: Apache-2.0