Second opinions from multiple LLMs—right inside Claude Code
Need a second opinion on your code? Want validation before merging? Looking for domain expertise your model doesn't have? Stuck in a loop and need fresh eyes to break out?
One model's confidence isn't proof. K-LEAN brings in OpenAI, Gemini, DeepSeek, Moonshot, Minimax, and more—when multiple models agree, you ship with confidence.
- 9 slash commands — `/kln:quick`, `/kln:multi`, `/kln:agent`, `/kln:rethink`...
- 8 specialist agents — Security, Rust, embedded C, ARM Cortex, performance
- 5 smart hooks — Service auto-start, keyword handling, git tracking, web capture, session log
- Persistent knowledge — Insights that survive across sessions
Access any model via NanoGPT or OpenRouter, directly from Claude Code.
Works on Windows, Linux, and macOS — native cross-platform support, no shell scripts required.
Choose one provider and get your API key:
- NanoGPT — Subscription access to DeepSeek, Qwen, GLM, Kimi
- OpenRouter — Unified access to GPT, Gemini, Claude
Linux / macOS:

```bash
# Install pipx if you don't have it
python3 -m pip install --user pipx
python3 -m pipx ensurepath

# Install K-LEAN
pipx install kln-ai
```

Windows (PowerShell):

```powershell
# Install pipx if you don't have it
python -m pip install --user pipx
python -m pipx ensurepath

# Restart PowerShell, then install K-LEAN
pipx install kln-ai
```

Then initialize:

```bash
kln init    # Select provider, enter API key
kln start   # Start LiteLLM proxy
kln status  # Verify everything works
```

Or non-interactive:

```bash
kln init --provider nanogpt --api-key $NANOGPT_API_KEY
kln start
```

If you only want the persistent knowledge system without multi-model reviews:

```bash
pipx install kln-ai
kln init --provider skip
```

This installs the knowledge database, session hooks, and slash commands (`/kln:learn`, `/kln:remember`, `/kln:find`). No LiteLLM proxy or API keys required. Add a provider later with `kln init --provider nanogpt --api-key $KEY`.
/kln:quick "security" # Fast review (~30s)
/kln:multi "error handling" # 3-5 model consensus (~60s)
/kln:agent security-auditor # Specialist agent (~2min)kln model add --provider openrouter "anthropic/claude-3.5-sonnet"
kln model remove "claude-3-sonnet"
kln start # Restart to apply changes$ /kln:multi "review authentication flow"
GRADE: B+ | RISK: MEDIUM
HIGH CONFIDENCE (4/5 models agree):
- auth.py:42 - SQL injection risk in user query
- session.py:89 - Missing token expiration check
MEDIUM CONFIDENCE (2/5 models agree):
- login.py:15 - Consider rate limiting
Three ways to get external perspectives—pick based on speed vs depth:
| Command | What Happens | Time |
|---|---|---|
| `/kln:quick` | 1 model reviews code you provide | ~30s |
| `/kln:multi` | 3-5 models vote on the same code | ~60s |
| `/kln:rethink` | Contrarian techniques when you're stuck | ~20s |
`/kln:quick` — You gather the code (git diff, file content); one model reviews it fast.

```bash
/kln:quick "security review"
# Grade: B+ | Risk: MEDIUM | 3 findings
```

`/kln:multi` — The same code goes to 5 models in parallel. When 4/5 agree, it's real.

```bash
/kln:multi "check error handling"
# 4/5 AGREE: Missing null check at line 42
```

`/kln:rethink` — Stuck debugging for 10+ minutes? Get contrarian ideas: inversion, assumption challenge, domain shift.

```bash
/kln:rethink
# "What if the bug isn't in the parser—what if the input is already corrupt?"
```
How: LiteLLM proxy routes to multiple providers (NanoGPT, OpenRouter). Dynamic model discovery, parallel async execution, response aggregation with consensus scoring.
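For a feel of that pattern, here is a minimal sketch of the fan-out: one prompt sent to several models in parallel through an OpenAI-compatible proxy endpoint, with naive agreement counting at the end. The port, API key, model aliases, and the exact-match consensus rule are illustrative assumptions, not K-LEAN's actual internals.

```python
# Sketch: fan one review prompt out to several models via a local
# LiteLLM proxy (OpenAI-compatible), then count agreement per finding.
# ASSUMPTIONS: proxy at localhost:4000, placeholder key, naive consensus.
import asyncio
from collections import Counter
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:4000/v1", api_key="sk-local")
MODELS = ["deepseek-r1", "qwen3-coder", "kimi-k2", "glm-4.7", "deepseek-v3.2"]

async def review(model: str, code: str) -> str:
    resp = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Review this code. One finding per line:\n{code}"}],
    )
    return resp.choices[0].message.content or ""

async def consensus(code: str) -> Counter:
    # Same prompt, N models, in parallel; duplicate findings accumulate votes.
    replies = await asyncio.gather(*(review(m, code) for m in MODELS))
    return Counter(line.strip() for r in replies
                   for line in r.splitlines() if line.strip())

if __name__ == "__main__":
    votes = asyncio.run(consensus(open("auth.py").read()))
    for finding, n in votes.most_common(5):
        print(f"{n}/{len(MODELS)} agree: {finding}")
```

Real aggregation has to match findings semantically rather than by exact string; that is where the consensus scoring mentioned above does the heavy lifting.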
Your insights survive sessions. Capture mid-session or end-of-session:

`/kln:learn` — Extract learnings NOW, while context is fresh.

```bash
/kln:learn "JWT issue"
# Found 3 learnings → Saved to Knowledge DB
```

`/kln:remember` — End of session. Reviews git diff, extracts warnings/patterns/decisions, syncs to Serena MCP.

```bash
/kln:remember
# Saved 6 entries (2 warnings, 2 patterns, 1 solution, 1 decision)
# Synced to Serena kln-lessons-learned
```

`/kln:find` — Search anytime. Supports date, branch, and type filters.

```bash
/kln:find "JWT validation"
/kln:find auth since:2026-02-01
/kln:find type:decision since:2026-02-03
/kln:find auth branch:feature/auth
```
How: Per-project knowledge database with hybrid search—dense embeddings (BGE-small via fastembed) + sparse matching (BM42) + RRF fusion + cross-encoder reranking. Runs locally via ONNX, <100ms queries. The V3.1 schema adds temporal filtering by date, branch, and entry type. Learnings are also auto-extracted on `/compact` via the PreCompact hook.
No API key? The Knowledge DB works fully offline. You can still use `/kln:learn`, `/kln:remember`, and `/kln:find` without NanoGPT or OpenRouter—embeddings run locally on your machine.
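To make "embeddings run locally" concrete, here is a minimal fastembed sketch. The two model names match the families cited above; the document text and everything else is illustrative.

```python
# Sketch: local dense + sparse embeddings with fastembed (ONNX, offline).
# BGE-small and BM42 are the models cited above; the rest is illustrative.
from fastembed import TextEmbedding, SparseTextEmbedding

dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")
sparse_model = SparseTextEmbedding("Qdrant/bm42-all-minilm-l6-v2-attentions")

docs = ["JWT validation fails when clock skew exceeds 30s"]
dense_vec = next(dense_model.embed(docs))    # 384-dim dense vector
sparse_vec = next(sparse_model.embed(docs))  # weighted token indices

print(dense_vec.shape, len(sparse_vec.indices))
```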
Unlike models that review what you give them, agents read your codebase themselves.
8 specialists with tools: `read_file`, `grep`, `search_files`, `knowledge_search`, `get_complexity`.
| Agent | Expertise |
|---|---|
| `code-reviewer` | OWASP Top 10, SOLID, code quality |
| `security-auditor` | Vulnerabilities, auth, crypto |
| `debugger` | Root cause analysis |
| `performance-engineer` | Profiling, optimization |
| `rust-expert` | Ownership, lifetimes, unsafe |
| `c-pro` | C99/C11, POSIX, memory |
| `arm-cortex-expert` | Embedded ARM, real-time |
| `orchestrator` | Multi-agent coordination |
```bash
/kln:agent security-auditor "audit payment module"
# Agent greps for payment → reads 3 files → finds 2 vulnerabilities

/kln:agent rust-expert --model qwen3-coder "review unsafe blocks"
# Want a specific LLM? Use --model to pick your expert
```

`--parallel` — Need multiple perspectives? Run 3 specialists at once:

```bash
/kln:agent --parallel "review auth system"
# code-reviewer + security-auditor + performance-engineer → unified report
```
How: Built on smolagents with LiteLLM integration. Multi-step reasoning, tool use, and memory persistence.
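A hedged sketch of that pattern using smolagents' public API; the stub tool body, model alias, and endpoint are illustrative, and K-LEAN's real tools are the five listed above.

```python
# Sketch: a tool-calling agent on smolagents + LiteLLM.
# The tool body, model alias, and endpoint are illustrative.
from smolagents import ToolCallingAgent, LiteLLMModel, tool

@tool
def knowledge_search(query: str) -> str:
    """Search the project knowledge DB.

    Args:
        query: Free-text search query.
    """
    return "stub: would query the local knowledge server"

model = LiteLLMModel(model_id="openai/qwen3-coder",
                     api_base="http://localhost:4000/v1",
                     api_key="sk-local")
agent = ToolCallingAgent(tools=[knowledge_search], model=model)
print(agent.run("Audit the payment module for auth issues."))
```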
5 hooks run automatically—you don't call them:
| Hook | Trigger | What It Does |
|---|---|---|
| `session-start` | Claude Code opens | Starts LiteLLM + Knowledge Server |
| `user-prompt` | You type `InitKB` | Initializes project knowledge DB |
| `post-bash` | After bash commands | Captures git commits, test failures, build errors, package installs |
| `post-web` | After WebFetch | Captures doc URLs as discoveries |
| `pre-compact` | Context compaction | Session log via Haiku + auto-extract learnings to KB |
How: Claude Code hook system with pattern matching. Services auto-start on session begin. Git commits, test failures, build errors, and doc URLs auto-captured to KB with timestamp and branch metadata. On compaction, Haiku extracts 0-5 atomic learnings from the full conversation.
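To illustrate the mechanism, a sketch of a post-bash style handler: Claude Code pipes a JSON event to the hook's stdin, and the script decides what to persist. The `tool_input`/`command` fields follow Claude Code's hook input format; the matching logic and entry fields are illustrative, not K-LEAN's actual hook.

```python
#!/usr/bin/env python3
# Sketch of a PostToolUse hook for Bash: read the event from stdin,
# pattern-match the command, append a KB entry. Everything beyond the
# stdin-JSON contract is illustrative.
import json, pathlib, sys, time

event = json.load(sys.stdin)
cmd = event.get("tool_input", {}).get("command", "")

if cmd.startswith("git commit"):
    entry = {"type": "commit", "text": cmd,
             "ts": time.strftime("%Y-%m-%dT%H:%M:%S")}
    db = pathlib.Path(".knowledge-db/entries.jsonl")
    db.parent.mkdir(exist_ok=True)
    with db.open("a") as f:
        f.write(json.dumps(entry) + "\n")

sys.exit(0)  # 0 = success, nothing to report back to Claude Code
```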
The knowledge system has three layers: capture, storage, and retrieval. Everything runs locally, per-project, with no external services.
Every session automatically captures knowledge without any commands:
| Source | What's Captured | Entry Type |
|---|---|---|
| Git commits | Commit message + SHA + branch | commit |
| Test failures | Failure output + file path | finding |
| Build errors | Error message + context | warning |
| Package installs | Package name + version | discovery |
| Doc URLs | URL content evaluated by LLM | discovery |
| Session compaction | Session changelog via Claude Haiku | session |
When context gets compacted, the system automatically generates a structured session changelog:
```
Transcript JSONL (thousands of lines)
        |
        v
[1] Delta extraction -----> Only lines since last compaction
        |                   (uses compact_boundary markers)
        v
[2] Noise filtering ------> Drop tool-only turns, filler text,
        |                   system tags, slash command defs
        v
[3] Clean dialogue -------> USER: messages + CLAUDE: text responses
        |                   (~20% signal ratio from raw transcript)
        v
[4] Enrich with context --> + git log (18h window, with commit bodies)
        |                   + KB entries captured today
        v
[5] Claude Haiku ---------> Structured markdown:
        |                   Accomplished / Decisions / Discovered / Carry Forward
        v
[6] Persist --------------> .serena/memories/kln-session-YYYY-MM-DD.md
                            + searchable KB entry (type: session)
```
Multiple compactions per day append to the same log file (separated by ---). Each compaction only processes the conversation delta since the last one, so there's no overlap between entries.
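A rough sketch of step [1], the delta extraction. The `compact_boundary` marker name comes from the diagram above; the JSONL field layout is an assumption.

```python
# Sketch: keep only transcript turns after the last compaction boundary.
# The subtype field layout is assumed; only the compact_boundary marker
# name comes from the pipeline description above.
import json

def delta_since_last_compaction(transcript_path: str) -> list[dict]:
    turns, last_boundary = [], -1
    with open(transcript_path) as f:
        for i, line in enumerate(f):
            turn = json.loads(line)
            turns.append(turn)
            if turn.get("subtype") == "compact_boundary":
                last_boundary = i
    return turns[last_boundary + 1:]  # no boundary yet -> whole transcript
```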
Queries go through a 5-stage pipeline for high-quality results:
```
Query --> Dense embeddings (BGE-small) --> Sparse matching (BM42)
              |                                   |
              v                                   v
         Dense scores                       Sparse scores
              |                                   |
              +---------> RRF Fusion <------------+
                              |
                              v
              Post-RRF filtering (date, branch, type)
                              |
                              v
              Cross-encoder reranking (MiniLM)
                              |
                              v
                   Final ranked results
```
All models run locally via ONNX (fastembed). No API calls, no cloud. Queries return in <100ms via a TCP server that stays warm between searches.
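RRF fusion itself is only a few lines: each entry's fused score is the sum of 1/(k + rank) over the ranked lists it appears in, so items ranked well by both dense and sparse search rise to the top. A sketch (k=60 is the conventional constant from the RRF paper, not necessarily K-LEAN's value):

```python
# Sketch: Reciprocal Rank Fusion over dense and sparse result lists.
# score(d) = sum over lists of 1 / (k + rank_in_list(d)); k=60 is conventional.
def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["e42", "e17", "e03"]   # entry ids ranked by dense similarity
sparse = ["e17", "e42", "e99"]  # entry ids ranked by BM42 score
print(rrf([dense, sparse]))     # e42 and e17 dominate: both lists agree
```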
At session start, the system injects context from previous sessions:
```
[SESSION] Last: Fix JWT race condition (abc1234) | Next: Integration tests
[!] WARNINGS (2): "SQL injection in login" | "Deprecated API usage"
[KB] PINNED: <high-priority entries>
[KB] RECENT: <latest findings, solutions, patterns>
```
This means every new session starts with awareness of what happened before -- carry-forward items, active warnings, and recent discoveries. The Knowledge DB acts as long-term memory that persists across sessions, compactions, and context limits.
Storage: Per-project `.knowledge-db/` directory with `entries.jsonl` (append-only), dense/sparse index files, and a TCP server for fast queries. Schema V3.1 supports 9 entry types with date, branch, and type filtering.
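As a sketch of what querying that append-only log with V3.1-style filters might look like (the field names here are illustrative, not the actual schema):

```python
# Sketch: scan entries.jsonl with type/branch/date filters.
# Field names (type, branch, ts) are illustrative, not the real schema.
import json, pathlib
from datetime import date

def find(db=".knowledge-db/entries.jsonl", type_=None, branch=None, since=None):
    for line in pathlib.Path(db).read_text().splitlines():
        e = json.loads(line)
        if type_ and e.get("type") != type_:
            continue
        if branch and e.get("branch") != branch:
            continue
        if since and date.fromisoformat(e.get("ts", "1970-01-01")[:10]) < since:
            continue
        yield e

for e in find(type_="decision", since=date(2026, 2, 3)):
    print(e["ts"], e["text"])
```

In practice the warm TCP server and the hybrid ranking pipeline sit in front of this file; the scan only shows the storage contract.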
```
[opus 4.5] │ claudeAgentic │ git:(main●) +27-23 │ llm:16 kb:42
```
Model. Project. Branch (● = dirty). Lines changed. Models ready. KB entry count.
How: Custom statusline polling LiteLLM and Knowledge DB via TCP on each prompt.
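A sketch of one such poll as a short-lived TCP request; the port and the newline-delimited JSON protocol are hypothetical, and only "TCP query per prompt" comes from the description above.

```python
# Sketch: poll the knowledge server for a statusline field.
# HYPOTHETICAL: port 7777 and the JSON-lines protocol are invented here.
import json, socket

def kb_count(host="127.0.0.1", port=7777, timeout=0.2) -> int:
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b'{"op": "stats"}\n')
            reply = s.makefile().readline()
        return json.loads(reply).get("entries", 0)
    except OSError:
        return 0  # server down: statusline degrades instead of failing

print(f"kb:{kb_count()}")
```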
| Command | Description | Time |
|---|---|---|
| `/kln:quick <focus>` | Single model review | ~30s |
| `/kln:multi <focus>` | 3-5 model consensus | ~60s |
| `/kln:agent <role>` | Specialist agent with tools | ~2min |
| `/kln:rethink` | Contrarian debugging | ~20s |
| `/kln:find <query>` | Search knowledge DB | ~5s |
| `/kln:learn` | Capture insights from context | ~10s |
| `/kln:remember` | End-of-session knowledge capture | ~20s |
| `/kln:doc <title>` | Generate session docs | ~30s |
| `/kln:status` | System health check | ~2s |
| `/kln:help` | Command reference | instant |
Flags: `--async` (background), `--models N` (count), `--output json|text`
```bash
# Setup (unified)
kln init                 # Initialize: install + configure provider (NanoGPT, OpenRouter, skip)

# Installation & Management
kln install              # Install to ~/.claude/
kln uninstall            # Remove components
kln status               # Show component status

# Services
kln start                # Start LiteLLM proxy
kln stop                 # Stop all services

# Diagnostics
kln doctor               # Check configuration
kln doctor -f            # Auto-fix issues

# Model Management (subgroup)
kln model list           # List available models
kln model list --health  # Check model health
kln model add            # Add individual model
kln model remove         # Remove model
kln model test           # Test a specific model

# Provider Management (subgroup)
kln provider list        # Show configured providers
kln provider add         # Add provider with recommended models
kln provider set-key     # Update API key
kln provider remove      # Remove provider

# Review
kln multi                # Run multi-agent orchestrated review
```

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.9+ | `python3 --version` |
| Claude Code | 2.0+ | `claude --version` |
| pipx | any | `pipx --version` |
| API Key | - | NanoGPT or OpenRouter (optional for knowledge-only) |
K-LEAN comes with curated model sets for each provider—no manual configuration needed.
NanoGPT — Subscription access to top-tier models.
10 models pre-configured:
| Model | Provider | Specialty |
|---|---|---|
| `deepseek-r1` | DeepSeek | Reasoning, code review |
| `deepseek-v3.2` | DeepSeek | Fast general purpose |
| `qwen3-coder` | Alibaba | Code-focused |
| `glm-4.7` | Zhipu | Multilingual |
| `kimi-k2` | Moonshot | Long context |
| `llama-4-maverick` | Meta | Creative |
| `llama-4-scout` | Meta | Analytical |
| `mimo-v2-flash` | Xiaomi | Fast inference |
| `gpt-oss-120b` | OpenAI-OSS | Large capacity |
| `devstral-2-123b` | Mistral | Code generation |
+4 thinking models (auto-configured): `deepseek-v3.2-thinking`, `glm-4.7-thinking`, `kimi-k2-thinking`, `deepseek-r1-thinking`
OpenRouter — Unified API for multiple providers.
6 models pre-configured:
| Model | Provider | Specialty |
|---|---|---|
| `gemini-3-flash` | Google | Fast, multimodal |
| `gemini-2.5-flash` | Google | Balanced |
| `gpt-5-mini` | OpenAI | Efficient |
| `gpt-5.1-codex-mini` | OpenAI | Code-focused |
| `qwen3-coder-plus` | Alibaba | Enhanced coding |
| `deepseek-v3.2` | DeepSeek | Reasoning |
For a complete coding experience:
| Tool | Integration |
|---|---|
| SuperClaude | Use /sc:* and /kln:* together |
| Serena MCP | Shared memory, code understanding |
| Context7 MCP | Documentation lookup |
| Tavily MCP | Web search for research |
| Sequential Thinking MCP | Step-by-step reasoning for complex problems |
Telemetry: Install Phoenix to watch agent steps and reviews at localhost:6006.
| Document | Description |
|---|---|
| Installation | Detailed setup guide |
| Usage | Commands, workflows, examples |
| Reference | Complete config reference |
| Architecture | System design |
```bash
git clone https://github.com/calinfaja/K-LEAN.git
cd k-lean
pipx install -e .
kln install --dev
kln admin test
```

See CONTRIBUTING.md for guidelines.
Apache 2.0 — See LICENSE
Get second opinions. Ship with confidence.
