Commit ffff970

feat(router): add LLM routing with cost optimization and pretrained configs

Extend SemanticRouter to support LLM model selection by adding optional model, confidence, cost optimization, and multi-match capabilities to the existing routing infrastructure. When a Route includes a `model` field, the router returns the LiteLLM-compatible model identifier alongside the match, with a confidence score derived from vector distance. Cost-optimized routing biases toward cheaper models when semantic distances are close, using a configurable cost_weight penalty.

Key additions to SemanticRouter:
- Route.model (optional) for LiteLLM model identifiers
- RouteMatch.confidence, .alternatives, .metadata fields
- RoutingConfig.cost_optimization and .cost_weight settings
- RoutingConfig.default_route for fallback when no match is found
- from_pretrained() to load routers with pre-computed embeddings
- export_with_embeddings() to serialize routers with vectors
- AsyncSemanticRouter with full async parity

A built-in "default" pretrained config ships with 3 tiers (simple, standard, expert) mapped to GPT-4.1 Nano, Claude Sonnet 4.5, and Claude Opus 4.5, using pre-computed sentence-transformers embeddings.

Backward compatibility:
- LLMRouter/AsyncLLMRouter provided as deprecated wrappers
- ModelTier subclass enforces required model field
- Legacy field names (tiers/default_tier) mapped bidirectionally
- Existing SemanticRouter usage is fully unaffected

Includes integration tests, unit tests for schema validation, a user guide notebook, and a pretrained config generation script.

1 parent 5601ef0 commit ffff970

File tree

21 files changed: +49107, -48 lines
docs/user_guide/13_llm_router.ipynb
Lines changed: 1571 additions & 0 deletions (large diff not rendered by default)

Lines changed: 362 additions & 0 deletions
# LLM Router Extension - Design Document

## Overview

The LLM Router is an extension to RedisVL that provides intelligent, cost-optimized LLM model selection using semantic routing. Instead of routing queries to topics (like SemanticRouter), it routes queries to **model tiers** - selecting the cheapest LLM capable of handling each task.

## Problem Statement

### The LLM Cost Problem

Modern applications often default to using the most capable (and expensive) LLM for all queries, even when simpler models would suffice:

- Without routing: "Hello, how are you?" -> Claude Opus 4.5 ($5/M tokens)
- With tier routing: "Hello, how are you?" -> GPT-4.1 Nano ($0.10/M tokens)
### Existing Solutions and Their Limitations

**RouteLLM** (LMSys):
- Binary classification only (strong vs. weak model)
- No support for more than two tiers
- Requires training data or preference matrices

**NVIDIA LLM Router Blueprint**:
- Complexity classification approach (simple/moderate/complex)
- Provides the taxonomy basis but no open-source Redis-native implementation

**RouterArena / Bloom's Taxonomy Approach**:
- Maps query complexity to Bloom's cognitive levels
- Informs our tier design but lacks production routing infrastructure

**OpenRouter Auto-Router**:
- Black-box routing decisions
- Data flows through third-party servers
- No transparency into why a model was selected
- Cannot be self-hosted or customized

**NotDiamond**:
- Proprietary ML model for routing
- Requires API calls for every routing decision
- No local/offline capability

**FrugalGPT**:
- Sequential cascade approach (try cheap first, escalate)
- Higher latency due to serial model calls
## Solution: Semantic Model Tier Routing

Repurpose RedisVL's battle-tested SemanticRouter for model selection:

```
SemanticRouter     ->  LLMRouter
-----------------------------------------
Route              ->  ModelTier
route.name         ->  tier.name (simple/standard/expert)
route.references   ->  tier.references (task complexity examples)
route.metadata     ->  tier.metadata (cost, capabilities)
RouteMatch         ->  LLMRouteMatch (includes model string)
```
### Architecture

```
+---------------------------------------------------------------+
|                           LLMRouter                           |
+---------------------------------------------------------------+
| +-------------+  +-------------+  +-------------+             |
| |   Simple    |  |  Standard   |  |   Expert    |             |
| |    Tier     |  |    Tier     |  |    Tier     |             |
| +-------------+  +-------------+  +-------------+             |
| | gpt-4.1-nano|  | sonnet 4.5  |  |  opus 4.5   |             |
| | $0.10/M     |  | $3/M        |  | $5/M        |             |
| | threshold:  |  | threshold:  |  | threshold:  |             |
| |    0.5      |  |    0.6      |  |    0.7      |             |
| +-------------+  +-------------+  +-------------+             |
|        |                |                |                    |
|        +----------------+----------------+                    |
|                         v                                     |
|              +------------------------+                       |
|              |   Redis Vector Index   |                       |
|              |  (reference phrases)   |                       |
|              +------------------------+                       |
+---------------------------------------------------------------+
                          |
                          v
                   +-------------+
                   |    Query    |
                   |  "analyze   |
                   |   this..."  |
                   +-------------+
                          |
                          v
                   +-------------+
                   |   LiteLLM   |
                   |  (optional) |
                   +-------------+
```
## Key Design Decisions

### 1. Model Tiers, Not Individual Models

Routes map to **tiers** (simple, standard, expert) rather than specific models. This provides:

- Abstraction from model churn (swap haiku -> gemini-flash without changing routes)
- A clear mental model for users
- Easy cost optimization within tiers

### 2. Bloom's Taxonomy-Grounded Tiers

The default pretrained config maps tiers to Bloom's Taxonomy cognitive levels:

- **Simple** (Remember/Understand): factual recall, greetings, format conversion
- **Standard** (Apply/Analyze): code explanation, summarization, moderate analysis
- **Expert** (Evaluate/Create): research, architecture, formal reasoning

This is informed by RouterArena's finding that cognitive complexity correlates with model capability requirements.
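
As an illustration, the tier-to-taxonomy mapping can be pictured as a small lookup table; the example queries below are ours, not the shipped reference phrases:

```python
# Illustrative mapping of the default tiers to Bloom's levels.
# The "example" queries are hypothetical, not the actual reference phrases.
TIER_TAXONOMY = {
    "simple": {
        "bloom": ["Remember", "Understand"],
        "example": "What year did the Berlin Wall fall?",
    },
    "standard": {
        "bloom": ["Apply", "Analyze"],
        "example": "Explain what this regex matches and why.",
    },
    "expert": {
        "bloom": ["Evaluate", "Create"],
        "example": "Design a sharding strategy for a multi-tenant database.",
    },
}

for tier, info in TIER_TAXONOMY.items():
    print(f"{tier}: {'/'.join(info['bloom'])}")
```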
### 3. LiteLLM-Compatible Model Strings

Tier model identifiers use the LiteLLM format (`provider/model`):

```python
ModelTier(
    name="standard",
    model="anthropic/claude-sonnet-4-5",  # Works directly with LiteLLM
    ...
)
```
### 4. Per-Tier Distance Thresholds

Each tier has its own `distance_threshold`, allowing fine-grained control:

```python
simple_tier = ModelTier(..., distance_threshold=0.5)  # Strict match
expert_tier = ModelTier(..., distance_threshold=0.7)  # Looser match
```
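
To make the matching semantics concrete, here is a minimal sketch of how per-tier thresholds might be applied; the `Tier` dataclass and `route_with_thresholds` helper are illustrative stand-ins, not the library's internals:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tier:
    """Hypothetical stand-in for ModelTier: just a name and a threshold."""
    name: str
    distance_threshold: float


def route_with_thresholds(distances: dict[str, float], tiers: list[Tier]) -> Optional[str]:
    """Return the closest tier whose distance clears that tier's own threshold.

    `distances` maps tier name -> best semantic distance for the query.
    """
    candidates = [
        (distances[t.name], t.name)
        for t in tiers
        if t.name in distances and distances[t.name] <= t.distance_threshold
    ]
    return min(candidates)[1] if candidates else None


tiers = [Tier("simple", 0.5), Tier("standard", 0.6), Tier("expert", 0.7)]

# "expert" is excluded (0.75 > its 0.7 threshold); "simple" wins on distance.
print(route_with_thresholds({"simple": 0.35, "standard": 0.55, "expert": 0.75}, tiers))
# -> simple
```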
### 5. Cost-Aware Routing

When `cost_optimization=True`, the router adds a cost penalty to each tier's semantic distance:

```python
adjusted_distance = distance + (cost_per_1k * cost_weight)
```

This prefers cheaper tiers when semantic distances are close.
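
A worked example of the penalty under assumed numbers (the distances, costs, and `cost_weight` value below are illustrative, and this snippet is a sketch of the formula above, not the library code):

```python
cost_weight = 10.0  # illustrative cost_weight setting

# (semantic distance for this query, cost_per_1k_input in dollars) - numbers assumed
tiers_for_query = {
    "simple":   (0.42, 0.0001),
    "standard": (0.40, 0.003),
    "expert":   (0.41, 0.005),
}

# adjusted_distance = distance + (cost_per_1k * cost_weight)
adjusted = {
    name: dist + cost * cost_weight
    for name, (dist, cost) in tiers_for_query.items()
}
# simple: 0.421, standard: 0.430, expert: 0.460

best = min(adjusted, key=adjusted.get)
print(best)  # simple - raw distances were nearly tied, so the cheapest tier wins
```

Note that "standard" had the lowest raw distance (0.40), but after the cost penalty the near-tied "simple" tier is selected.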
### 6. Pretrained Configs with Embedded Vectors

The built-in `default.json` provides a ready-to-use 3-tier configuration:

```python
# Instant setup - no embedding model needed at load time
router = LLMRouter.from_pretrained("default", redis_client=client)
```

The pretrained config includes pre-computed embeddings from `sentence-transformers/all-mpnet-base-v2`, with 18 reference phrases per tier covering the Bloom's Taxonomy spectrum.

Custom configs can also be exported and shared:

```python
# Export (one-time, with embedding model)
router.export_with_embeddings("my_router.json")

# Import (no embedding model needed)
router = LLMRouter.from_pretrained("my_router.json", redis_client=client)
```
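
The exact on-disk format is defined by the `Pretrained*` schemas in `schema.py`; purely as an illustration (field names and structure here are assumed, not the actual schema), an exported config pairs each reference phrase with its pre-computed vector so no embedding model is needed at load time:

```json
{
  "name": "default",
  "vectorizer": "sentence-transformers/all-mpnet-base-v2",
  "tiers": [
    {
      "name": "simple",
      "model": "openai/gpt-4.1-nano",
      "distance_threshold": 0.5,
      "references": [
        {"text": "hello", "embedding": [0.012, -0.034, ...]}
      ]
    }
  ]
}
```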
### 7. Async Support

`AsyncLLMRouter` provides the same functionality using async I/O. Since `__init__` cannot be async, it uses a `create()` classmethod factory:

```python
router = await AsyncLLMRouter.create(
    name="my-router",
    tiers=tiers,
    redis_client=async_client,
)
match = await router.route("hello")
```

Key async method mapping:

| Sync (`LLMRouter`)  | Async (`AsyncLLMRouter`)  |
|---------------------|---------------------------|
| `__init__()`        | `await create()`          |
| `from_existing()`   | `await from_existing()`   |
| `route()`           | `await route()`           |
| `route_many()`      | `await route_many()`      |
| `add_tier()`        | `await add_tier()`        |
| `remove_tier()`     | `await remove_tier()`     |
| `from_dict()`       | `await from_dict()`       |
| `from_pretrained()` | `await from_pretrained()` |
| `delete()`          | `await delete()`          |
## Module Structure

```
redisvl/extensions/llm_router/
+-- __init__.py          # Public exports (LLMRouter, AsyncLLMRouter, schemas)
+-- DESIGN.md            # This document
+-- schema.py            # Pydantic models
|   +-- ModelTier        # Tier definition
|   +-- LLMRouteMatch    # Routing result
|   +-- RoutingConfig    # Router configuration
|   +-- Pretrained*      # Export/import schemas
+-- router.py            # LLMRouter + AsyncLLMRouter implementations
+-- pretrained/
    +-- __init__.py      # Pretrained loader (get_pretrained_path)
    +-- default.json     # Standard 3-tier config (simple/standard/expert)
```
## API Examples

### Basic Usage

```python
from redisvl.extensions.llm_router import LLMRouter, ModelTier

tiers = [
    ModelTier(
        name="simple",
        model="openai/gpt-4.1-nano",
        references=[
            "hello", "hi there", "thanks", "goodbye",
            "what time is it?", "how are you?",
        ],
        metadata={"cost_per_1k_input": 0.0001},
        distance_threshold=0.5,
    ),
    ModelTier(
        name="standard",
        model="anthropic/claude-sonnet-4-5",
        references=[
            "analyze this code for bugs",
            "explain how neural networks learn",
            "compare and contrast these approaches",
        ],
        metadata={"cost_per_1k_input": 0.003},
        distance_threshold=0.6,
    ),
    ModelTier(
        name="expert",
        model="anthropic/claude-opus-4-5",
        references=[
            "prove this mathematical theorem",
            "architect a distributed system",
            "write a research paper analyzing",
        ],
        metadata={"cost_per_1k_input": 0.005},
        distance_threshold=0.7,
    ),
]

router = LLMRouter(
    name="my-llm-router",
    tiers=tiers,
    redis_url="redis://localhost:6379",
)

# Route a query
query = "hello, how's it going?"
match = router.route(query)
print(match.tier)   # "simple"
print(match.model)  # "openai/gpt-4.1-nano"

# Use with LiteLLM (optional integration)
from litellm import completion

response = completion(
    model=match.model,
    messages=[{"role": "user", "content": query}],
)
```
### Cost-Optimized Routing

```python
router = LLMRouter(
    name="cost-aware-router",
    tiers=tiers,
    cost_optimization=True,  # Prefer cheaper tiers when distances are close
    redis_url="redis://localhost:6379",
)
```
### Pretrained Router

```python
# Load without needing an embedding model for the references
router = LLMRouter.from_pretrained(
    "default",  # Built-in config, or path to a JSON file
    redis_client=client,
)
```
### Async Usage

```python
from redisvl.extensions.llm_router import AsyncLLMRouter

router = await AsyncLLMRouter.create(
    name="my-async-router",
    tiers=tiers,
    redis_url="redis://localhost:6379",
)

match = await router.route("explain how garbage collection works")
print(match.model)  # "anthropic/claude-sonnet-4-5"

# Or load from pretrained
router = await AsyncLLMRouter.from_pretrained("default", redis_client=client)

await router.delete()
```
## Comparison with SemanticRouter

| Feature              | SemanticRouter       | LLMRouter               |
|----------------------|----------------------|-------------------------|
| Purpose              | Topic classification | Model selection         |
| Output               | Route name           | Model string + metadata |
| Cost awareness       | No                   | Yes                     |
| Pretrained configs   | No                   | Yes                     |
| Per-route thresholds | Yes                  | Yes                     |
| LiteLLM integration  | No                   | Yes (model strings)     |
| Async support        | No                   | Yes (`AsyncLLMRouter`)  |
## Testing

```bash
uv run pytest tests/unit/test_llm_router_schema.py -v
uv run pytest tests/integration/test_llm_router.py -v
uv run pytest tests/integration/test_async_llm_router.py -v
```
## Future Enhancements

### 1. `complete()` Method

Direct LiteLLM integration for one-liner usage:

```python
response = router.complete("analyze this code", messages=[...])
```

### 2. Capability Filtering

Filter tiers by capability before routing:

```python
match = router.route("generate an image", capabilities=["vision"])
```

### 3. Budget Constraints

Enforce cost limits:

```python
router = LLMRouter(..., max_cost_per_1k=0.01)  # Never select opus
```

### 4. Fallback Chains

Define a fallback order for when the primary tier is unavailable:

```python
tier = ModelTier(..., fallback=["standard", "simple"])
```
## References

- [RedisVL SemanticRouter](https://docs.redisvl.com/en/latest/user_guide/semantic_router.html)
- [LiteLLM Model List](https://docs.litellm.ai/docs/providers)
- [RouteLLM](https://github.com/lm-sys/RouteLLM) - LMSys binary router framework
- [NVIDIA LLM Router Blueprint](https://build.nvidia.com/blueprints/llm-router) - Complexity-based routing
- [RouterArena / Bloom's Taxonomy](https://arxiv.org/abs/2412.06644) - Cognitive complexity for routing
- [FrugalGPT](https://arxiv.org/abs/2305.05176) - Cost-efficient LLM strategies
- [OpenRouter](https://openrouter.ai/) - Auto-routing concept
- [NotDiamond](https://notdiamond.ai/) - ML-based model routing
- [Unify.ai](https://unify.ai/) - Quality-cost tradeoff routing