Description
The async_filter_deployments method is an extremely useful hook in the CustomLogger class for filtering deployments before routing, but it's completely undocumented. This makes it difficult for users to discover and implement custom deployment filtering logic.
Current Situation
The hook exists and is actively used in the router code (litellm/router.py, around line 4835), but it is:
❌ Not mentioned in the documentation
❌ Not in any examples
❌ Not in the API reference
❌ Difficult to discover without reading source code
Use Case
We're using async_filter_deployments to implement dynamic region failover in our LLM proxy. When a region experiences issues, we disable it in our database, and this hook filters out disabled regions from the healthy deployments list before routing occurs.
Example implementation:
from typing import Any, List, Optional

from litellm.integrations.custom_logger import CustomLogger


class MyCustomHandler(CustomLogger):
    async def async_filter_deployments(
        self,
        model: str,
        healthy_deployments: List,
        messages: Optional[List],
        request_kwargs: Optional[dict] = None,
        parent_otel_span: Optional[Any] = None,
    ) -> List[dict]:
        """
        Filter deployments based on custom logic.
        Called after healthy deployments are determined but before routing.
        """
        # Example: filter out deployments in disabled regions.
        # get_disabled_regions is our own async helper (not part of LiteLLM)
        # that returns the regions currently disabled in our database.
        disabled_regions = await get_disabled_regions(model)
        filtered = [
            deployment
            for deployment in healthy_deployments
            if deployment.get("model_info", {}).get("location") not in disabled_regions
        ]
        return filtered
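To try the filtering logic without a running router, here is a minimal standalone sketch. It assumes the same `model_info.location` field as the example above; the in-memory `get_disabled_regions` stub, the `filter_by_region` helper, and the sample deployments are all hypothetical stand-ins, not LiteLLM APIs.

```python
import asyncio
from typing import Any, Dict, List, Set

# Hypothetical stand-in for the database lookup used in the example above.
_DISABLED: Dict[str, Set[str]] = {"gpt-4o": {"eu-west"}}


async def get_disabled_regions(model: str) -> Set[str]:
    """Return the set of regions currently disabled for a model (stubbed)."""
    return _DISABLED.get(model, set())


async def filter_by_region(
    model: str, healthy_deployments: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Drop deployments whose model_info.location is in a disabled region."""
    disabled = await get_disabled_regions(model)
    return [
        d
        for d in healthy_deployments
        if d.get("model_info", {}).get("location") not in disabled
    ]


# Sample data for illustration only.
sample_deployments = [
    {"model_name": "gpt-4o", "model_info": {"id": "a", "location": "us-east"}},
    {"model_name": "gpt-4o", "model_info": {"id": "b", "location": "eu-west"}},
    {"model_name": "gpt-4o", "model_info": {"id": "c"}},  # no location set
]

remaining = asyncio.run(filter_by_region("gpt-4o", sample_deployments))
remaining_ids = [d["model_info"]["id"] for d in remaining]
```

Note that with this predicate a deployment that has no `location` is kept, since `None` is never in the disabled set. Whether that is the right default depends on how strictly regions need to be enforced.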
Benefits of this hook
✅ Pre-routing filtering based on custom business logic
✅ Dynamic deployment selection (e.g., region failover, cost optimization)
✅ Integration with external systems (databases, feature flags)
✅ Called for every request, enabling real-time filtering
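Because the hook runs on every request, any external lookup inside it sits on the request path. One way to keep that cheap is a short-lived cache in front of the lookup. The following is a sketch only: `TTLRegionCache` and `fake_fetch` are hypothetical names introduced for illustration, and the TTL value is arbitrary.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Set, Tuple


class TTLRegionCache:
    """Cache the result of an async region lookup for ttl_seconds per model."""

    def __init__(
        self,
        fetch: Callable[[str], Awaitable[Set[str]]],
        ttl_seconds: float = 5.0,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        # model -> (time the entry was stored, disabled regions)
        self._entries: Dict[str, Tuple[float, Set[str]]] = {}

    async def get(self, model: str) -> Set[str]:
        now = time.monotonic()
        entry = self._entries.get(model)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the external lookup
        regions = await self._fetch(model)
        self._entries[model] = (now, regions)
        return regions


fetch_calls = 0


async def fake_fetch(model: str) -> Set[str]:
    """Hypothetical lookup that counts how often it is invoked."""
    global fetch_calls
    fetch_calls += 1
    return {"eu-west"}


async def demo() -> Set[str]:
    cache = TTLRegionCache(fake_fetch, ttl_seconds=60.0)
    await cache.get("gpt-4o")  # first call hits fake_fetch
    return await cache.get("gpt-4o")  # second call is served from the cache


cached_regions = asyncio.run(demo())
```

The trade-off is staleness: a region disabled in the database keeps receiving traffic for up to one TTL, so the value should be chosen with failover latency in mind.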
Please add documentation covering:
- When it's called: After health checks but before deployment selection
- Parameters:
  - model: The model name being requested
  - healthy_deployments: List of deployments that passed health checks
  - messages: Request messages (for context-aware filtering)
  - request_kwargs: Additional request parameters
  - parent_otel_span: OpenTelemetry span for tracing
- Return value: Filtered list of deployments to consider for routing
- Use cases: Region failover, cost optimization, A/B testing, compliance filtering
- Example implementation: Show common patterns
This would help users leverage LiteLLM's powerful routing capabilities for production use cases.