This document serves as the technical blueprint for Slipstream, a high-performance semantic LLM router.
Minimize inference costs and optimize developer experience by dynamically routing LLM requests between DeepSeek-V3.2 (Efficiency/Weak) and Claude Opus 4.6 (Reasoning/Strong) based on real-time prompt complexity analysis.
- Infrastructure: AWS Bedrock (Serverless Inference), Terraform (IaC).
- Runtime: Rust (Axum + Tokio) for high-concurrency, low-latency proxying.
- Environment: Nix (Hermetic builds), Just (Task orchestration).
- Security: AWS SigV4 via IAM roles (Least Privilege), Bedrock Guardrails.
- Observability: OpenTelemetry (OTel) for cost and latency attribution.
- Ingress: Slipstream listens on `:3000` for OpenAI-compatible JSON payloads.
- Preprocessing:
- Prefix Extraction: Extract the first 1,000 tokens of the prompt.
- Speculative Bypass: If prompt < 50 tokens, default to Weak model (bypass classifier).
- Classification:
- Asynchronous call to DeepSeek-Lite or Nano-model.
- Prompt: `Complexity [0: Routine, 1: Complex]. Input: {prefix}.`
- Strict Timeout: 500ms limit on classification to prevent UX lag.
- Dispatch:
- 0 (Routine): Route to DeepSeek-V3.2 via Bedrock.
- 1 (Complex): Route to Claude Opus 4.6 via Bedrock.
- Egress: Stream Server-Sent Events (SSE) back to TUI/CLI without buffering.
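The preprocessing, classification, and dispatch steps above can be sketched as a single routing decision. This is a minimal, standard-library-only illustration rather than the Slipstream implementation: whitespace-separated words stand in for real tokens, `classify` is a hypothetical blocking call to the classifier model, and falling back to the Weak model on a timeout or an unrecognized label is an assumption the plan does not spell out.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which backend a request is dispatched to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Weak,   // DeepSeek-V3.2
    Strong, // Claude Opus 4.6
}

const PREFIX_TOKENS: usize = 1_000; // prefix length from the plan
const BYPASS_TOKENS: usize = 50; // speculative-bypass threshold
const CLASSIFY_TIMEOUT: Duration = Duration::from_millis(500);

/// Assumption: whitespace-separated words approximate tokens.
fn approx_tokens(prompt: &str) -> Vec<&str> {
    prompt.split_whitespace().collect()
}

/// Builds the classifier prompt from the first `PREFIX_TOKENS` tokens.
pub fn classifier_prompt(prompt: &str) -> String {
    let prefix = approx_tokens(prompt)
        .into_iter()
        .take(PREFIX_TOKENS)
        .collect::<Vec<_>>()
        .join(" ");
    format!("Complexity [0: Routine, 1: Complex]. Input: {}.", prefix)
}

/// Maps the classifier reply to a route.
/// Assumption: anything other than "1" is treated as routine.
fn parse_label(reply: &str) -> Route {
    match reply.trim() {
        "1" => Route::Strong,
        _ => Route::Weak,
    }
}

/// Decides a route for `prompt`, running `classify` under the 500ms limit.
pub fn decide<F>(prompt: &str, classify: F) -> Route
where
    F: FnOnce(String) -> String + Send + 'static,
{
    if approx_tokens(prompt).len() < BYPASS_TOKENS {
        return Route::Weak; // speculative bypass: skip the classifier
    }
    let request = classifier_prompt(prompt);
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore send errors.
        let _ = tx.send(classify(request));
    });
    match rx.recv_timeout(CLASSIFY_TIMEOUT) {
        Ok(reply) => parse_label(&reply),
        Err(_) => Route::Weak, // assumption: fall back to Weak on timeout
    }
}
```

For example, `decide("fix typo", |_| "1".to_string())` returns `Route::Weak` without calling the classifier, because the prompt is under the 50-token bypass threshold. In the Tokio runtime the thread and channel would presumably be replaced by an async timeout around the classifier call.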
- Implement `axum` server with `/v1/chat/completions` endpoint.
- Hard-route all traffic to DeepSeek-V3.2 via AWS SDK.
- Success Metric: TUI receives a streamed response from Bedrock through Slipstream.
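To make the streamed-response success metric concrete, the sketch below shows the wire format an OpenAI-compatible client expects: each JSON chunk goes out as a `data:` frame followed by a blank line, and the stream ends with `data: [DONE]`. It uses only the standard library and is not the axum handler itself. The names `sse_frame` and `relay` are hypothetical, and each chunk is assumed to be single-line JSON.

```rust
use std::io::{self, Write};

/// Sentinel frame that OpenAI-compatible streams send last.
pub const SSE_DONE: &str = "data: [DONE]\n\n";

/// Formats one single-line JSON chunk as a Server-Sent Events frame.
pub fn sse_frame(json_chunk: &str) -> String {
    format!("data: {}\n\n", json_chunk)
}

/// Writes each chunk to `out` as soon as it arrives and flushes after
/// every frame, so nothing accumulates in a buffer before the client sees it.
pub fn relay<I, W>(chunks: I, out: &mut W) -> io::Result<()>
where
    I: IntoIterator<Item = String>,
    W: Write,
{
    for chunk in chunks {
        out.write_all(sse_frame(&chunk).as_bytes())?;
        out.flush()?;
    }
    out.write_all(SSE_DONE.as_bytes())?;
    out.flush()
}
```

Passing any `Write` target, such as a `Vec<u8>` in a test or a socket in a prototype, lets the framing be checked before the Bedrock stream is wired in.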
- Implement the Classification logic.
- Handle model selection logic (Weak vs. Strong).
- Implement Request Hedging, here meaning failover: retry on the Weak model if the Strong model returns HTTP 429 or 500.
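The failover rule can be sketched as a thin wrapper around the upstream call. This is an illustration under stated assumptions, not the planned implementation: `Model`, `Upstream`, and `dispatch` are hypothetical names, the upstream call is modeled as a closure returning a status and body, and only one retry is attempted.

```rust
/// The two backends named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Model {
    Weak,
    Strong,
}

/// Outcome of one upstream call (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Upstream {
    pub status: u16,
    pub body: String,
}

/// True for the statuses the plan names as failover triggers.
fn should_fail_over(status: u16) -> bool {
    matches!(status, 429 | 500)
}

/// Calls the selected model; if Strong answers 429 or 500, retries once on Weak.
/// Returns the model that actually served the request together with its response.
pub fn dispatch<F>(selected: Model, mut call: F) -> (Model, Upstream)
where
    F: FnMut(Model) -> Upstream,
{
    let first = call(selected);
    if selected == Model::Strong && should_fail_over(first.status) {
        return (Model::Weak, call(Model::Weak));
    }
    (selected, first)
}
```

Returning the serving model alongside the response keeps the recorded routing decision accurate when a failover happens. Since responses are streamed, a check like this would need to run before the first byte is forwarded to the client.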
- Integrate `tracing-opentelemetry`.
- Calculate and export metrics: `latency_ms`, `tokens_consumed`, `cost_usd`, and `routing_decision`.
- Build a `Just` recipe to visualize cost savings.
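A rough sketch of how the four metrics and the cost-savings figure might be computed follows. The per-token prices are placeholders chosen for easy arithmetic, not real Bedrock pricing, and the single blended rate per model is a simplification, since real pricing typically distinguishes input and output tokens. `RequestMetrics`, `record`, and `savings_usd` are hypothetical names, and the OTel export itself is left out.

```rust
/// Per-request record holding the four metrics named in the plan.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestMetrics {
    pub latency_ms: u64,
    pub tokens_consumed: u64,
    pub cost_usd: f64,
    pub routing_decision: &'static str, // "weak" or "strong"
}

/// Placeholder prices in USD per million tokens (assumption, not real pricing).
pub const WEAK_USD_PER_MTOK: f64 = 1.0;
pub const STRONG_USD_PER_MTOK: f64 = 20.0;

/// Cost of `tokens` at a given price per million tokens.
pub fn cost_usd(tokens: u64, usd_per_mtok: f64) -> f64 {
    tokens as f64 * usd_per_mtok / 1_000_000.0
}

/// Builds a metrics record, pricing the request by its routing decision.
pub fn record(routing_decision: &'static str, tokens: u64, latency_ms: u64) -> RequestMetrics {
    let price = if routing_decision == "strong" {
        STRONG_USD_PER_MTOK
    } else {
        WEAK_USD_PER_MTOK
    };
    RequestMetrics {
        latency_ms,
        tokens_consumed: tokens,
        cost_usd: cost_usd(tokens, price),
        routing_decision,
    }
}

/// Savings against a baseline that sends every request to the Strong model.
pub fn savings_usd(records: &[RequestMetrics]) -> f64 {
    let baseline: f64 = records
        .iter()
        .map(|r| cost_usd(r.tokens_consumed, STRONG_USD_PER_MTOK))
        .sum();
    let actual: f64 = records.iter().map(|r| r.cost_usd).sum();
    baseline - actual
}
```

With the placeholder prices, one million tokens routed to the Weak model plus 500,000 routed to the Strong model cost 11 USD against an all-Strong baseline of 30 USD, so `savings_usd` reports 19 USD.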
- Latency Tax: Routing decision + classification must remain < 15% of total request time.
- Memory Footprint: Use Rust's zero-copy parsing (`serde`) to keep container size < 50MB.
- Signature Integrity: Ensure SigV4 signing remains valid across region-specific Bedrock endpoints.
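The Latency Tax constraint is a simple ratio, and a small check makes the budget testable. The function names below are hypothetical, and treating a zero-length request as having no overhead is an assumption.

```rust
use std::time::Duration;

/// Routing plus classification must stay under this share of total request time.
pub const MAX_OVERHEAD_RATIO: f64 = 0.15;

/// Fraction of the total request time spent on routing and classification.
pub fn overhead_ratio(routing: Duration, total: Duration) -> f64 {
    if total.is_zero() {
        return 0.0; // assumption: an empty request carries no overhead
    }
    routing.as_secs_f64() / total.as_secs_f64()
}

/// True when the request meets the < 15% latency-tax budget.
pub fn within_latency_budget(routing: Duration, total: Duration) -> bool {
    overhead_ratio(routing, total) < MAX_OVERHEAD_RATIO
}
```

For instance, 300ms of routing overhead on a 2.4s request is 12.5% and passes. A classification that runs to the full 500ms timeout only fits the budget when the whole request takes longer than about 3.33s (500ms / 0.15).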
Status: Infrastructure Verified (Terraform Plan ✅). Environment Ready (Nix ✅). Next Action: Implement Phase 1 (The Wire) in src/main.rs.