Commit 274d029
update config & update docker compose README
Signed-off-by: JaredforReal <[email protected]>
1 parent 78ab285 commit 274d029

File tree

3 files changed: +221 −51 lines

deploy/docker-compose/README.md

Lines changed: 66 additions & 3 deletions
```diff
@@ -4,8 +4,9 @@ This directory contains the primary `docker-compose.yml` used to run the Semantic Router
 
 - Envoy proxy (ExtProc integration)
 - Semantic Router (extproc)
-- Observability (Prometheus + Grafana)
+- Observability (Prometheus + Grafana + Jaeger)
 - Dashboard (unified UI: config, monitoring, topology, playground)
+- Chat UI (Hugging Face Chat UI with MongoDB)
 - Open WebUI + Pipelines (for the Playground tab)
 - Optional test services (mock-vllm, llm-katan via profiles)
 
```

```diff
@@ -26,11 +27,14 @@ Example mappings:
 - `semantic-router` (port: 50051 for gRPC ExtProc; has internal health on 8080)
 - `prometheus` (port: 9090)
 - `grafana` (port: 3000)
+- `jaeger` (ports: 4318, 16686)
+- `chat-ui` (port: 3002 → 3000 in-container)
+- `mongo` (no host port by default)
 - `openwebui` (port: 3001 → 8080 in-container)
 - `pipelines` (no host port by default)
 - `dashboard` (port: 8700)
 - `mock-vllm` (port: 8000; profile: testing)
-- `llm-katan` (port: 8002 → 8000; profiles: testing, llm-katan)
+- `llm-katan` (port: 8002; profiles: testing, llm-katan)
 
 ## Profiles
 
```
```diff
@@ -46,6 +50,8 @@ These host ports are exposed when you bring the stack up:
 - Envoy admin: http://localhost:19000
 - Grafana: http://localhost:3000 (admin/admin)
 - Prometheus: http://localhost:9090
+- Jaeger: http://localhost:16686 (tracing UI)
+- Chat UI: http://localhost:3002 (Hugging Face Chat UI)
 - Open WebUI: http://localhost:3001
 - Mock vLLM (testing profile): http://localhost:8000
 - LLM Katan (testing/llm-katan profiles): http://localhost:8002
```
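For reference, the `jaeger` ports listed above imply a compose service along these lines. This is a hedged sketch, not the project's actual service block: the image tag and the `COLLECTOR_OTLP_ENABLED` flag are assumptions based on the stock Jaeger all-in-one image, so check `docker-compose.yml` for the real definition.

```yaml
# Sketch only - see docker-compose.yml for the actual service definition
jaeger:
  image: jaegertracing/all-in-one:latest # assumed image
  environment:
    - COLLECTOR_OTLP_ENABLED=true # enables the OTLP collector
  ports:
    - "16686:16686" # tracing UI
    - "4318:4318"   # OTLP ingest
```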
```diff
@@ -107,15 +113,17 @@ The `dashboard` service exposes a unified UI at http://localhost:8700 with:
 - Monitoring: iframe embed of Grafana
 - Config: `GET /api/router/config/all` and `POST /api/router/config/update` mapped to `/app/config/config.yaml`
 - Topology: visualizes routing/config
-- Playground: iframe embed of Open WebUI
+- Playground: iframe embed of Open WebUI and Chat UI
 
 Environment variables set in Compose:
 
 - `TARGET_GRAFANA_URL=http://grafana:3000`
 - `TARGET_PROMETHEUS_URL=http://prometheus:9090`
+- `TARGET_JAEGER_URL=http://jaeger:16686`
 - `TARGET_ROUTER_API_URL=http://semantic-router:8080`
 - `TARGET_ROUTER_METRICS_URL=http://semantic-router:9190/metrics`
 - `TARGET_OPENWEBUI_URL=http://openwebui:8080`
+- `TARGET_CHATUI_URL=http://chat-ui:3000`
 - `ROUTER_CONFIG_PATH=/app/config/config.yaml`
 
 Volumes:
```
````diff
@@ -126,11 +134,66 @@ Image selection:
 
 - Uses `DASHBOARD_IMAGE` if provided; otherwise builds from `dashboard/backend/Dockerfile` at `docker compose up` time.
 
+## Chat UI (Hugging Face)
+
+The `chat-ui` service provides a modern chat interface using Hugging Face's Chat UI:
+
+- **URL**: http://localhost:3002
+- **Database**: MongoDB for conversation persistence
+- **API Integration**: Routes through Envoy proxy for OpenAI-compatible API calls
+- **Configuration**:
+  - `OPENAI_BASE_URL=http://envoy-proxy:8801/v1` (routes through Envoy)
+  - `OPENAI_API_KEY` (configurable via environment variable)
+  - `MONGODB_URL=mongodb://mongo:27017` (local MongoDB by default)
+
+### Environment Variables
+
+You can customize Chat UI behavior by setting these environment variables:
+
+```bash
+# API Configuration
+export OPENAI_API_KEY="your-api-key-here"
+export MONGODB_URL="mongodb://mongo:27017" # or Atlas URL for production
+export MONGODB_DB_NAME="chat-ui"
+
+# UI Customization
+export PUBLIC_APP_NAME="HuggingChat"
+export PUBLIC_APP_ASSETS="chatui"
+export LOG_LEVEL="info"
+```
+
 ## Open WebUI + Pipelines
 
 - `openwebui` is exposed at http://localhost:3001 (proxied via the Dashboard too)
 - `pipelines` mounts `./addons/vllm_semantic_router_pipe.py` into `/app/pipelines/` for easy integration
 
+## Observability Stack
+
+The stack includes a complete observability solution:
+
+### Prometheus
+
+- **URL**: http://localhost:9090
+- **Configuration**: `./addons/prometheus.yaml`
+- **Data Retention**: 15 days
+- **Storage**: Persistent volume `prometheus-data`
+
+### Grafana
+
+- **URL**: http://localhost:3000
+- **Credentials**: admin/admin
+- **Configuration**:
+  - Datasources: Prometheus and Jaeger
+  - Dashboard: LLM Router dashboard
+  - Storage: Persistent volume `grafana-data`
+
+### Jaeger (Distributed Tracing)
+
+- **URL**: http://localhost:16686
+- **OTLP Endpoint**: http://localhost:4318 (gRPC)
+- **Configuration**: OTLP collector enabled
+- **Integration**: Semantic Router sends traces via OTLP
+
 ## Networking
 
 All services join the `semantic-network` bridge network with a fixed subnet to make in-network lookups stable. Host-published ports are listed above under Services & Ports.
````
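The Chat UI environment variables shown in this README can also be supplied without editing the main compose file, via a `docker-compose.override.yml`. A minimal sketch (assumes the service is named `chat-ui`, as in the service list above; the default values are placeholders, not project defaults):

```yaml
# docker-compose.override.yml - sketch, not part of the repository
services:
  chat-ui:
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY:-changeme}
      - MONGODB_URL=${MONGODB_URL:-mongodb://mongo:27017}
      - PUBLIC_APP_NAME=${PUBLIC_APP_NAME:-HuggingChat}
```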

deploy/kubernetes/config.yaml

Lines changed: 82 additions & 28 deletions
```diff
@@ -5,11 +5,11 @@ bert_model:
 
 semantic_cache:
   enabled: true
-  backend_type: "memory" # Options: "memory" or "milvus"
+  backend_type: "memory" # Options: "memory" or "milvus"
   similarity_threshold: 0.8
-  max_entries: 1000 # Only applies to memory backend
+  max_entries: 1000 # Only applies to memory backend
   ttl_seconds: 3600
-  eviction_policy: "fifo"
+  eviction_policy: "fifo"
 
 tools:
   enabled: true
```
```diff
@@ -19,7 +19,7 @@ tools:
   fallback_to_empty: true
 
 prompt_guard:
-  enabled: true
+  enabled: true # Global default - can be overridden per category with jailbreak_enabled
   use_modernbert: true
   model_id: "models/jailbreak_classifier_modernbert-base_model"
   threshold: 0.7
```
```diff
@@ -32,13 +32,13 @@ prompt_guard:
 # NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
 vllm_endpoints:
   - name: "endpoint1"
-    address: "127.0.0.1" # IPv4 address - REQUIRED format
-    port: 8000
+    address: "172.28.0.20" # Static IPv4 of llm-katan within docker compose network
+    port: 8002
     weight: 1
 
 model_config:
-  "openai/gpt-oss-20b":
-    reasoning_family: "gpt-oss" # This model uses GPT-OSS reasoning syntax
+  "qwen3":
+    reasoning_family: "qwen3" # This model uses Qwen-3 reasoning syntax
     preferred_endpoints: ["endpoint1"]
     pii_policy:
       allow_by_default: true
```
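The config comments above are strict about the `address` format: a bare IPv4 only, with the port in a separate field. A quick shape check before deploying can be sketched in shell (`check_addr` is a hypothetical helper, not part of the project; the regex checks dotted-quad shape only, not octet ranges):

```shell
# check_addr: accept only a bare dotted-quad IPv4 address (shape check only)
check_addr() {
  if printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'; then
    echo "ok: $1"
  else
    echo "invalid: $1"
  fi
}

check_addr "172.28.0.20"        # -> ok: bare IPv4 is accepted
check_addr "http://172.28.0.20" # -> invalid: protocol prefixes not supported
check_addr "example.com"        # -> invalid: domain names not supported
check_addr "172.28.0.20:8002"   # -> invalid: the port belongs in the 'port' field
```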
```diff
@@ -61,77 +61,113 @@ classifier:
 # Categories with new use_reasoning field structure
 categories:
   - name: business
+    system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations."
+    # jailbreak_enabled: true # Optional: Override global jailbreak detection per category
+    # jailbreak_threshold: 0.8 # Optional: Override global jailbreak threshold per category
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
-        use_reasoning: false # Business performs better without reasoning
+        use_reasoning: false # Business performs better without reasoning
   - name: law
+    system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.4
         use_reasoning: false
   - name: psychology
+    system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice."
+    semantic_cache_enabled: true
+    semantic_cache_similarity_threshold: 0.92 # High threshold for psychology - sensitive to nuances
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: biology
+    system_prompt: "You are a biology expert with comprehensive knowledge spanning molecular biology, genetics, cell biology, ecology, evolution, anatomy, physiology, and biotechnology. Explain biological concepts with scientific accuracy, use appropriate terminology, and provide examples from current research. Connect biological principles to real-world applications and emphasize the interconnectedness of biological systems."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.9
         use_reasoning: false
   - name: chemistry
+    system_prompt: "You are a chemistry expert specializing in chemical reactions, molecular structures, and laboratory techniques. Provide detailed, step-by-step explanations."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
-        use_reasoning: true # Enable reasoning for complex chemistry
+        use_reasoning: true # Enable reasoning for complex chemistry
   - name: history
+    system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: other
+    system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics."
+    semantic_cache_enabled: true
+    semantic_cache_similarity_threshold: 0.75 # Lower threshold for general chat - less sensitive
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: health
+    system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies."
+    semantic_cache_enabled: true
+    semantic_cache_similarity_threshold: 0.95 # High threshold for health - very sensitive to word changes
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: economics
+    system_prompt: "You are an economics expert with deep understanding of microeconomics, macroeconomics, econometrics, financial markets, monetary policy, fiscal policy, international trade, and economic theory. Analyze economic phenomena using established economic principles, provide data-driven insights, and explain complex economic concepts in accessible terms. Consider both theoretical frameworks and real-world applications in your responses."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
         use_reasoning: false
   - name: math
+    system_prompt: "You are a mathematics expert. Provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
-        use_reasoning: true # Enable reasoning for complex math
+        use_reasoning: true # Enable reasoning for complex math
   - name: physics
+    system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
-        use_reasoning: true # Enable reasoning for physics
+        use_reasoning: true # Enable reasoning for physics
   - name: computer science
+    system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: philosophy
+    system_prompt: "You are a philosophy expert with comprehensive knowledge of philosophical traditions, ethical theories, logic, metaphysics, epistemology, political philosophy, and the history of philosophical thought. Engage with complex philosophical questions by presenting multiple perspectives, analyzing arguments rigorously, and encouraging critical thinking. Draw connections between philosophical concepts and contemporary issues while maintaining intellectual honesty about the complexity and ongoing nature of philosophical debates."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: engineering
+    system_prompt: "You are an engineering expert with knowledge across multiple engineering disciplines including mechanical, electrical, civil, chemical, software, and systems engineering. Apply engineering principles, design methodologies, and problem-solving approaches to provide practical solutions. Consider safety, efficiency, sustainability, and cost-effectiveness in your recommendations. Use technical precision while explaining concepts clearly, and emphasize the importance of proper engineering practices and standards."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
 
-default_model: openai/gpt-oss-20b
+default_model: "qwen3"
+
+# Auto model name for automatic model selection (optional)
+# This is the model name that clients should use to trigger automatic model selection
+# If not specified, defaults to "MoM" (Mixture of Models)
+# For backward compatibility, "auto" is always accepted as an alias
+# Example: auto_model_name: "MoM" # or any other name you prefer
+# auto_model_name: "MoM"
+
+# Include configured models in /v1/models list endpoint (optional, default: false)
+# When false (default): only the auto model name is returned in the /v1/models endpoint
+# When true: all models configured in model_config are also included in the /v1/models endpoint
+# This is useful for clients that need to discover all available models
+# Example: include_config_models_in_list: true
+# include_config_models_in_list: false
 
 # Reasoning family configurations
 reasoning_families:
```
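Distilled from the category changes above, here is a single category entry showing the per-category override fields this commit introduces or documents. Values are illustrative; the `jailbreak_*` keys appear only as comments in the diff and are shown uncommented here purely for shape:

```yaml
# Sketch of one category entry using the per-category override fields above
categories:
  - name: health
    system_prompt: "..." # category-specific system prompt (shortened here)
    jailbreak_enabled: true # overrides the global prompt_guard.enabled
    jailbreak_threshold: 0.8 # overrides the global jailbreak threshold
    semantic_cache_enabled: true # per-category cache toggle
    semantic_cache_similarity_threshold: 0.95 # stricter matching for sensitive topics
    model_scores:
      - model: qwen3
        score: 0.5
        use_reasoning: false
```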
```diff
@@ -164,5 +200,23 @@ api:
     detailed_goroutine_tracking: true
     high_resolution_timing: false
     sample_rate: 1.0
-    duration_buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]
+    duration_buckets:
+      [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]
     size_buckets: [1, 2, 5, 10, 20, 50, 100, 200]
+
+# Observability Configuration
+observability:
+  tracing:
+    enabled: true # Enable distributed tracing for docker-compose stack
+    provider: "opentelemetry" # Provider: opentelemetry, openinference, openllmetry
+    exporter:
+      type: "otlp" # Export spans to Jaeger (via OTLP gRPC)
+      endpoint: "jaeger:4317" # Jaeger collector inside compose network
+      insecure: true # Use insecure connection (no TLS)
+    sampling:
+      type: "always_on" # Sampling: always_on, always_off, probabilistic
+      rate: 1.0 # Sampling rate for probabilistic (0.0-1.0)
+    resource:
+      service_name: "vllm-semantic-router"
+      service_version: "v0.1.0"
+      deployment_environment: "development"
```
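Two notes on the tracing block above. First, `jaeger:4317` is the conventional OTLP/gRPC port, while 4318 is OTLP over HTTP, so the README's "4318 (gRPC)" label and this exporter endpoint should not be read as the same thing. Second, `always_on` traces every request; per the comments above, the same block supports probabilistic sampling. A sketch sampling 10% of traces (values illustrative):

```yaml
observability:
  tracing:
    enabled: true
    provider: "opentelemetry"
    exporter:
      type: "otlp"
      endpoint: "jaeger:4317" # OTLP gRPC
      insecure: true
    sampling:
      type: "probabilistic" # sample a fraction of requests instead of all
      rate: 0.1 # 10% of traces
    resource:
      service_name: "vllm-semantic-router"
```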
