- Uses `DASHBOARD_IMAGE` if provided; otherwise builds from `dashboard/backend/Dockerfile` at `docker compose up` time.

+## Chat UI (Hugging Face)
+
+The `chat-ui` service provides a modern chat interface using Hugging Face's Chat UI:
+
+- **URL**: http://localhost:3002
+- **Database**: MongoDB for conversation persistence
+- **API Integration**: Routes through Envoy proxy for OpenAI-compatible API calls
+- **Configuration**:
+  - `OPENAI_BASE_URL=http://envoy-proxy:8801/v1` (routes through Envoy)
+  - `OPENAI_API_KEY` (configurable via environment variable)
+  - `MONGODB_URL=mongodb://mongo:27017` (local MongoDB by default)
+
+### Environment Variables
+
+You can customize Chat UI behavior by setting these environment variables:
+
+```bash
+# API Configuration
+export OPENAI_API_KEY="your-api-key-here"
+export MONGODB_URL="mongodb://mongo:27017" # or Atlas URL for production
+export MONGODB_DB_NAME="chat-ui"
+
+# UI Customization
+export PUBLIC_APP_NAME="HuggingChat"
+export PUBLIC_APP_ASSETS="chatui"
+export LOG_LEVEL="info"
+```
+
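The variables documented in the Chat UI section can be resolved with fallbacks before a service starts. The following is a minimal sketch, not part of Chat UI: `load_chat_ui_settings` and `FALLBACKS` are hypothetical names, the fallback values are simply the example values shown in the README text, and treating a missing `OPENAI_API_KEY` as an error is an assumption made for illustration.

```python
import os
from typing import Dict, Mapping, Optional

# Example values taken from the README text; used here as fallbacks (assumption).
FALLBACKS = {
    "OPENAI_BASE_URL": "http://envoy-proxy:8801/v1",
    "MONGODB_URL": "mongodb://mongo:27017",
    "MONGODB_DB_NAME": "chat-ui",
    "PUBLIC_APP_NAME": "HuggingChat",
    "PUBLIC_APP_ASSETS": "chatui",
    "LOG_LEVEL": "info",
}


def load_chat_ui_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: overlay the environment on the documented example values."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, fallback) for key, fallback in FALLBACKS.items()}
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Assumption for this sketch: a missing key is treated as a hard error.
        raise ValueError("OPENAI_API_KEY must be set")
    settings["OPENAI_API_KEY"] = api_key
    return settings
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real environment.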
 ## Open WebUI + Pipelines

 - `openwebui` is exposed at http://localhost:3001 (proxied via the Dashboard too)
 - `pipelines` mounts `./addons/vllm_semantic_router_pipe.py` into `/app/pipelines/` for easy integration

+## Observability Stack
+
+The stack includes a complete observability solution:
+
+### Prometheus
+
+- **URL**: http://localhost:9090
+- **Configuration**: `./addons/prometheus.yaml`
+- **Data Retention**: 15 days
+- **Storage**: Persistent volume `prometheus-data`
+
+### Grafana
+
+- **URL**: http://localhost:3000
+- **Credentials**: admin/admin
+- **Configuration**:
+  - Datasources: Prometheus and Jaeger
+  - Dashboard: LLM Router dashboard
+  - Storage: Persistent volume `grafana-data`
+
+### Jaeger (Distributed Tracing)
+
+- **URL**: http://localhost:16686
+- **OTLP Endpoint**: http://localhost:4318 (OTLP over HTTP; port 4317 is the conventional OTLP gRPC port)
+- **Configuration**: OTLP collector enabled
+- **Integration**: Semantic Router sends traces via OTLP
+
 ## Networking

 All services join the `semantic-network` bridge network with a fixed subnet to make in-network lookups stable. Host-published ports are listed above under Services & Ports.
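A fixed subnet makes it possible to check that a statically assigned service address really belongs to the network. The sketch below uses Python's standard `ipaddress` module. The CIDR `172.28.0.0/16` is an assumption chosen to contain the static address `172.28.0.20` used for llm-katan in this change; the README text does not state the subnet, so adjust it to match the compose file.

```python
import ipaddress

# Assumption: the compose file pins `semantic-network` to this CIDR.
# The README does not state the subnet; replace it with the real value.
ASSUMED_SUBNET = ipaddress.ip_network("172.28.0.0/16")


def in_subnet(address: str, subnet=ASSUMED_SUBNET) -> bool:
    """Return True if the literal IP address falls inside the given subnet."""
    return ipaddress.ip_address(address) in subnet


if __name__ == "__main__":
    # Static address assigned to llm-katan in the router config diff.
    print(in_subnet("172.28.0.20"))
    # Loopback is outside the bridge network.
    print(in_subnet("127.0.0.1"))
```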
# NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
 vllm_endpoints:
   - name: "endpoint1"
-    address: "127.0.0.1" # IPv4 address - REQUIRED format
-    port: 8000
+    address: "172.28.0.20" # Static IPv4 of llm-katan within docker compose network
+    port: 8002
     weight: 1

 model_config:
-  "openai/gpt-oss-20b":
-    reasoning_family: "gpt-oss" # This model uses GPT-OSS reasoning syntax
+  "qwen3":
+    reasoning_family: "qwen3" # This model uses Qwen-3 reasoning syntax
     preferred_endpoints: ["endpoint1"]
     pii_policy:
       allow_by_default: true
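The address rule in the comment above (no domain names, protocol prefixes, paths, or embedded ports) can be checked before the config is deployed. This is a minimal sketch using the standard `ipaddress` module. `validate_endpoint` is a hypothetical helper, not the router's own validation, and it assumes only IPv4 literals need to be accepted; the router itself may accept more.

```python
import ipaddress


def validate_endpoint(address: str, port: int) -> None:
    """Hypothetical check: raise ValueError unless address is a bare IPv4 literal."""
    try:
        # Rejects domain names, "http://" prefixes, "/path" suffixes and "ip:port" forms.
        ipaddress.IPv4Address(address)
    except ipaddress.AddressValueError as exc:
        raise ValueError(f"address must be a plain IPv4 literal, got {address!r}") from exc
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"port must be an integer in 1..65535, got {port!r}")


if __name__ == "__main__":
    validate_endpoint("172.28.0.20", 8002)  # the endpoint from the diff passes
    for bad in ("example.com", "http://172.28.0.20", "172.28.0.20/api", "172.28.0.20:8002"):
        try:
            validate_endpoint(bad, 8002)
        except ValueError as err:
            print(err)
```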
@@ -61,77 +61,113 @@ classifier:
 # Categories with new use_reasoning field structure
 categories:
   - name: business
+    system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations."
+    # jailbreak_enabled: true # Optional: Override global jailbreak detection per category
+    # jailbreak_threshold: 0.8 # Optional: Override global jailbreak threshold per category
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
-        use_reasoning: false # Business performs better without reasoning
+        use_reasoning: false # Business performs better without reasoning
   - name: law
+    system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.4
         use_reasoning: false
   - name: psychology
+    system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice."
+    semantic_cache_enabled: true
+    semantic_cache_similarity_threshold: 0.92 # High threshold for psychology - sensitive to nuances
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: biology
+    system_prompt: "You are a biology expert with comprehensive knowledge spanning molecular biology, genetics, cell biology, ecology, evolution, anatomy, physiology, and biotechnology. Explain biological concepts with scientific accuracy, use appropriate terminology, and provide examples from current research. Connect biological principles to real-world applications and emphasize the interconnectedness of biological systems."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.9
         use_reasoning: false
   - name: chemistry
+    system_prompt: "You are a chemistry expert specializing in chemical reactions, molecular structures, and laboratory techniques. Provide detailed, step-by-step explanations."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
-        use_reasoning: true # Enable reasoning for complex chemistry
+        use_reasoning: true # Enable reasoning for complex chemistry
   - name: history
+    system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: other
+    system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics."
+    semantic_cache_enabled: true
+    semantic_cache_similarity_threshold: 0.75 # Lower threshold for general chat - less sensitive
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: health
+    system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies."
+    semantic_cache_enabled: true
+    semantic_cache_similarity_threshold: 0.95 # High threshold for health - very sensitive to word changes
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: economics
+    system_prompt: "You are an economics expert with deep understanding of microeconomics, macroeconomics, econometrics, financial markets, monetary policy, fiscal policy, international trade, and economic theory. Analyze economic phenomena using established economic principles, provide data-driven insights, and explain complex economic concepts in accessible terms. Consider both theoretical frameworks and real-world applications in your responses."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
         use_reasoning: false
   - name: math
+    system_prompt: "You are a mathematics expert. Provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
-        use_reasoning: true # Enable reasoning for complex math
+        use_reasoning: true # Enable reasoning for complex math
   - name: physics
+    system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
-        use_reasoning: true # Enable reasoning for physics
+        use_reasoning: true # Enable reasoning for physics
   - name: computer science
+    system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: philosophy
+    system_prompt: "You are a philosophy expert with comprehensive knowledge of philosophical traditions, ethical theories, logic, metaphysics, epistemology, political philosophy, and the history of philosophical thought. Engage with complex philosophical questions by presenting multiple perspectives, analyzing arguments rigorously, and encouraging critical thinking. Draw connections between philosophical concepts and contemporary issues while maintaining intellectual honesty about the complexity and ongoing nature of philosophical debates."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: engineering
+    system_prompt: "You are an engineering expert with knowledge across multiple engineering disciplines including mechanical, electrical, civil, chemical, software, and systems engineering. Apply engineering principles, design methodologies, and problem-solving approaches to provide practical solutions. Consider safety, efficiency, sustainability, and cost-effectiveness in your recommendations. Use technical precision while explaining concepts clearly, and emphasize the importance of proper engineering practices and standards."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false

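The per-category fields in this hunk (`model_scores`, `use_reasoning`, and the semantic cache threshold) suggest a simple decision procedure. The sketch below is one plausible reading, not the router's actual implementation: `pick_model` and `cache_hit` are hypothetical names, choosing the highest score and comparing similarity with `>=` are assumptions, and global cache settings are deliberately left out.

```python
from typing import Tuple

# A trimmed copy of two categories from the config diff.
CATEGORIES = {
    "math": {
        "model_scores": [{"model": "qwen3", "score": 1.0, "use_reasoning": True}],
    },
    "health": {
        "semantic_cache_enabled": True,
        "semantic_cache_similarity_threshold": 0.95,
        "model_scores": [{"model": "qwen3", "score": 0.5, "use_reasoning": False}],
    },
}
DEFAULT_MODEL = "qwen3"  # mirrors default_model in the config


def pick_model(category: str) -> Tuple[str, bool]:
    """Return (model, use_reasoning) for the highest-scoring entry, else the default."""
    scores = CATEGORIES.get(category, {}).get("model_scores", [])
    if not scores:
        return DEFAULT_MODEL, False
    best = max(scores, key=lambda entry: entry["score"])
    return best["model"], best["use_reasoning"]


def cache_hit(category: str, similarity: float) -> bool:
    """Assumed rule: reuse a cached answer only at or above the category threshold."""
    cfg = CATEGORIES.get(category, {})
    threshold = cfg.get("semantic_cache_similarity_threshold")
    if not cfg.get("semantic_cache_enabled") or threshold is None:
        return False  # sketch only: global cache settings are out of scope here
    return similarity >= threshold
```

With a threshold of 0.95, the `health` category would reuse a cached answer only for near-identical queries, which matches the intent stated in the config comments.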
-default_model: openai/gpt-oss-20b
+default_model: "qwen3"
+
+# Auto model name for automatic model selection (optional)
+# This is the model name that clients should use to trigger automatic model selection
+# If not specified, defaults to "MoM" (Mixture of Models)
+# For backward compatibility, "auto" is always accepted as an alias
+# Example: auto_model_name: "MoM" # or any other name you prefer
+# auto_model_name: "MoM"
+
+# Include configured models in /v1/models list endpoint (optional, default: false)
+# When false (default): only the auto model name is returned in the /v1/models endpoint
+# When true: all models configured in model_config are also included in the /v1/models endpoint
+# This is useful for clients that need to discover all available models
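The comments above describe two behaviors: which model names trigger automatic selection, and which model IDs the `/v1/models` endpoint returns. A minimal sketch of both follows. The function names and the `include_configured` parameter are hypothetical stand-ins (the actual config key is not shown in this diff), so treat this as an illustration of the documented behavior rather than the router's code.

```python
from typing import Dict, List


def is_auto_model(requested: str, auto_model_name: str = "MoM") -> bool:
    """The configured auto name triggers selection; "auto" is always accepted as an alias."""
    return requested in (auto_model_name, "auto")


def list_model_ids(
    model_config: Dict[str, dict],
    auto_model_name: str = "MoM",
    include_configured: bool = False,  # hypothetical name for the documented option
) -> List[str]:
    """Return the model IDs a /v1/models response would contain under each setting."""
    ids = [auto_model_name]
    if include_configured:
        ids.extend(name for name in model_config if name not in ids)
    return ids


if __name__ == "__main__":
    config = {"qwen3": {"reasoning_family": "qwen3"}}
    print(list_model_ids(config))
    print(list_model_ids(config, include_configured=True))
```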