Commit ba8d60a — directions including the usage of a new svc
Signed-off-by: Ryan Cook <[email protected]>
1 parent 6a38f70 commit ba8d60a

4 files changed: +718 −0 lines changed

deploy/kserve/QUICKSTART.md (291 additions, 0 deletions)
# Quick Start Guide - Semantic Router with KServe

**🚀 Automated deployment in under 5 minutes using the helper script.**

> **Need more control?** See [README.md](./README.md) for comprehensive manual deployment and configuration.
>
> This quick start uses the automated `deploy.sh` script for the fastest path to deployment.

## Prerequisites Checklist

- [ ] OpenShift cluster with OpenShift AI installed
- [ ] At least one KServe InferenceService deployed and ready
- [ ] OpenShift CLI (`oc`) installed
- [ ] Logged in to your cluster (`oc login`)
- [ ] Sufficient permissions in your namespace

## 5-Minute Deployment

### Step 1: Verify Your Model

```bash
# Set your namespace
NAMESPACE=<your-namespace>

# List your InferenceServices
oc get inferenceservice -n $NAMESPACE

# Note the InferenceService name and verify it's READY=True
```

### Step 2: Deploy Semantic Router

```bash
cd deploy/kserve

# Deploy with one command
./deploy.sh \
  --namespace <your-namespace> \
  --inferenceservice <your-inferenceservice-name> \
  --model <your-model-name>
```

**Example:**

```bash
./deploy.sh --namespace semantic --inferenceservice granite32-8b --model granite32-8b
```

### Step 3: Wait for Ready

The script will:

- ✓ Validate your environment
- ✓ Download classification models (~2-3 minutes)
- ✓ Start the semantic router
- ✓ Provide your external URL
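While the script runs, you can watch for the router pod to report Ready. A minimal Python sketch, assuming `oc` is on your PATH and the `app=semantic-router` label used throughout this guide; it checks the standard Kubernetes `Ready` condition in each pod's `status.conditions`:

```python
import json
import subprocess

def is_ready(pod: dict) -> bool:
    """Return True if the pod's standard Ready condition is True."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions)

def router_pods_ready(namespace: str) -> bool:
    """Query oc for router pods and check that every one is Ready."""
    out = subprocess.check_output(
        ["oc", "get", "pods", "-l", "app=semantic-router",
         "-n", namespace, "-o", "json"])
    pods = json.loads(out)["items"]
    return bool(pods) and all(is_ready(p) for p in pods)

# Usage (requires a live cluster login):
# router_pods_ready("<your-namespace>")
```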

### Step 4: Test It

```bash
# Use the URL provided by the deployment script
ROUTER_URL=<your-route-url>

# Quick test
curl -k "https://$ROUTER_URL/v1/models"

# Try a chat completion
curl -k "https://$ROUTER_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model>",
    "messages": [{"role": "user", "content": "What is 2+2?"}]
  }'
```

## Common Scenarios

### Scenario 1: Basic Deployment (Default Settings)

Just need semantic routing with defaults:

```bash
./deploy.sh -n myproject -i mymodel -m mymodel
```

### Scenario 2: Custom Storage

Using a specific storage class or larger PVCs:

```bash
./deploy.sh \
  -n myproject \
  -i mymodel \
  -m mymodel \
  -s gp3-csi \
  --models-pvc-size 20Gi \
  --cache-pvc-size 10Gi
```

### Scenario 3: Preview Before Deploying

Want to see what will be created first:

```bash
./deploy.sh -n myproject -i mymodel -m mymodel --dry-run
```

## What You Get

Once deployed, you have:

- **Intelligent Routing** - Requests route based on semantic understanding
- **PII Protection** - Sensitive data detection and blocking
- **Semantic Caching** - ~50% faster responses for similar queries
- **Jailbreak Detection** - Security against prompt injection
- **OpenAI-Compatible API** - Drop-in replacement for OpenAI endpoints
- **Production Ready** - Monitoring, logging, and metrics included
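To make the semantic caching idea concrete, here is a toy Python sketch. It is not the router's actual implementation (the router uses a neural embedding model); it only illustrates the mechanism: embed each query, and return a cached answer when a new query's vector is similar enough to a stored one.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector (illustration only)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached answer when a new query is similar enough."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str):
        qv = embed(query)
        for vec, answer in self.entries:
            if cosine(qv, vec) >= self.threshold:
                return answer  # cache hit: skip the LLM call entirely
        return None  # cache miss: forward to the model, then put()

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))
```

A near-duplicate query ("what is 2+2 ?") scores above the threshold against a stored "what is 2+2" entry and is served from the cache, while an unrelated query falls through to the model.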
## Accessing Your Deployment

### External URL

```bash
# Get your route
oc get route semantic-router-kserve -n <namespace>

# Access via HTTPS
ROUTER_URL=$(oc get route semantic-router-kserve -n <namespace> -o jsonpath='{.spec.host}')
echo "https://$ROUTER_URL"
```

### Logs

```bash
# View router logs
oc logs -l app=semantic-router -c semantic-router -n <namespace> -f

# View all logs
oc logs -l app=semantic-router --all-containers -n <namespace> -f
```

### Metrics

```bash
# Port-forward metrics endpoint
POD=$(oc get pods -l app=semantic-router -n <namespace> -o jsonpath='{.items[0].metadata.name}')
oc port-forward $POD 9190:9190 -n <namespace>

# View in browser
open http://localhost:9190/metrics
```

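The endpoint serves Prometheus text exposition format. If you want to inspect a few values programmatically rather than in a browser, a small Python sketch that parses that format into a name→value dict (the metric names in the test data are hypothetical, not the router's actual metric names; fetch the real text with `urllib.request.urlopen("http://localhost:9190/metrics")` while the port-forward is running):

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition format into {metric_name: value}.

    Ignores comment lines and label sets; if a metric appears with several
    label sets, the last value wins. Enough for eyeballing counters/gauges.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # strip {label="..."} if present
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # skip malformed lines rather than fail
    return metrics
```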
## Integration Examples

### Python (OpenAI SDK)

```python
from openai import OpenAI

# Point to your semantic router
client = OpenAI(
    base_url="https://<your-router-url>/v1",
    api_key="not-needed"  # KServe doesn't require an API key by default
)

# Use like normal OpenAI
response = client.chat.completions.create(
    model="<your-model>",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

print(response.choices[0].message.content)
```

### cURL

```bash
curl -k "https://<your-router-url>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<your-model>",
    "messages": [
      {"role": "user", "content": "Write a Python function to calculate fibonacci"}
    ],
    "max_tokens": 500
  }'
```

### LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://<your-router-url>/v1",
    model="<your-model>",
    api_key="not-needed"
)

response = llm.invoke("What are the benefits of semantic routing?")
print(response.content)
```

## Troubleshooting Quick Fixes

### Pod Not Starting

```bash
# Check pod status
oc get pods -l app=semantic-router -n <namespace>

# View events
oc describe pod -l app=semantic-router -n <namespace>

# Check init container logs (model download)
oc logs -l app=semantic-router -c model-downloader -n <namespace>
```

### Can't Connect to InferenceService

```bash
# Test connectivity from the router pod
POD=$(oc get pods -l app=semantic-router -n <namespace> -o jsonpath='{.items[0].metadata.name}')
oc exec $POD -c semantic-router -n <namespace> -- \
  curl http://<inferenceservice>-predictor.<namespace>.svc.cluster.local:8080/v1/models
```

### Predictor Pod Restarted (IP Changed)

Simply redeploy:

```bash
./deploy.sh -n <namespace> -i <inferenceservice> -m <model>
```

## Next Steps

1. **Run validation tests**:

   ```bash
   NAMESPACE=<ns> MODEL_NAME=<model> ./test-semantic-routing.sh
   ```

2. **Customize configuration**: See [README.md](./README.md) for detailed configuration options:
   - Adjust category scores and routing logic
   - Configure PII policies and prompt guards
   - Tune semantic caching parameters
   - Set up multi-model routing
   - Configure monitoring and tracing

3. **Advanced topics**: [README.md](./README.md) covers:
   - Multi-model configuration
   - Horizontal and vertical scaling
   - Troubleshooting guides
   - Monitoring and observability
   - Production hardening

## Getting Help

- 📖 **Manual Deployment & Configuration**: [README.md](./README.md) - comprehensive guide
- 🌐 **Project Website**: https://vllm-semantic-router.com
- 💬 **GitHub Issues**: https://github.com/vllm-project/semantic-router/issues
- 📚 **KServe Docs**: https://kserve.github.io/website/

## Want More Control?

This quick start uses the automated `deploy.sh` script for simplicity. If you need:

- Manual step-by-step deployment
- Deep understanding of configuration options
- Advanced customization
- Troubleshooting guidance
- Production hardening tips

**See the comprehensive [README.md](./README.md) guide.**

## Cleanup

To remove the deployment:

```bash
NAMESPACE=<your-namespace>

oc delete route semantic-router-kserve -n $NAMESPACE
oc delete service semantic-router-kserve -n $NAMESPACE
oc delete deployment semantic-router-kserve -n $NAMESPACE
oc delete configmap semantic-router-kserve-config semantic-router-envoy-kserve-config -n $NAMESPACE
oc delete pvc semantic-router-models semantic-router-cache -n $NAMESPACE
oc delete peerauthentication semantic-router-kserve-permissive -n $NAMESPACE
oc delete serviceaccount semantic-router -n $NAMESPACE
```

---

**Questions?** Check the [README.md](./README.md) for detailed documentation or open an issue on GitHub.
