Commit 44c929f

fix lint
Signed-off-by: Ryan Cook <[email protected]>
1 parent d1c4dc4 commit 44c929f

4 files changed: +50 -5 lines changed

deploy/kserve/QUICKSTART.md

Lines changed: 5 additions & 0 deletions
@@ -41,13 +41,15 @@ cd deploy/kserve
 ```

 **Example:**
+
 ```bash
 ./deploy.sh --namespace semantic --inferenceservice granite32-8b --model granite32-8b
 ```

 ### Step 3: Wait for Ready

 The script will:
+
 - ✓ Validate your environment
 - ✓ Download classification models (~2-3 minutes)
 - ✓ Start the semantic router
@@ -97,6 +99,7 @@ Using a specific storage class, larger PVCs, and custom embedding model:
 ```

 **Available Embedding Models:**
+
 - `all-MiniLM-L12-v2` (default) - Balanced speed/quality (~90MB)
 - `all-mpnet-base-v2` - Higher quality, larger (~420MB)
 - `all-MiniLM-L6-v2` - Faster, smaller (~80MB)
@@ -241,6 +244,7 @@ Simply redeploy:
 ## Next Steps

 1. **Run validation tests**:
+
 ```bash
 # Set namespace and model name
 NAMESPACE=<namespace> MODEL_NAME=<model> ./test-semantic-routing.sh
@@ -274,6 +278,7 @@ Simply redeploy:
 ## Want More Control?

 This quick start uses the automated `deploy.sh` script for simplicity. If you need:
+
 - Manual step-by-step deployment
 - Deep understanding of configuration options
 - Advanced customization

deploy/kserve/README.md

Lines changed: 32 additions & 0 deletions
@@ -103,6 +103,14 @@ INFERENCESERVICE_NAME=<your-inferenceservice-name>

 # KServe creates a headless service by default (no stable ClusterIP)
 # Create a stable ClusterIP service for consistent routing
+
+# Option 1: Using the template file (recommended)
+# Substitute variables and apply
+sed -e "s/{{INFERENCESERVICE_NAME}}/$INFERENCESERVICE_NAME/g" \
+    -e "s/{{NAMESPACE}}/$NAMESPACE/g" \
+    service-predictor-stable.yaml | oc apply -f - -n $NAMESPACE
+
+# Option 2: Using heredoc
 cat <<EOF | oc apply -f - -n $NAMESPACE
 apiVersion: v1
 kind: Service
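
Option 1 above pipes the `sed` output straight into `oc apply`. The following is a minimal sketch of rendering the template to stdout first, so the manifest can be inspected before it is applied. `render_stable_service` is a hypothetical helper name and is not part of the repository. The sketch assumes the template uses only the two placeholders shown in this diff, and that the values contain no `/` or `&`, which `sed` would treat specially.

```shell
# render_stable_service: hypothetical helper, not part of the repository.
# Usage: render_stable_service <template> <inferenceservice-name> <namespace>
render_stable_service() {
    template="$1"
    name="$2"
    namespace="$3"

    # Refuse empty values: sed would otherwise emit a manifest whose
    # name is just "-predictor-stable".
    if [ -z "$name" ] || [ -z "$namespace" ]; then
        echo "render_stable_service: name and namespace are required" >&2
        return 1
    fi

    # Same substitutions as Option 1, written to stdout instead of oc.
    sed -e "s/{{INFERENCESERVICE_NAME}}/${name}/g" \
        -e "s/{{NAMESPACE}}/${namespace}/g" \
        "$template"
}
```

For example, `render_stable_service service-predictor-stable.yaml "$INFERENCESERVICE_NAME" "$NAMESPACE"` prints the manifest for review, and the same output can then be piped to `oc apply -f - -n "$NAMESPACE"` as in Option 1.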
@@ -239,6 +247,7 @@ Find the `kserve_dynamic_cluster` section and update:
 ```

 Replace:
+
 - `my-model` with your InferenceService name
 - `my-namespace` with your namespace

@@ -709,6 +718,7 @@ oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f
 **Important Log Events:**

 - `routing_decision`: Which model was selected and why
+
 ```json
 {
   "msg": "routing_decision",
@@ -720,6 +730,7 @@ oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f
 ```

 - `cache_hit`/`cache_miss`: Cache performance
+
 ```json
 {
   "msg": "cache_hit",
@@ -730,6 +741,7 @@ oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f
 ```

 - `llm_usage`: Token usage and costs
+
 ```json
 {
   "msg": "llm_usage",
@@ -742,6 +754,7 @@ oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE -f
 ```

 - `pii_detection`: PII found in requests
+
 ```json
 {
   "msg": "pii_detection",
@@ -815,9 +828,11 @@ oc describe pod -l app=semantic-router -n $NAMESPACE
 - Solution: Check network policies, proxy settings

 2. **PVC not bound**: Storage not provisioned
+
 ```bash
 oc get pvc -n $NAMESPACE
 ```
+
 - Solution: Check StorageClass, provision capacity

 3. **OOM during model download**: Insufficient memory
@@ -836,21 +851,27 @@ oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE --previous
 **Common Causes**:

 1. **Configuration error**: Invalid YAML or missing fields
+
 ```
 Failed to load config: yaml: unmarshal errors
 ```
+
 - Solution: Validate YAML syntax, check required fields

 2. **Invalid IP address**: Router validation failed
+
 ```
 invalid IP address format, got: my-model.svc.cluster.local
 ```
+
 - Solution: Use service ClusterIP (not DNS) in `vllm_endpoints.address` - see Step 1 for creating stable service

 3. **Missing models**: Classification models not downloaded
+
 ```
 failed to read mapping file: no such file or directory
 ```
+
 - Solution: Check init container completed successfully

 ### Cannot Connect to InferenceService
@@ -870,24 +891,30 @@ oc exec $POD -c semantic-router -n $NAMESPACE -- \
 **Common Causes**:

 1. **InferenceService not ready**:
+
 ```bash
 oc get inferenceservice -n $NAMESPACE
 ```
+
 - Solution: Wait for READY=True, check predictor logs

 2. **Wrong DNS name**: Incorrect service name in Envoy config
 - Solution: Verify format: `<inferenceservice>-predictor.<namespace>.svc.cluster.local`

 3. **Network policy blocking**: Istio/NetworkPolicy restrictions
+
 ```bash
 oc get networkpolicies -n $NAMESPACE
 ```
+
 - Solution: Add policy to allow traffic from router to predictor

 4. **PeerAuthentication conflict**: mTLS mode mismatch
+
 ```bash
 oc get peerauthentication -n $NAMESPACE
 ```
+
 - Solution: Ensure PERMISSIVE mode or adjust Envoy TLS config

 ### Predictor Pod IP Changed (If Using Pod IP Instead of Service IP)
@@ -901,6 +928,7 @@ oc exec $POD -c semantic-router -n $NAMESPACE -- \
 **Solution**:

 1. Switch to stable service approach (recommended):
+
 ```bash
 # Create stable service
 cat <<EOF | oc apply -f - -n $NAMESPACE
@@ -944,9 +972,11 @@ oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE \
 **Common Causes**:

 1. **Threshold too high**: Similarity threshold prevents matches
+
 ```yaml
 similarity_threshold: 0.99 # Too strict
 ```
+
 - Solution: Lower threshold to 0.8-0.85

 2. **Cache disabled**: Not enabled in config
@@ -1066,6 +1096,7 @@ For multi-replica deployments with shared cache:

 1. Deploy Milvus in your cluster
 2. Update `configmap-router-config.yaml`:
+
 ```yaml
 semantic_cache:
   enabled: true
@@ -1077,6 +1108,7 @@ For multi-replica deployments with shared cache:
 ```

 3. Apply and restart:
+
 ```bash
 oc apply -f configmap-router-config.yaml -n $NAMESPACE
 oc rollout restart deployment/semantic-router-kserve -n $NAMESPACE
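
The `cache_hit` and `cache_miss` events listed under Important Log Events in this README can be tallied from the log stream, which helps when checking whether a threshold change had any effect. The following is a rough sketch. It assumes the router writes one JSON object per line with a `msg` field; the excerpts in this diff are truncated, so the exact layout is an assumption. `count_cache_events` is a hypothetical helper name.

```shell
# count_cache_events: hypothetical helper, not part of the repository.
# Reads router log lines on stdin and prints "hits=<n> misses=<n>".
count_cache_events() {
    awk '
        # Allow optional spaces or tabs between the key and the value.
        /"msg":[ \t]*"cache_hit"/  { hits++ }
        /"msg":[ \t]*"cache_miss"/ { misses++ }
        END { printf "hits=%d misses=%d\n", hits, misses }
    '
}
```

A possible invocation, reusing the log command from the README without `-f` so the stream ends: `oc logs -l app=semantic-router -c semantic-router -n $NAMESPACE | count_cache_events`.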

deploy/kserve/deploy.sh

Lines changed: 9 additions & 1 deletion
@@ -212,7 +212,14 @@ if [ "$SKIP_VALIDATION" = false ]; then

 # Create stable ClusterIP service for predictor (KServe creates headless service by default)
 echo "Creating stable ClusterIP service for predictor..."
-cat <<EOF | oc apply -f - -n "$NAMESPACE" > /dev/null 2>&1
+
+# Use template file for stable service
+if [ -f "$SCRIPT_DIR/service-predictor-stable.yaml" ]; then
+    substitute_vars "$SCRIPT_DIR/service-predictor-stable.yaml" "$TEMP_DIR/service-predictor-stable.yaml.tmp"
+    oc apply -f "$TEMP_DIR/service-predictor-stable.yaml.tmp" -n "$NAMESPACE" > /dev/null 2>&1
+else
+    # Fallback to inline creation if template not found
+    cat <<EOF | oc apply -f - -n "$NAMESPACE" > /dev/null 2>&1
 apiVersion: v1
 kind: Service
 metadata:
@@ -233,6 +240,7 @@ spec:
       targetPort: 8080
       protocol: TCP
 EOF
+fi

 # Get the stable ClusterIP
 PREDICTOR_SERVICE_IP=$(oc get svc "${INFERENCESERVICE_NAME}-predictor-stable" -n "$NAMESPACE" -o jsonpath='{.spec.clusterIP}' 2>/dev/null || echo "")
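
The lookup above falls back to an empty string when the service is missing, and the README's troubleshooting section notes that the router rejects a DNS name with `invalid IP address format`. The following is a minimal sketch of checking the value before it is written into the router config. `is_ipv4` is a hypothetical helper, not part of `deploy.sh`; it covers IPv4 only, and the router's own validation may differ.

```shell
# is_ipv4: hypothetical helper, not part of deploy.sh.
# Returns 0 if $1 is a dotted-quad IPv4 address, 1 otherwise.
is_ipv4() {
    # Shape check: four dot-separated groups of 1-3 digits.
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1

    # Range check: split on dots and require every octet to be <= 255.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}
```

With this in place, a guard such as `is_ipv4 "$PREDICTOR_SERVICE_IP" || { echo "no usable ClusterIP: '$PREDICTOR_SERVICE_IP'" >&2; exit 1; }` would stop the script on an empty value, on `None` (a headless service), or on a DNS name.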

deploy/kserve/service-predictor-stable.yaml

Lines changed: 4 additions & 4 deletions
@@ -3,17 +3,17 @@
 apiVersion: v1
 kind: Service
 metadata:
-  name: {{INFERENCESERVICE_NAME}}-predictor-stable
-  namespace: {{NAMESPACE}}
+  name: "{{INFERENCESERVICE_NAME}}-predictor-stable"
+  namespace: "{{NAMESPACE}}"
   labels:
-    app: {{INFERENCESERVICE_NAME}}
+    app: "{{INFERENCESERVICE_NAME}}"
     component: predictor-stable
   annotations:
     description: "Stable ClusterIP service for semantic router to use (headless service doesn't provide ClusterIP)"
 spec:
   type: ClusterIP
   selector:
-    serving.kserve.io/inferenceservice: {{INFERENCESERVICE_NAME}}
+    serving.kserve.io/inferenceservice: "{{INFERENCESERVICE_NAME}}"
   ports:
     - name: http
       port: 8080
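
The quoting matters because `{` opens a flow mapping in YAML, so YAML tooling does not read an unquoted `{{NAMESPACE}}` as a string. That is presumably what the lint fix in this commit addresses. The following is a small sketch of a check for placeholders that start a value without an opening quote. `find_unquoted_placeholders` is a hypothetical helper, and the pattern assumes upper-case placeholder names as used in this template.

```shell
# find_unquoted_placeholders: hypothetical helper, not part of the repository.
# Prints each line (with its line number) where a {{PLACEHOLDER}} directly
# follows "key: " with no opening double quote. Returns 0 if any line
# matched and 1 if the file is clean.
find_unquoted_placeholders() {
    grep -nE ':[[:space:]]+\{\{[A-Z_]+\}\}' "$1"
}
```

Run against the pre-commit version of `service-predictor-stable.yaml`, this would flag the four lines removed above; against the quoted version it prints nothing.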
