Example KServe InferenceService configurations for model serving.
| File | Description |
|---|---|
| `inferenceservice-examples.yaml` | Complete example with sklearn model, ingress, and PodDisruptionBudget (PDB) |
Deploy the sklearn-iris example:

```bash
kubectl apply -f inferenceservice-examples.yaml
```

Check the deployment status:

```bash
kubectl get inferenceservice -n mlops
```

A production-ready sklearn model deployment:
- Uses public sklearn iris model from GCS
- Configured with resource limits
- Pod anti-affinity for high availability
- Autoscaling from 1-3 replicas
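The bullets above map onto an InferenceService roughly like the following. This is an illustrative sketch, not the file's exact contents: names, labels, resource values, and the storageUri in `inferenceservice-examples.yaml` may differ.

```yaml
# Sketch of a production-style InferenceService (names and values are illustrative).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: mlops
spec:
  predictor:
    minReplicas: 1        # autoscaling lower bound
    maxReplicas: 3        # autoscaling upper bound
    affinity:             # prefer spreading replicas across nodes for HA
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  serving.kserve.io/inferenceservice: sklearn-iris
    model:
      modelFormat:
        name: sklearn
      # Public iris model on GCS; replace with your own bucket for real workloads.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 1Gi
```

`minReplicas`/`maxReplicas` drive KServe's autoscaling, and the anti-affinity term asks the scheduler to place replicas on different nodes when possible.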
AWS ALB ingress for external access:
- Internet-facing scheme
- Health check on model endpoint
- IP target type
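Those ingress properties translate into AWS Load Balancer Controller annotations along these lines (a sketch; the service name, port, and health-check path are assumptions based on the sklearn-iris example):

```yaml
# Illustrative ALB Ingress; annotation values mirror the bullets above.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sklearn-iris-ingress
  namespace: mlops
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing     # public-facing ALB
    alb.ingress.kubernetes.io/target-type: ip             # route to pod IPs
    alb.ingress.kubernetes.io/healthcheck-path: /v1/models/sklearn-iris
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sklearn-iris-predictor
                port:
                  number: 80
```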
Dedicated service account for inference workloads.
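A minimal sketch of such a service account (the name here is hypothetical; the example file defines its own):

```yaml
# Hypothetical dedicated service account for inference pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kserve-inference
  namespace: mlops
# It is attached via the predictor spec:
#   spec:
#     predictor:
#       serviceAccountName: kserve-inference
```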
Ensures at least 1 replica during cluster maintenance.
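A PodDisruptionBudget enforcing that guarantee could look like this (a sketch; the selector label assumes the sklearn-iris example above):

```yaml
# Keep at least one predictor pod running during voluntary disruptions
# (node drains, cluster upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sklearn-iris-pdb
  namespace: mlops
spec:
  minAvailable: 1
  selector:
    matchLabels:
      serving.kserve.io/inferenceservice: sklearn-iris
```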
After deployment, port-forward to test locally:

```bash
kubectl port-forward svc/sklearn-iris-predictor -n mlops 8080:80
```

Send a test prediction:

```bash
curl -X POST http://localhost:8080/v1/models/sklearn-iris:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
```

To deploy your own model, modify the `storageUri`:
```yaml
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn  # or pytorch, tensorflow, etc.
      storageUri: gs://your-bucket/models/your-model
```

Deploy a pretrained sentiment analysis model using KServe's native HuggingFace runtime:
```bash
kubectl apply -f huggingface-sentiment.yaml
```

Wait for readiness:

```bash
kubectl wait --for=condition=Ready inferenceservice/hf-sentiment -n mlops --timeout=600s
```

Test:

```bash
kubectl port-forward svc/hf-sentiment-predictor -n mlops 8080:80
curl -X POST http://localhost:8080/v1/models/sentiment:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": ["I love this product!"]}'
```

- `examples/llm-inference/` - LLM serving with vLLM
- `examples/distributed-training/` - Multi-GPU training