Example KServe InferenceService configurations for model serving.
| File | Description |
|---|---|
| `inferenceservice-examples.yaml` | Complete example with sklearn model, ingress, and PodDisruptionBudget (PDB) |
Deploy the sklearn-iris example:

```bash
kubectl apply -f inferenceservice-examples.yaml
```

Check the deployment status:

```bash
kubectl get inferenceservice -n mlops
```

A production-ready sklearn model deployment:
- Uses public sklearn iris model from GCS
- Configured with resource limits
- Pod anti-affinity for high availability
- Autoscaling from 1-3 replicas
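The bullets above map onto an InferenceService roughly like the following. This is an illustrative sketch, not the file's exact contents: names, labels, resource values, and the storageUri in `inferenceservice-examples.yaml` may differ.

```yaml
# Sketch of a production-style InferenceService (names and values are illustrative).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: mlops
spec:
  predictor:
    minReplicas: 1        # autoscaling lower bound
    maxReplicas: 3        # autoscaling upper bound
    affinity:             # prefer spreading replicas across nodes for HA
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  serving.kserve.io/inferenceservice: sklearn-iris
    model:
      modelFormat:
        name: sklearn
      # Public iris model on GCS; replace with your own bucket for real workloads.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: "1"
          memory: 1Gi
```

`minReplicas`/`maxReplicas` drive KServe's autoscaling, and the anti-affinity term asks the scheduler to place replicas on different nodes when possible.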
AWS ALB ingress for external access:
- Internet-facing scheme
- Health check on model endpoint
- IP target type
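Those ingress properties translate into AWS Load Balancer Controller annotations along these lines (a sketch; the service name, port, and health-check path are assumptions based on the sklearn-iris example):

```yaml
# Illustrative ALB Ingress; annotation values mirror the bullets above.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sklearn-iris-ingress
  namespace: mlops
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing     # public-facing ALB
    alb.ingress.kubernetes.io/target-type: ip             # route to pod IPs
    alb.ingress.kubernetes.io/healthcheck-path: /v1/models/sklearn-iris
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sklearn-iris-predictor
                port:
                  number: 80
```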
Dedicated service account for inference workloads.
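A minimal sketch of such a service account (the name here is hypothetical; the example file defines its own):

```yaml
# Hypothetical dedicated service account for inference pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kserve-inference
  namespace: mlops
# It is attached via the predictor spec:
#   spec:
#     predictor:
#       serviceAccountName: kserve-inference
```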
Ensures at least 1 replica during cluster maintenance.
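A PodDisruptionBudget enforcing that guarantee could look like this (a sketch; the selector label assumes the sklearn-iris example above):

```yaml
# Keep at least one predictor pod running during voluntary disruptions
# (node drains, cluster upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sklearn-iris-pdb
  namespace: mlops
spec:
  minAvailable: 1
  selector:
    matchLabels:
      serving.kserve.io/inferenceservice: sklearn-iris
```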
After deployment, port-forward to test locally:

```bash
kubectl port-forward svc/sklearn-iris-predictor -n mlops 8080:80
```

Send a test prediction:

```bash
curl -X POST http://localhost:8080/v1/models/sklearn-iris:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
```

To deploy your own model, modify the `storageUri`:
```yaml
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn  # or pytorch, tensorflow, etc.
      storageUri: gs://your-bucket/models/your-model
```

Deploy a pretrained sentiment analysis model using KServe's native HuggingFace runtime:
```bash
kubectl apply -f huggingface-sentiment.yaml
```

Wait for readiness:

```bash
kubectl wait --for=condition=Ready inferenceservice/hf-sentiment -n mlops --timeout=600s
```

Test:

```bash
kubectl port-forward svc/hf-sentiment-predictor -n mlops 8080:80
curl -X POST http://localhost:8080/v1/models/sentiment:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": ["I love this product!"]}'
```

- `examples/llm-inference/` - LLM serving with vLLM
- `examples/distributed-training/` - Multi-GPU training