This directory contains LLMInferenceServices for deploying sample models. Please refer to the deployment guide for more details on how to test the MaaS Platform with these models.
TODO (ODH model controller): Update the ODH model controller to remove or modify the existing webhook that validates tier annotations (`alpha.maas.opendatahub.io/tiers`). The webhook currently blocks HTTPRoutes when AuthPolicy is not enforced (e.g., when Kuadrant is not installed), requiring `security.opendatahub.io/enable-auth=false`. For MaaS-managed models, tier/access control is handled by MaaSAuthPolicy and MaaSSubscription rather than by LLMInferenceService annotations, so the webhook should not apply automation to, or block, models that are managed by MaaS. See JIRA: [TBD]
- `simulator` - Simple simulator for testing
- `simulator-premium` - Premium simulator for testing tier-based access (configured via MaaSAuthPolicy)
- `facebook-opt-125m-cpu` - Facebook OPT 125M model (CPU-based)
- `qwen3` - Qwen3 model (GPU-based, with autoscaling)
- `ibm-granite-2b-gpu` - IBM Granite 2B Instruct model (GPU-based, supports instructions)
Create the `llm` namespace where models are deployed (if it doesn't already exist):

```shell
kubectl create namespace llm
```

Deploy any model using:

```shell
MODEL_NAME=simulator # or simulator-premium, facebook-opt-125m-cpu, qwen3, or ibm-granite-2b-gpu
kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -
```

To deploy both simulator models:
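Once a model is applied, it can help to confirm it is up before sending traffic. A minimal sketch, assuming the LLMInferenceService CRD reports a standard `Ready` condition (if it does not, fall back to checking the underlying Deployment or pods):

```shell
# Wait for every LLMInferenceService in the namespace to become Ready.
# NOTE: resource names may differ from $MODEL_NAME because of the
# kustomization namePrefix (e.g. facebook-opt-125m-simulated for simulator);
# using --all sidesteps the naming difference.
kubectl wait llminferenceservice --all -n llm \
  --for=condition=Ready --timeout=300s
```

GPU-backed models such as `qwen3` may need a longer timeout while images are pulled and the model loads.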
1. Deploy the standard simulator:

   ```shell
   kustomize build docs/samples/models/simulator | kubectl apply -f -
   ```

2. Deploy the premium simulator:

   ```shell
   kustomize build docs/samples/models/simulator-premium | kubectl apply -f -
   ```
The two simulator models can be distinguished by:

- Model Name:
  - Standard: `facebook-opt-125m-simulated` (from kustomization namePrefix)
  - Premium: `premium-simulated-simulated-premium` (from kustomization namePrefix + model name)
- LLMInferenceService Name:
  - Standard: `facebook-opt-125m-simulated`
  - Premium: `premium-simulated-simulated-premium`
Tier-based access is configured via MaaSAuthPolicy and MaaSSubscription (see `docs/samples/maas-system/`), not via LLMInferenceService annotations.
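To confirm the access-control side is in place, you can list the MaaS policy resources directly. A hedged sketch, assuming the MaaSAuthPolicy and MaaSSubscription CRDs are installed so that kubectl resolves these kind names:

```shell
# List tier/access-control resources across all namespaces
kubectl get maasauthpolicy -A
kubectl get maassubscription -A
```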
After deploying both models:

```shell
# List all LLMInferenceServices
kubectl get llminferenceservices -n llm
```
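If a model does not become reachable, inspecting its status conditions and the generated routing objects usually shows why. A minimal troubleshooting sketch, assuming the HTTPRoutes are created in the same `llm` namespace (the resource name below follows the naming table above):

```shell
# Inspect one model's status conditions and recent events
kubectl describe llminferenceservice facebook-opt-125m-simulated -n llm

# Check the HTTPRoutes generated for the models
kubectl get httproute -n llm
```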