Sample LLMInferenceService Models

This directory contains LLMInferenceServices for deploying sample models. Please refer to the deployment guide for more details on how to test the MaaS Platform with these models.

TODO (ODH model controller): Update the ODH model controller to remove or modify the existing webhook that validates tier annotations (alpha.maas.opendatahub.io/tiers). The webhook currently blocks HTTPRoutes when AuthPolicy is not enforced (e.g., Kuadrant not installed), requiring security.opendatahub.io/enable-auth=false. For MaaS-managed models, tier/access control is handled by MaaSAuthPolicy and MaaSSubscription rather than LLMInferenceService annotations. The webhook should not apply automation or block models that are managed by MaaS. See JIRA: [TBD]

Available Models

simulator - Simple simulator for testing
simulator-premium - Premium simulator for testing tier-based access (configured via MaaSAuthPolicy)
facebook-opt-125m-cpu - Facebook OPT 125M model (CPU-based)
qwen3 - Qwen3 model (GPU-based with autoscaling)
ibm-granite-2b-gpu - IBM Granite 2B Instruct model (GPU-based, supports instructions)

Deployment

Basic Deployment

Create the llm namespace where models are deployed (if it doesn't already exist):

kubectl create namespace llm

Deploy any model using:

MODEL_NAME=simulator # or simulator-premium, facebook-opt-125m-cpu, qwen3, or ibm-granite-2b-gpu
kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -

Deploying Multiple Models

To deploy both simulator models:

Deploy the standard simulator:

kustomize build docs/samples/models/simulator | kubectl apply -f -

Deploy the premium simulator:

kustomize build docs/samples/models/simulator-premium | kubectl apply -f -

Distinguishing Between Models

The two simulator models can be distinguished by:

Model Name:
- Standard: facebook-opt-125m-simulated (from kustomization namePrefix)
- Premium: premium-simulated-simulated-premium (from kustomization namePrefix + model name)
LLMInferenceService Name:
- Standard: facebook-opt-125m-simulated
- Premium: premium-simulated-simulated-premium

Tier-based access is configured via MaaSAuthPolicy and MaaSSubscription (see docs/samples/maas-system/), not via LLMInferenceService annotations.

Verifying Deployment

After deploying both models:

# List all LLMInferenceServices
kubectl get llminferenceservices -n llm

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sample LLMInferenceService Models

Available Models

Deployment

Basic Deployment

Deploying Multiple Models

Distinguishing Between Models

Verifying Deployment

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Sample LLMInferenceService Models

Available Models

Deployment

Basic Deployment

Deploying Multiple Models

Distinguishing Between Models

Verifying Deployment