This directory contains LLMInferenceServices for deploying sample models. Please refer to the deployment guide for more details on how to test the MaaS Platform with these models.
TODO (ODH model controller): Update the ODH model controller to remove or modify the existing webhook that validates tier annotations (`alpha.maas.opendatahub.io/tiers`). The webhook currently blocks HTTPRoutes when AuthPolicy is not enforced (e.g., when Kuadrant is not installed), requiring `security.opendatahub.io/enable-auth=false`. For MaaS-managed models, tier/access control is handled by MaaSAuthPolicy and MaaSSubscription rather than by LLMInferenceService annotations, so the webhook should not apply automation to, or block, models that are managed by MaaS. See JIRA: [TBD]
- `simulator` - Simple simulator for testing
- `simulator-premium` - Premium simulator for testing tier-based access (configured via MaaSAuthPolicy)
- `facebook-opt-125m-cpu` - Facebook OPT 125M model (CPU-based)
- `qwen3` - Qwen3 model (GPU-based, with autoscaling)
- `ibm-granite-2b-gpu` - IBM Granite 2B Instruct model (GPU-based, supports instructions)
Create the `llm` namespace where models are deployed (if it doesn't already exist):

```shell
kubectl create namespace llm
```

Deploy any model using:

```shell
MODEL_NAME=simulator # or simulator-premium, facebook-opt-125m-cpu, qwen3, or ibm-granite-2b-gpu
kustomize build docs/samples/models/$MODEL_NAME | kubectl apply -f -
```

To deploy both simulator models:
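Once a model is applied, it can help to confirm it is up before sending traffic. A minimal sketch, assuming the LLMInferenceService CRD reports a standard `Ready` condition (if it does not, fall back to checking the underlying Deployment or pods):

```shell
# Wait for every LLMInferenceService in the namespace to become Ready.
# NOTE: resource names may differ from $MODEL_NAME because of the
# kustomization namePrefix (e.g. facebook-opt-125m-simulated for simulator);
# using --all sidesteps the naming difference.
kubectl wait llminferenceservice --all -n llm \
  --for=condition=Ready --timeout=300s
```

GPU-backed models such as `qwen3` may need a longer timeout while images are pulled and the model loads.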
1. Deploy the standard simulator:

   ```shell
   kustomize build docs/samples/models/simulator | kubectl apply -f -
   ```

2. Deploy the premium simulator:

   ```shell
   kustomize build docs/samples/models/simulator-premium | kubectl apply -f -
   ```
The two simulator models can be distinguished by:

- Model Name:
  - Standard: `facebook-opt-125m-simulated` (from kustomization namePrefix)
  - Premium: `premium-simulated-simulated-premium` (from kustomization namePrefix + model name)
- LLMInferenceService Name:
  - Standard: `facebook-opt-125m-simulated`
  - Premium: `premium-simulated-simulated-premium`
Tier-based access is configured via MaaSAuthPolicy and MaaSSubscription (see `docs/samples/maas-system/`), not via LLMInferenceService annotations.
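To confirm the access-control side is in place, you can list the MaaS policy resources directly. A hedged sketch, assuming the MaaSAuthPolicy and MaaSSubscription CRDs are installed so that kubectl resolves these kind names:

```shell
# List tier/access-control resources across all namespaces
kubectl get maasauthpolicy -A
kubectl get maassubscription -A
```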
After deploying both models:

```shell
# List all LLMInferenceServices
kubectl get llminferenceservices -n llm
```
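If a model does not become reachable, inspecting its status conditions and the generated routing objects usually shows why. A minimal troubleshooting sketch, assuming the HTTPRoutes are created in the same `llm` namespace (the resource name below follows the naming table above):

```shell
# Inspect one model's status conditions and recent events
kubectl describe llminferenceservice facebook-opt-125m-simulated -n llm

# Check the HTTPRoutes generated for the models
kubectl get httproute -n llm
```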