DynamoModel is a Kubernetes Custom Resource that represents a machine learning model deployed on Dynamo. It enables you to:
- Deploy LoRA adapters on top of running base models
- Track model endpoints and their readiness across your cluster
- Manage model lifecycle declaratively with Kubernetes
DynamoModel works alongside DynamoGraphDeployment (DGD) or DynamoComponentDeployment (DCD) resources. While DGD/DCD deploy the inference infrastructure (pods, services), DynamoModel handles model-specific operations like loading LoRA adapters.
Before creating a DynamoModel, you need:
- A running `DynamoGraphDeployment` or `DynamoComponentDeployment`
- Components configured with `modelRef` pointing to your base model
- Pods that are ready and serving your base model
For complete setup including DGD configuration, see Integration with DynamoGraphDeployment.
1. Create your DynamoModel:
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
  namespace: dynamo-system
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B  # Must match modelRef.name in your DGD
  modelType: lora
  source:
    uri: s3://my-bucket/loras/my-lora
```

2. Apply and verify:
```bash
# Apply the DynamoModel
kubectl apply -f my-lora.yaml

# Check status
kubectl get dynamomodel my-lora
```

Expected output:

```
NAME      TOTAL   READY   AGE
my-lora   2       2       30s
```
That's it! The operator automatically discovers endpoints and loads the LoRA.
For detailed status monitoring, see Monitoring & Operations.
DynamoModel supports three model types:
| Type | Description | Use Case |
|---|---|---|
| `base` | Reference to an existing base model | Tracking endpoints for a base model (default) |
| `lora` | LoRA adapter that extends a base model | Deploy fine-tuned adapters on existing models |
| `adapter` | Generic model adapter | Future extensibility for other adapter types |

Most users will use `lora` to deploy fine-tuned models on top of their base model deployments.
When you create a DynamoModel, the operator:
- Discovers endpoints: Finds all pods running your `baseModelName` (by matching `modelRef.name` in DGD/DCD)
- Creates service: Automatically creates a Kubernetes Service to track these pods
- Loads LoRA: Calls the LoRA load API on each endpoint (for `lora` type)
- Updates status: Reports which endpoints are ready
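The flow above can be sketched in Python. This is a simplified illustration of the operator's logic, not its actual code; the pod dictionaries and the `load_lora` helper are hypothetical stand-ins for the real discovery and HTTP calls:

```python
def load_lora(pod: dict, uri: str) -> bool:
    # Stand-in for the HTTP call to the endpoint's LoRA load API.
    # The real operator performs this call against each discovered pod.
    return pod.get("lora_load_ok", True)

def reconcile(model: dict, pods: list[dict]) -> dict:
    """Sketch of one reconcile pass: discover endpoints, load, report status."""
    base = model["spec"]["baseModelName"]
    # 1. Discover endpoints: pods whose DGD modelRef.name matches baseModelName
    matching = [p for p in pods if p.get("modelRef") == base]
    endpoints = []
    for pod in matching:
        ready = pod["running"]
        # 2. For lora models, an endpoint is ready only once the load call succeeds
        if ready and model["spec"].get("modelType") == "lora":
            ready = load_lora(pod, model["spec"]["source"]["uri"])
        endpoints.append({"podName": pod["name"], "ready": ready})
    # 3. Update status with discovered vs ready counts
    return {
        "totalEndpoints": len(endpoints),
        "readyEndpoints": sum(e["ready"] for e in endpoints),
        "endpoints": endpoints,
    }
```

The key point the sketch shows: only pods whose `modelRef` matches `baseModelName` are discovered, and a discovered endpoint still counts as not ready until its LoRA load succeeds.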
Key linkage:
```yaml
# DGD modelRef.name ↔ DynamoModel baseModelName must match
Worker:
  modelRef:
    name: Qwen/Qwen3-0.6B
---
spec:
  baseModelName: Qwen/Qwen3-0.6B
```

DynamoModel requires just a few key fields to deploy a model or adapter:
| Field | Required | Purpose | Example |
|---|---|---|---|
| `modelName` | Yes | Model identifier | `my-custom-lora` |
| `baseModelName` | Yes | Links to DGD `modelRef` | `Qwen/Qwen3-0.6B` |
| `modelType` | No | Type: `base`/`lora`/`adapter` | `lora` (default: `base`) |
| `source.uri` | For LoRA | Model location | `s3://bucket/path` or `hf://org/model` |
Example minimal LoRA configuration:
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-bucket/my-lora
```

For complete field specifications, validation rules, and all options, see: 📖 DynamoModel API Reference
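The required-field rules in the table above can be sketched as a small validator. This is illustrative only; the real validation lives in the CRD schema, and the error strings here are invented:

```python
def validate_spec(spec: dict) -> list[str]:
    """Check a DynamoModel spec against the field rules above (sketch only)."""
    errors = []
    if not spec.get("modelName"):
        errors.append("modelName is required")
    if not spec.get("baseModelName"):
        errors.append("baseModelName is required")
    model_type = spec.get("modelType", "base")  # modelType defaults to base
    if model_type not in ("base", "lora", "adapter"):
        errors.append(f"unknown modelType: {model_type}")
    # source.uri is only mandatory when deploying a LoRA adapter
    if model_type == "lora" and not spec.get("source", {}).get("uri"):
        errors.append("source.uri is required for lora models")
    return errors
```

Note the asymmetry the table describes: a `base` model needs no `source.uri` at all, while a `lora` model is invalid without one.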
The status shows discovered endpoints and their readiness:
```bash
kubectl get dynamomodel my-lora
```

Key status fields:
- `totalEndpoints`/`readyEndpoints`: Counts of discovered vs ready endpoints
- `endpoints[]`: List with addresses, pod names, and ready status
- `conditions`: Standard Kubernetes conditions (`EndpointsReady`, `ServicesFound`)

For detailed status usage, see the Monitoring & Operations section below.
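A quick script can summarize those status fields without `jq`. A minimal sketch, assuming you feed it the `.status` object from `kubectl get dynamomodel my-lora -o json`:

```python
import json

def summarize(status: dict) -> str:
    """One-line readiness summary from a DynamoModel .status object."""
    not_ready = [e["podName"] for e in status.get("endpoints", []) if not e["ready"]]
    line = f'{status["readyEndpoints"]}/{status["totalEndpoints"]} endpoints ready'
    if not_ready:
        line += "; not ready: " + ", ".join(not_ready)
    return line

# Example .status payload, shaped like the fields listed above
raw = """{"totalEndpoints": 2, "readyEndpoints": 1,
          "endpoints": [{"podName": "worker-0", "ready": true},
                        {"podName": "worker-1", "ready": false}]}"""
print(summarize(json.loads(raw)))  # 1/2 endpoints ready; not ready: worker-1
```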
Deploy a LoRA adapter stored in an S3 bucket.
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: customer-support-lora
  namespace: production
spec:
  modelName: customer-support-adapter-v1
  baseModelName: meta-llama/Llama-3.3-70B-Instruct
  modelType: lora
  source:
    uri: s3://my-models-bucket/loras/customer-support/v1
```

Prerequisites:
- S3 bucket accessible from your pods (IAM role or credentials)
- Base model `meta-llama/Llama-3.3-70B-Instruct` running via DGD/DCD
Verification:
```bash
# Check LoRA is loaded
kubectl get dynamomodel customer-support-lora -o jsonpath='{.status.readyEndpoints}'
# Should output: 2 (or your number of replicas)

# View which pods are serving
kubectl get dynamomodel customer-support-lora -o jsonpath='{.status.endpoints[*].podName}'
```

Deploy a LoRA adapter from HuggingFace Hub.
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: multilingual-lora
  namespace: dynamo-system
spec:
  modelName: multilingual-adapter
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: hf://myorg/qwen-multilingual-lora@v1.0.0  # Optional: @revision
```

Prerequisites:
- HuggingFace Hub accessible from your pods
- If private repo: HF token configured as a secret and mounted in pods
- Base model `Qwen/Qwen3-0.6B` running via DGD/DCD
With HuggingFace token:
```yaml
# In your DGD/DCD
spec:
  services:
    worker:
      envFromSecret: hf-token-secret  # Provides HF_TOKEN env var
      modelRef:
        name: Qwen/Qwen3-0.6B
      # ... rest of config
```

Deploy multiple LoRA adapters on the same base model deployment.
```yaml
---
# LoRA for customer support
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: support-lora
spec:
  modelName: support-adapter
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://models/support-lora
---
# LoRA for code generation
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: code-lora
spec:
  modelName: code-adapter
  baseModelName: Qwen/Qwen3-0.6B  # Same base model
  modelType: lora
  source:
    uri: s3://models/code-lora
```

Both LoRAs will be loaded on all pods serving `Qwen/Qwen3-0.6B`. Your application can then route requests to the appropriate adapter.
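How your application selects an adapter depends on your backend framework; with an OpenAI-compatible frontend, the adapter is typically chosen by naming it in the request's `model` field. A hedged sketch, where the frontend URL and payload shape are assumptions and the adapter names come from the `modelName` fields above:

```python
import json
import urllib.request

def build_request(adapter: str, prompt: str) -> dict:
    """Request payload that targets a specific LoRA via the model field
    (assumes an OpenAI-compatible chat completions endpoint)."""
    return {"model": adapter, "messages": [{"role": "user", "content": prompt}]}

def send(url: str, payload: dict) -> bytes:
    """POST the payload as JSON (hypothetical frontend URL)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

support = build_request("support-adapter", "My order is late")
code = build_request("code-adapter", "Write a binary search")
# send("http://frontend:8000/v1/chat/completions", support)  # hypothetical URL
```

Check your backend framework's docs for the exact adapter-selection mechanism it supports.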
Quick status check:
```bash
kubectl get dynamomodel
```

Example output:

```
NAME            TOTAL   READY   AGE
my-lora         2       2       5m
customer-lora   4       3       2h
```
Detailed status:
```bash
kubectl describe dynamomodel my-lora
```

Example output:

```
Name:         my-lora
Namespace:    dynamo-system
Spec:
  Model Name:       my-custom-lora
  Base Model Name:  Qwen/Qwen3-0.6B
  Model Type:       lora
  Source:
    Uri:  s3://my-bucket/my-lora
Status:
  Ready Endpoints:  2
  Total Endpoints:  2
  Endpoints:
    Address:   http://10.0.1.5:9090
    Pod Name:  worker-0
    Ready:     true
    Address:   http://10.0.1.6:9090
    Pod Name:  worker-1
    Ready:     true
  Conditions:
    Type:    EndpointsReady
    Status:  True
    Reason:  EndpointsDiscovered
Events:
  Type    Reason          Message
  ----    ------          -------
  Normal  EndpointsReady  Discovered 2 ready endpoints for base model Qwen/Qwen3-0.6B
```
An endpoint is ready when:
- The pod is running and healthy
- The LoRA load API call succeeded
Condition states:
- `EndpointsReady=True`: All endpoints are ready (full availability)
- `EndpointsReady=False, Reason=NotReady`: Not all endpoints ready (check message for counts)
- `EndpointsReady=False, Reason=NoEndpoints`: No endpoints found
When `readyEndpoints` < `totalEndpoints`, the operator automatically retries loading every 30 seconds.
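The condition states follow directly from the endpoint counts. A sketch of the mapping (illustrative, not the operator's code):

```python
def endpoints_ready_condition(total: int, ready: int) -> dict:
    """Map endpoint counts to the EndpointsReady condition (sketch)."""
    if total == 0:
        # No pods discovered at all
        return {"type": "EndpointsReady", "status": "False", "reason": "NoEndpoints"}
    if ready < total:
        # Some endpoints exist but not all have loaded successfully yet
        return {"type": "EndpointsReady", "status": "False", "reason": "NotReady",
                "message": f"{ready}/{total} endpoints ready"}
    # Everything discovered and loaded
    return {"type": "EndpointsReady", "status": "True", "reason": "EndpointsDiscovered"}
```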
Get endpoint addresses:
```bash
kubectl get dynamomodel my-lora -o jsonpath='{.status.endpoints[*].address}' | tr ' ' '\n'
```

Output:

```
http://10.0.1.5:9090
http://10.0.1.6:9090
```
Get endpoint pod names:
```bash
kubectl get dynamomodel my-lora -o jsonpath='{.status.endpoints[*].podName}' | tr ' ' '\n'
```

Check readiness of each endpoint:

```bash
kubectl get dynamomodel my-lora -o json | jq '.status.endpoints[] | {podName, ready}'
```

Output:

```json
{
  "podName": "worker-0",
  "ready": true
}
{
  "podName": "worker-1",
  "ready": true
}
```

To update a LoRA (e.g., deploy a new version):
```bash
# Edit the source URI
kubectl edit dynamomodel my-lora

# Or apply an updated YAML
kubectl apply -f my-lora-v2.yaml
```

The operator will detect the change and reload the LoRA on all endpoints.
```bash
kubectl delete dynamomodel my-lora
```

For LoRA models, the operator will:
- Unload the LoRA from all endpoints
- Clean up associated resources
- Remove the DynamoModel CR
The base model deployment (DGD/DCD) continues running normally.
Symptom:
```yaml
status:
  totalEndpoints: 0
  readyEndpoints: 0
  conditions:
  - type: EndpointsReady
    status: "False"
    reason: NoEndpoints
    message: "No endpoint slices found for base model Qwen/Qwen3-0.6B"
```

Common Causes:
1. Base model deployment not running

   ```bash
   # Check if pods exist
   kubectl get pods -l nvidia.com/dynamo-component-type=worker
   ```

   Solution: Deploy your DGD/DCD first, wait for pods to be ready.

2. `baseModelName` mismatch

   ```bash
   # Check modelRef in your DGD
   kubectl get dynamographdeployment my-deployment -o yaml | grep -A2 modelRef
   ```

   Solution: Ensure `baseModelName` in DynamoModel exactly matches `modelRef.name` in DGD.

3. Pods not ready

   ```bash
   # Check pod status
   kubectl get pods -l nvidia.com/dynamo-component-type=worker
   ```

   Solution: Wait for pods to reach `Running` and `Ready` state.

4. Wrong namespace

   Solution: Ensure DynamoModel is in the same namespace as your DGD/DCD.
Symptom:
```yaml
status:
  totalEndpoints: 2
  readyEndpoints: 0  # ← No endpoints ready despite pods existing
  conditions:
  - type: EndpointsReady
    status: "False"
    reason: NoReadyEndpoints
```

Common Causes:
1. Source URI not accessible

   ```bash
   # Check operator logs
   kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager -f | grep "Failed to load"
   ```

   Solution:
   - For S3: Verify bucket permissions, IAM role, credentials
   - For HuggingFace: Verify token is valid, repo exists and is accessible

2. Invalid LoRA format

   Solution: Ensure your LoRA weights are in the format expected by your backend framework (vLLM, SGLang, etc.)

3. Endpoint API errors

   ```bash
   # Check operator logs for HTTP errors
   kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "error"
   ```

   Solution: Check the backend framework's logs in the worker pods:

   ```bash
   kubectl logs worker-0
   ```

4. Out of memory

   Solution: LoRA adapters require additional memory. Increase memory limits in your DGD:

   ```yaml
   resources:
     limits:
       memory: "32Gi"  # Increase if needed
   ```
Symptom: Some endpoints remain not ready for extended periods.
Diagnosis:

```bash
# Check which endpoints are not ready
kubectl get dynamomodel my-lora -o json | jq '.status.endpoints[] | select(.ready == false)'

# View operator logs for that specific pod
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "worker-0"

# Check the worker pod logs
kubectl logs worker-0 | tail -50
```

Common Causes:
- Network issues: Pod can't reach S3/HuggingFace
- Resource constraints: Pod is OOMing or being throttled
- API endpoint not responding: Backend framework isn't serving the LoRA API
When to wait vs investigate:
- Wait: If `readyEndpoints` is increasing over time (LoRAs loading progressively)
- Investigate: If stuck at the same `readyEndpoints` for >5 minutes
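That rule of thumb can be automated: sample `readyEndpoints` over time and flag the model only once the count stops increasing. A sketch, where the 5-minute threshold comes from the guidance above and the sampling loop (e.g., polling `kubectl` on a timer) is left to you:

```python
def wait_or_investigate(samples: list[tuple[float, int]],
                        stuck_after: float = 300.0) -> str:
    """samples: (timestamp_seconds, readyEndpoints) pairs, oldest first."""
    if len(samples) < 2:
        return "wait"  # not enough history yet
    latest_t, latest_ready = samples[-1]
    # Walk backwards to find how long readyEndpoints has sat at its current value
    stuck_since = latest_t
    for t, ready in reversed(samples[:-1]):
        if ready != latest_ready:
            break
        stuck_since = t
    if latest_t - stuck_since >= stuck_after:
        return "investigate"
    return "wait"
```

With samples taken a minute apart, a count that climbed 0 → 1 → 2 yields "wait", while a count stuck at 1 for six minutes yields "investigate".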
Check events:

```bash
kubectl describe dynamomodel my-lora | tail -20
```

View operator logs:

```bash
# Follow logs
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager -f

# Filter for specific model
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "my-lora"
```

Common events and messages:
| Event/Message | Meaning | Action |
|---|---|---|
| `EndpointsReady` | All endpoints are ready | ✅ Good - full service availability |
| `NotReady` | Not all endpoints ready | Wait for the 30s retry; investigate if stuck |
| `PartialEndpointFailure` | Some endpoints failed to load | Check logs for errors |
| `NoEndpointsFound` | No pods discovered | Verify DGD running and modelRef matches |
| `EndpointDiscoveryFailed` | Can't query endpoints | Check operator RBAC permissions |
| `Successfully reconciled` | Reconciliation complete | ✅ Good |
This section shows the complete end-to-end workflow for deploying base models and LoRA adapters together.
DynamoModel and DynamoGraphDeployment work together to provide complete model deployment:
- DGD: Deploys the infrastructure (pods, services, resources)
- DynamoModel: Manages model-specific operations (LoRA loading)
The connection is established through the modelRef field in your DGD:
Complete example:
```yaml
---
# 1. Deploy the base model infrastructure
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
spec:
  backendFramework: vllm
  services:
    Frontend:
      componentType: frontend
      replicas: 1
      dynamoNamespace: my-app
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:latest
    Worker:
      # This modelRef creates the link to DynamoModel
      modelRef:
        name: Qwen/Qwen3-0.6B  # ← Key linking field
      componentType: worker
      replicas: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:latest
          args:
            - --model
            - Qwen/Qwen3-0.6B
            - --tensor-parallel-size
            - "1"
---
# 2. Deploy LoRA adapters on top
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B  # ← Must match modelRef.name above
  modelType: lora
  source:
    uri: s3://my-bucket/loras/my-lora
```

Recommended order:
```bash
# 1. Deploy base model infrastructure
kubectl apply -f my-deployment.yaml

# 2. Wait for pods to be ready
kubectl wait --for=condition=ready pod -l nvidia.com/dynamo-component-type=worker --timeout=5m

# 3. Deploy LoRA adapters
kubectl apply -f my-lora.yaml

# 4. Verify LoRA is loaded
kubectl get dynamomodel my-lora
```

What happens behind the scenes:
| Step | DGD | DynamoModel |
|---|---|---|
| 1 | Creates pods with modelRef | - |
| 2 | Pods become running and ready | - |
| 3 | - | CR created, discovers endpoints via auto-created Service |
| 4 | - | Calls LoRA load API on each endpoint |
| 5 | - | All endpoints ready ✓ |
The operator automatically handles all service discovery - you don't configure services, labels, or selectors manually.
For complete field specifications, validation rules, and detailed type definitions, see the DynamoModel API Reference.
DynamoModel provides declarative model management for Dynamo deployments:
- ✅ Simple: 2-step deployment of LoRA adapters
- ✅ Automatic: Endpoint discovery and loading handled by operator
- ✅ Observable: Rich status reporting and conditions
- ✅ Integrated: Works seamlessly with DynamoGraphDeployment
Next Steps:
- Try the Quick Start example
- Explore Common Use Cases
- Check the API Reference for advanced configuration