⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.

API Reference

Packages

nvidia.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.

This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.

Resource Types

Autoscaling

Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.

Appears in:

Field Description Default Validation
enabled boolean Deprecated: This field is ignored.
minReplicas integer Deprecated: This field is ignored.
maxReplicas integer Deprecated: This field is ignored.
behavior HorizontalPodAutoscalerBehavior Deprecated: This field is ignored.
metrics MetricSpec array Deprecated: This field is ignored.

CheckpointMode

Underlying type: string

CheckpointMode defines how checkpoint creation is handled

Validation:

  • Enum: [Auto Manual]

Appears in:

Field Description
Auto CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR
Manual CheckpointModeManual means the user must create the Checkpoint CR themselves

ComponentKind

Underlying type: string

ComponentKind represents the type of underlying Kubernetes resource.

Validation:

  • Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

Appears in:

Field Description
PodClique ComponentKindPodClique represents a PodClique resource.
PodCliqueScalingGroup ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource.
Deployment ComponentKindDeployment represents a Deployment resource.
LeaderWorkerSet ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource.

ConfigMapKeySelector

ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.

Appears in:

Field Description Default Validation
name string Name of the ConfigMap containing the desired data. Required: {}
key string Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". disagg.yaml
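
As a minimal sketch, a ConfigMapKeySelector might be written as follows wherever a field of this type (such as configMapRef) is accepted. The ConfigMap name is hypothetical; only name and key come from the table above.

```yaml
# Hypothetical ConfigMap name, used for illustration only.
configMapRef:
  name: my-base-config   # required: the ConfigMap holding the data
  key: disagg.yaml       # optional: "disagg.yaml" is the documented default
```

Omitting key should be equivalent to the fragment above, since the documented default is the same value.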

DeploymentOverridesSpec

DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.

Appears in:

Field Description Default Validation
name string Name is the desired name for the created DynamoGraphDeployment.
If not specified, defaults to the DGDR name.
Optional: {}
namespace string Namespace is the desired namespace for the created DynamoGraphDeployment.
If not specified, defaults to the DGDR namespace.
Optional: {}
labels object (keys:string, values:string) Labels are additional labels to add to the DynamoGraphDeployment metadata.
These are merged with auto-generated labels from the profiling process.
Optional: {}
annotations object (keys:string, values:string) Annotations are additional annotations to add to the DynamoGraphDeployment metadata. Optional: {}
workersImage string WorkersImage specifies the container image to use for DynamoGraphDeployment worker components.
This image is used for both temporary DGDs created during online profiling and the final DGD.
If omitted, the image from the base config file (e.g., disagg.yaml) is used.
Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"
Optional: {}
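
The overrides above could be combined as in the following sketch of a DynamoGraphDeploymentRequest spec fragment. The name, namespace, label, and annotation values are hypothetical; the image is the example value quoted in the field description.

```yaml
# Fragment of a DGDR spec; overrides only take effect when autoApply is true.
autoApply: true
deploymentOverrides:
  name: llama-prod               # hypothetical; defaults to the DGDR name
  namespace: inference           # hypothetical; defaults to the DGDR namespace
  labels:
    team: ml-platform            # merged with auto-generated labels
  annotations:
    example.com/owner: ml-platform
  workersImage: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1
```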

DeploymentStatus

DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.

Appears in:

Field Description Default Validation
name string Name is the name of the created DynamoGraphDeployment.
namespace string Namespace is the namespace of the created DynamoGraphDeployment.
state string State is the current state of the DynamoGraphDeployment.
This value is mirrored from the DGD's status.state field.
created boolean Created indicates whether the DGD has been successfully created.
Used to prevent recreation if the DGD is manually deleted by users.

DynamoCheckpoint

DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.

Field Description Default Validation
apiVersion string nvidia.com/v1alpha1
kind string DynamoCheckpoint
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec DynamoCheckpointSpec
status DynamoCheckpointStatus

DynamoCheckpointIdentity

DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.

Appears in:

Field Description Default Validation
model string Model is the model identifier (e.g., "meta-llama/Llama-3-70B") Required: {}
backendFramework string BackendFramework is the runtime framework (vllm, sglang, trtllm) Enum: [vllm sglang trtllm]
Required: {}
dynamoVersion string DynamoVersion is the Dynamo platform version (optional).
If not specified, the version is not included in the identity hash,
which keeps checkpoints reusable across Dynamo releases.
Optional: {}
tensorParallelSize integer TensorParallelSize is the tensor parallel configuration 1 Minimum: 1
Optional: {}
pipelineParallelSize integer PipelineParallelSize is the pipeline parallel configuration 1 Minimum: 1
Optional: {}
dtype string Dtype is the data type (fp16, bf16, fp8, etc.) Optional: {}
maxModelLen integer MaxModelLen is the maximum sequence length Minimum: 1
Optional: {}
extraParameters object (keys:string, values:string) ExtraParameters are additional parameters that affect the checkpoint hash
Use for any framework-specific or custom parameters not covered above
Optional: {}

DynamoCheckpointJobConfig

DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job

Appears in:

Field Description Default Validation
podTemplateSpec PodTemplateSpec PodTemplateSpec allows customizing the checkpoint Job pod
This should include the container that runs the workload to be checkpointed
Required: {}
activeDeadlineSeconds integer ActiveDeadlineSeconds specifies the maximum time the Job can run 3600 Optional: {}
backoffLimit integer BackoffLimit specifies the number of retries before marking the Job failed 3 Optional: {}
ttlSecondsAfterFinished integer TTLSecondsAfterFinished specifies how long to keep the Job after completion 300 Optional: {}

DynamoCheckpointPhase

Underlying type: string

DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle

Validation:

  • Enum: [Pending Creating Ready Failed]

Appears in:

Field Description
Pending DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started
Creating DynamoCheckpointPhaseCreating indicates the checkpoint Job is running
Ready DynamoCheckpointPhaseReady indicates the checkpoint tar file is available on the PVC
Failed DynamoCheckpointPhaseFailed indicates the checkpoint creation failed

DynamoCheckpointSpec

DynamoCheckpointSpec defines the desired state of DynamoCheckpoint

Appears in:

Field Description Default Validation
identity DynamoCheckpointIdentity Identity defines the inputs that determine checkpoint equivalence Required: {}
job DynamoCheckpointJobConfig Job defines the configuration for the checkpoint creation Job Required: {}
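
Putting the identity and job fields together, a DynamoCheckpoint manifest might look like the sketch below. The resource name, container name, image choice, and the extraParameters entry are assumptions for illustration; the operator may expect a specific container layout that this reference does not describe.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoCheckpoint
metadata:
  name: llama-3-70b-vllm-tp4        # hypothetical name
spec:
  identity:
    model: meta-llama/Llama-3-70B   # required
    backendFramework: vllm          # required; one of vllm, sglang, trtllm
    tensorParallelSize: 4           # default 1, minimum 1
    pipelineParallelSize: 1         # default 1, minimum 1
    dtype: bf16
    maxModelLen: 8192
    extraParameters:
      quantization: none            # hypothetical framework-specific parameter
  job:
    podTemplateSpec:                # required; standard Kubernetes PodTemplateSpec
      spec:
        restartPolicy: Never
        containers:
          - name: main              # hypothetical container name
            image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1   # assumed image
    activeDeadlineSeconds: 3600     # documented default
    backoffLimit: 3                 # documented default
    ttlSecondsAfterFinished: 300    # documented default
```

Because equivalence is decided by the identity hash, changing any value under identity (including entries in extraParameters) should be expected to yield a different checkpoint.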

DynamoCheckpointStatus

DynamoCheckpointStatus defines the observed state of DynamoCheckpoint

Appears in:

Field Description Default Validation
phase DynamoCheckpointPhase Phase represents the current phase of the checkpoint lifecycle Enum: [Pending Creating Ready Failed]
Optional: {}
identityHash string IdentityHash is the computed hash of the checkpoint identity
This hash is used to identify equivalent checkpoints
Optional: {}
location string Location is the full URI/path to the checkpoint in the storage backend
For PVC: same as TarPath (e.g., /checkpoints/{hash}.tar)
For S3: s3://bucket/prefix/{hash}.tar
For OCI: oci://registry/repo:{hash}
Optional: {}
storageType DynamoCheckpointStorageType StorageType indicates the storage backend type used for this checkpoint Enum: [pvc s3 oci]
Optional: {}
jobName string JobName is the name of the checkpoint creation Job Optional: {}
createdAt Time CreatedAt is the timestamp when the checkpoint tar was created Optional: {}
message string Message provides additional information about the current state Optional: {}
conditions Condition array Conditions represent the latest available observations of the checkpoint's state Optional: {}

DynamoCheckpointStorageType

Underlying type: string

DynamoCheckpointStorageType defines the supported storage backends for checkpoints

Validation:

  • Enum: [pvc s3 oci]

Appears in:

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API

Field Description Default Validation
apiVersion string nvidia.com/v1alpha1
kind string DynamoComponentDeployment
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec DynamoComponentDeploymentSpec Spec defines the desired state for this Dynamo component deployment.

DynamoComponentDeploymentSharedSpec

Appears in:

Field Description Default Validation
annotations object (keys:string, values:string) Annotations to add to generated Kubernetes resources for this component
(such as Pod, Service, and Ingress when applicable).
labels object (keys:string, values:string) Labels to add to generated Kubernetes resources for this component.
serviceName string The name of the component
componentType string ComponentType indicates the role of this component (for example, "main").
subComponentType string SubComponentType indicates the sub-role of this component (for example, "prefill").
dynamoNamespace string DynamoNamespace is deprecated and will be removed in a future version.
The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component
Optional: {}
globalDynamoNamespace boolean GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace
resources Resources Resources requested and limits for this component, including CPU, memory,
GPUs/devices, and any runtime-specific resources.
autoscaling Autoscaling Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter
with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md
for migration guidance. This field will be removed in a future API version.
envs EnvVar array Envs defines additional environment variables to inject into the component containers.
envFromSecret string EnvFromSecret references a Secret whose key/value pairs will be exposed as
environment variables in the component containers.
volumeMounts VolumeMount array VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component.
ingress IngressSpec Ingress config to expose the component outside the cluster (or through a service mesh).
modelRef ModelReference ModelRef references a model that this component serves
When specified, a headless service will be created for endpoint discovery
Optional: {}
sharedMemory SharedMemorySpec SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size).
extraPodMetadata ExtraPodMetadata ExtraPodMetadata adds labels/annotations to the created Pods. Optional: {}
extraPodSpec ExtraPodSpec ExtraPodSpec allows overriding the main pod spec configuration.
It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
that allows overriding the main container configuration.
Optional: {}
livenessProbe Probe LivenessProbe to detect and restart unhealthy containers.
readinessProbe Probe ReadinessProbe to signal when the container is ready to receive traffic.
replicas integer Replicas is the desired number of Pods for this component.
When scalingAdapter is enabled, this field is managed by the
DynamoGraphDeploymentScalingAdapter and should not be modified directly.
Minimum: 0
multinode MultinodeSpec Multinode is the configuration for multinode components.
scalingAdapter ScalingAdapter ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
When enabled, replicas are managed via DGDSA and external autoscalers can scale
the service using the Scale subresource. When disabled, replicas can be modified directly.
Optional: {}
eppConfig EPPConfig EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components.
Only applicable when ComponentType is "epp".
Optional: {}
checkpoint ServiceCheckpointConfig Checkpoint configures container checkpointing for this service.
When enabled, pods can be restored from checkpoint files for a faster cold start.
Optional: {}

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment

Appears in:

Field Description Default Validation
backendFramework string BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm") Enum: [sglang vllm trtllm]
annotations object (keys:string, values:string) Annotations to add to generated Kubernetes resources for this component
(such as Pod, Service, and Ingress when applicable).
labels object (keys:string, values:string) Labels to add to generated Kubernetes resources for this component.
serviceName string The name of the component
componentType string ComponentType indicates the role of this component (for example, "main").
subComponentType string SubComponentType indicates the sub-role of this component (for example, "prefill").
dynamoNamespace string DynamoNamespace is deprecated and will be removed in a future version.
The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component
Optional: {}
globalDynamoNamespace boolean GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace
resources Resources Resources requested and limits for this component, including CPU, memory,
GPUs/devices, and any runtime-specific resources.
autoscaling Autoscaling Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter
with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md
for migration guidance. This field will be removed in a future API version.
envs EnvVar array Envs defines additional environment variables to inject into the component containers.
envFromSecret string EnvFromSecret references a Secret whose key/value pairs will be exposed as
environment variables in the component containers.
volumeMounts VolumeMount array VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component.
ingress IngressSpec Ingress config to expose the component outside the cluster (or through a service mesh).
modelRef ModelReference ModelRef references a model that this component serves
When specified, a headless service will be created for endpoint discovery
Optional: {}
sharedMemory SharedMemorySpec SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size).
extraPodMetadata ExtraPodMetadata ExtraPodMetadata adds labels/annotations to the created Pods. Optional: {}
extraPodSpec ExtraPodSpec ExtraPodSpec allows overriding the main pod spec configuration.
It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
that allows overriding the main container configuration.
Optional: {}
livenessProbe Probe LivenessProbe to detect and restart unhealthy containers.
readinessProbe Probe ReadinessProbe to signal when the container is ready to receive traffic.
replicas integer Replicas is the desired number of Pods for this component.
When scalingAdapter is enabled, this field is managed by the
DynamoGraphDeploymentScalingAdapter and should not be modified directly.
Minimum: 0
multinode MultinodeSpec Multinode is the configuration for multinode components.
scalingAdapter ScalingAdapter ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
When enabled, replicas are managed via DGDSA and external autoscalers can scale
the service using the Scale subresource. When disabled, replicas can be modified directly.
Optional: {}
eppConfig EPPConfig EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components.
Only applicable when ComponentType is "epp".
Optional: {}
checkpoint ServiceCheckpointConfig Checkpoint configures container checkpointing for this service.
When enabled, pods can be restored from checkpoint files for a faster cold start.
Optional: {}

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

Field Description Default Validation
apiVersion string nvidia.com/v1alpha1
kind string DynamoGraphDeployment
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec DynamoGraphDeploymentSpec Spec defines the desired state for this graph deployment.
status DynamoGraphDeploymentStatus Status reflects the current observed state of this graph deployment.

DynamoGraphDeploymentRequest

DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.

Lifecycle:

  1. Initial → Pending: Validates spec and prepares for profiling
  2. Pending → Profiling: Creates and runs profiling job (online or AIC)
  3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
  4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
  5. Ready: Terminal state when DGD is operational or spec is available
  6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted

The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.

Field Description Default Validation
apiVersion string nvidia.com/v1alpha1
kind string DynamoGraphDeploymentRequest
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec DynamoGraphDeploymentRequestSpec Spec defines the desired state for this deployment request.
status DynamoGraphDeploymentRequestStatus Status reflects the current observed state of this deployment request.

DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.

Appears in:

Field Description Default Validation
model string Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
This is a high-level identifier for easy reference in kubectl output and logs.
The controller automatically sets this value in profilingConfig.config.deployment.model.
Required: {}
backend string Backend specifies the inference backend for profiling.
The controller automatically sets this value in profilingConfig.config.engine.backend.
Profiling runs on real GPUs or via AIC simulation to collect performance data.
Enum: [vllm sglang trtllm]
Required: {}
useMocker boolean UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
a real backend deployment. When true, the deployment uses simulated engines that
don't require GPUs, using the profiling data to simulate realistic timing behavior.
Mocker is available in all backend images and useful for large-scale experiments.
Profiling still runs against the real backend (specified above) to collect performance data.
false
enableGpuDiscovery boolean EnableGpuDiscovery controls whether the profiler should automatically discover GPU
resources from the Kubernetes cluster nodes. When enabled, the profiler will override
any manually specified hardware configuration (minNumGpusPerEngine, maxNumGpusPerEngine,
numGpusPerNode) with values detected from the cluster.
Requires cluster-wide node access permissions - only available with cluster-scoped operators.
false Optional: {}
profilingConfig ProfilingConfigSpec ProfilingConfig provides the complete configuration for the profiling job.
This configuration is passed directly to the profiler.
The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
Note: deployment.model and engine.backend are automatically set from the high-level
model and backend fields and should not be specified in this config.
Required: {}
autoApply boolean AutoApply indicates whether to automatically create a DynamoGraphDeployment
after profiling completes. If false, only the spec is generated and stored in status.
Users can then manually create a DGD using the generated spec.
false
deploymentOverrides DeploymentOverridesSpec DeploymentOverrides allows customizing metadata for the auto-created DGD.
Only applicable when AutoApply is true.
Optional: {}
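
A DynamoGraphDeploymentRequest using the fields above could be sketched as follows. The resource names, the profiler image, and the ConfigMap name are hypothetical. The contents of profilingConfig.config are left out because their schema is defined by the profiler, not by this reference.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen3-sla                    # hypothetical name
spec:
  model: Qwen/Qwen3-0.6B             # required
  backend: vllm                      # required; one of vllm, sglang, trtllm
  useMocker: false                   # documented default
  enableGpuDiscovery: false          # documented default
  profilingConfig:                   # required
    profilerImage: registry.example.com/dynamo/profiler:latest   # hypothetical image
    configMapRef:
      name: my-base-config           # hypothetical ConfigMap with the DGD base config
      key: disagg.yaml
    # config: profiler settings would go here. Do not set deployment.model or
    # engine.backend; the controller fills them in from model and backend.
  autoApply: true                    # create the DGD once profiling completes
  deploymentOverrides:
    name: qwen3-sla-dgd              # hypothetical; only used when autoApply is true
```

Since the spec becomes immutable once profiling starts, changing any of these values later means deleting and recreating the DGDR.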

DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.

Appears in:

Field Description Default Validation
state string State is a high-level textual status of the deployment request lifecycle.
Possible values: "", "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed"
Empty string ("") represents the initial state before initialization.
backend string Backend is extracted from profilingConfig.config.engine.backend for display purposes.
This field is populated by the controller and shown in kubectl output.
Optional: {}
observedGeneration integer ObservedGeneration reflects the generation of the most recently observed spec.
Used to detect spec changes and enforce immutability after profiling starts.
conditions Condition array Conditions contains the latest observed conditions of the deployment request.
Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.
Conditions are merged by type on patch updates.
profilingResults string ProfilingResults contains a reference to the ConfigMap holding profiling data.
Format: "configmap/"
Optional: {}
generatedDeployment RawExtension GeneratedDeployment contains the full generated DynamoGraphDeployment specification
including metadata, based on profiling results. Users can extract this to create
a DGD manually, or it's used automatically when autoApply is true.
Stored as RawExtension to preserve all fields including metadata.
For mocker backends, this contains the mocker DGD spec.
EmbeddedResource: {}
Optional: {}
deployment DeploymentStatus Deployment tracks the auto-created DGD when AutoApply is true.
Contains name, namespace, state, and creation status of the managed DGD.
Optional: {}

DynamoGraphDeploymentScalingAdapter

DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.

The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD's service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.

Field Description Default Validation
apiVersion string nvidia.com/v1alpha1
kind string DynamoGraphDeploymentScalingAdapter
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec DynamoGraphDeploymentScalingAdapterSpec
status DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterSpec

DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter

Appears in:

Field Description Default Validation
replicas integer Replicas is the desired number of replicas for the target service.
This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users.
Minimum: 0
Required: {}
dgdRef DynamoGraphDeploymentServiceRef DGDRef references the DynamoGraphDeployment and the specific service to scale. Required: {}
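
As an illustration, an adapter that targets one service of a DGD might be declared as below. The adapter name, DGD name, and service key are hypothetical.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentScalingAdapter
metadata:
  name: my-graph-decode        # hypothetical adapter name
spec:
  replicas: 2                  # required, minimum 0; autoscalers update this value
  dgdRef:
    name: my-graph             # hypothetical DynamoGraphDeployment name
    serviceName: decode        # hypothetical key in the DGD's spec.services map
```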

DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter

Appears in:

Field Description Default Validation
replicas integer Replicas is the current number of replicas for the target service.
This is synced from the DGD's service replicas and is required for the scale subresource.
Optional: {}
selector string Selector is a label selector string for the pods managed by this adapter.
Required for HPA compatibility via the scale subresource.
Optional: {}
lastScaleTime Time LastScaleTime is the last time the adapter scaled the target service. Optional: {}
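
Because the adapter implements the scale subresource, a standard HorizontalPodAutoscaler can point at it. The sketch below assumes an adapter named my-graph-decode exists in the same namespace; the CPU metric and the replica bounds are purely illustrative and are not recommendations from this reference.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-graph-decode-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: nvidia.com/v1alpha1
    kind: DynamoGraphDeploymentScalingAdapter
    name: my-graph-decode            # hypothetical adapter name
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Resource                 # illustrative metric choice
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The HPA reads status.replicas and status.selector through the scale subresource and writes spec.replicas on the adapter, which in turn updates the DGD service.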

DynamoGraphDeploymentServiceRef

DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment

Appears in:

Field Description Default Validation
name string Name of the DynamoGraphDeployment MinLength: 1
Required: {}
serviceName string ServiceName is the key name of the service within the DGD's spec.services map to scale MinLength: 1
Required: {}

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.

Appears in:

Field Description Default Validation
pvcs PVC array PVCs defines a list of persistent volume claims that can be referenced by components.
Each PVC must have a unique name that can be referenced in component specifications.
MaxItems: 100
Optional: {}
services object (keys:string, values:DynamoComponentDeploymentSharedSpec) Services are the services to deploy as part of this deployment. MaxProperties: 25
Optional: {}
envs EnvVar array Envs are environment variables applied to all services in the deployment unless
overridden by service-specific configuration.
Optional: {}
backendFramework string BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). Enum: [sglang vllm trtllm]
restart Restart Restart specifies the restart policy for the graph deployment. Optional: {}
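
A small DynamoGraphDeployment built only from fields documented in this reference might look like the following sketch. Service keys, the storage class, the environment variable, and the image are hypothetical; real deployments typically need further per-service settings (resources, volume mounts) that are omitted here.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-graph                     # hypothetical name
spec:
  backendFramework: vllm             # one of sglang, vllm, trtllm
  pvcs:
    - name: model-cache              # required; unique within the list
      create: true
      storageClass: standard         # hypothetical; required when create is true
      size: 100Gi                    # required when create is true
      volumeAccessMode: ReadWriteMany
  envs:
    - name: LOG_LEVEL                # hypothetical variable applied to all services
      value: info
  services:
    frontend:                        # hypothetical service key
      componentType: main
      replicas: 1
    prefill-worker:                  # hypothetical service key
      subComponentType: prefill
      replicas: 2
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1   # assumed image
```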

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.

Appears in:

Field Description Default Validation
state string State is a high-level textual status of the graph deployment lifecycle.
conditions Condition array Conditions contains the latest observed conditions of the graph deployment.
The slice is merged by type on patch updates.
services object (keys:string, values:ServiceReplicaStatus) Services contains per-service replica status information.
The map key is the service name from spec.services.
Optional: {}
restart RestartStatus Restart contains the status of the restart of the graph deployment. Optional: {}
checkpoints object (keys:string, values:ServiceCheckpointStatus) Checkpoints contains per-service checkpoint status information.
The map key is the service name from spec.services.
Optional: {}

DynamoModel

DynamoModel is the Schema for the dynamo models API

Field Description Default Validation
apiVersion string nvidia.com/v1alpha1
kind string DynamoModel
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec DynamoModelSpec
status DynamoModelStatus

DynamoModelSpec

DynamoModelSpec defines the desired state of DynamoModel

Appears in:

Field Description Default Validation
modelName string ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora") Required: {}
baseModelName string BaseModelName is the base model identifier that matches the service label
This is used to discover endpoints via headless services
Required: {}
modelType string ModelType specifies the type of model (e.g., "base", "lora", "adapter") base Enum: [base lora adapter]
Optional: {}
source ModelSource Source specifies the model source location (only applicable for lora model type) Optional: {}
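
For a LoRA adapter, a DynamoModel could be sketched as below. The resource name and baseModelName value are hypothetical, and the URI reuses the placeholder S3 form listed under ModelSource.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama-33-70b-lora            # hypothetical name
spec:
  modelName: meta-llama/Llama-3.3-70B-Instruct-lora
  baseModelName: meta-llama/Llama-3.3-70B-Instruct   # hypothetical; must match the service label
  modelType: lora                    # one of base, lora, adapter; default is base
  source:                            # only applicable for the lora model type
    uri: s3://bucket/path/to/model
```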

DynamoModelStatus

DynamoModelStatus defines the observed state of DynamoModel

Appears in:

Field Description Default Validation
endpoints EndpointInfo array Endpoints is the current list of all endpoints for this model Optional: {}
readyEndpoints integer ReadyEndpoints is the count of endpoints that are ready
totalEndpoints integer TotalEndpoints is the total count of endpoints
conditions Condition array Conditions represents the latest available observations of the model's state Optional: {}

EPPConfig

EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.

Appears in:

Field Description Default Validation
configMapRef ConfigMapKeySelector ConfigMapRef references a user-provided ConfigMap containing EPP configuration.
The ConfigMap should contain EndpointPickerConfig YAML.
Mutually exclusive with Config.
Optional: {}
config EndpointPickerConfig Config allows specifying EPP EndpointPickerConfig directly as a structured object.
The operator will marshal this to YAML and create a ConfigMap automatically.
Mutually exclusive with ConfigMapRef.
One of ConfigMapRef or Config must be specified (no default configuration).
Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension
Type: object
Optional: {}
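
As a sketch, an EPP service that reads its configuration from a user-provided ConfigMap might be declared like this inside a DGD's spec. The service key, ConfigMap name, and key are hypothetical. The key is set explicitly because the ConfigMapKeySelector default ("disagg.yaml") is unlikely to be the intended file here.

```yaml
# Fragment of a DynamoGraphDeployment spec.
services:
  epp:                               # hypothetical service key
    componentType: epp               # eppConfig only applies to this component type
    eppConfig:
      configMapRef:                  # mutually exclusive with config
        name: my-epp-config          # hypothetical ConfigMap with EndpointPickerConfig YAML
        key: epp-config.yaml         # hypothetical key
```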

EndpointInfo

EndpointInfo represents a single endpoint (pod) serving the model

Appears in:

Field Description Default Validation
address string Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090")
podName string PodName is the name of the pod serving this endpoint Optional: {}
ready boolean Ready indicates whether the endpoint is ready to serve traffic
For LoRA models: true if the POST /loras request succeeded with a 2xx status code
For base models: always false (no probing performed)

ExtraPodMetadata

Appears in:

Field Description Default Validation
annotations object (keys:string, values:string)
labels object (keys:string, values:string)

ExtraPodSpec

Appears in:

Field Description Default Validation
mainContainer Container

IngressSpec

Appears in:

Field Description Default Validation
enabled boolean Enabled exposes the component through an ingress or virtual service when true.
host string Host is the base host name to route external traffic to this component.
useVirtualService boolean UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress.
virtualServiceGateway string VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to.
hostPrefix string HostPrefix is an optional prefix added before the host.
annotations object (keys:string, values:string) Annotations to set on the generated Ingress/VirtualService resources.
labels object (keys:string, values:string) Labels to set on the generated Ingress/VirtualService resources.
tls IngressTLSSpec TLS holds the TLS configuration used by the Ingress/VirtualService.
hostSuffix string HostSuffix is an optional suffix appended after the host.
ingressControllerClassName string IngressControllerClassName selects the ingress controller class (e.g., "nginx").

IngressTLSSpec

Appears in:

Field Description Default Validation
secretName string SecretName is the name of a Kubernetes Secret containing the TLS certificate and key.
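
The ingress fields could be combined as in the following fragment of a component spec. The host, Secret name, and annotation are hypothetical; "nginx" is the example class given above.

```yaml
ingress:
  enabled: true
  host: dynamo.example.com           # hypothetical host
  ingressControllerClassName: nginx
  tls:
    secretName: dynamo-tls           # hypothetical Secret holding the certificate and key
  annotations:
    example.com/owner: ml-platform   # hypothetical annotation
```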

ModelReference

ModelReference identifies a model served by this component

Appears in:

Field Description Default Validation
name string Name is the base model identifier (e.g., "llama-3-70b-instruct-v1") Required: {}
revision string Revision is the model revision/version (optional) Optional: {}

ModelSource

ModelSource defines the source location of a model

Appears in:

Field Description Default Validation
uri string URI is the model source URI
Supported formats:
- S3: s3://bucket/path/to/model
- HuggingFace: hf://org/model@revision_sha
Required: {}

MultinodeSpec

Appears in:

Field Description Default Validation
nodeCount integer NodeCount is the number of nodes to deploy for multinode components.
Total number of GPUs is nodeCount × the per-node GPU limit.
Must be greater than 1.
Default: 2. Validation: Minimum: 2
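
The GPU arithmetic above can be made concrete with a small sketch. The `multinode` and `resources` field names on the service, and the service name itself, are assumptions inferred from the type names in this reference.

```yaml
services:
  YourWorker:           # hypothetical service name
    multinode:          # assumed field name for MultinodeSpec
      nodeCount: 2      # minimum allowed value
    resources:          # assumed field name for Resources
      limits:
        gpu: "8"        # GPUs per node
# Total GPUs for this component: 2 nodes x 8 GPUs per node = 16
```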

PVC

Appears in:

Field Description Default Validation
create boolean Create indicates whether to create a new PVC
name string Name is the name of the PVC Required: {}
storageClass string StorageClass to be used for PVC creation. Required when create is true.
size Quantity Size of the volume in Gi, used during PVC creation. Required when create is true.
volumeAccessMode PersistentVolumeAccessMode VolumeAccessMode is the volume access mode of the PVC. Required when create is true.
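
A minimal sketch of a PVC entry that asks the operator to create the claim, which is the case where storageClass, size, and volumeAccessMode become required. The top-level `pvcs` list name, the PVC name, and the storage class are assumptions for illustration.

```yaml
pvcs:                        # assumed top-level field holding PVC entries
  - name: model-cache        # placeholder PVC name
    create: true
    storageClass: standard   # placeholder storage class
    size: 100Gi
    volumeAccessMode: ReadWriteMany
```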

ProfilingConfigSpec

ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.

Appears in:

Field Description Default Validation
config JSON Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
The profiler will validate the configuration and report any errors.
Optional: {}
Type: object
configMapRef ConfigMapKeySelector ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
base config file (disagg.yaml). This is separate from the profiling config above.
The path to this config will be set as engine.config in the profiling config.
Optional: {}
profilerImage string ProfilerImage specifies the container image to use for profiling jobs.
This image contains the profiler code and dependencies needed for SLA-based profiling.
Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"
Required: {}
outputPVC string OutputPVC is an optional PersistentVolumeClaim name for storing profiling output.
If specified, all profiling artifacts (logs, plots, configs, raw data) will be written
to this PVC instead of an ephemeral emptyDir volume. This allows users to access
complete profiling results after the job completes by mounting the PVC.
The PVC must exist in the same namespace as the DGDR.
If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps.
Note: ConfigMaps are still created regardless of this setting for planner integration.
Optional: {}
resources ResourceRequirements Resources specifies the compute resource requirements for the profiling job container.
If not specified, no resource requests or limits are set.
Optional: {}
tolerations Toleration array Tolerations allows the profiling job to be scheduled on nodes with matching taints.
For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint.
Optional: {}
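
The following sketch shows how these profiling fields might be combined. The enclosing `profilingConfig` field name, the PVC and ConfigMap names, and the `NoSchedule` taint effect are assumptions; the image is the example value quoted above. The free-form `config` field is left out because its schema is defined by the profiler.

```yaml
profilingConfig:                 # assumed field name on the DGDR spec
  profilerImage: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"
  outputPVC: profiling-output    # placeholder; must already exist in the DGDR namespace
  configMapRef:
    name: base-dgd-config        # placeholder ConfigMap name
    key: disagg.yaml
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule         # assumed taint effect on the GPU nodes
```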

ResourceItem

Appears in:

Field Description Default Validation
cpu string CPU specifies the CPU resource request/limit (e.g., "1000m", "2")
memory string Memory specifies the memory resource request/limit (e.g., "4Gi", "8Gi")
gpu string GPU indicates the number of GPUs to request.
For multinode deployments, the total number of GPUs is nodeCount × GPU.
gpuType string GPUType specifies a custom GPU resource type, e.g. "gpu.intel.com/xe".
If not specified, the GPU type defaults to "nvidia.com/gpu".
custom object (keys:string, values:string) Custom specifies additional custom resource requests/limits

Resources

Resources defines resource requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.

Appears in:

Field Description Default Validation
requests ResourceItem Requests specifies the minimum resources required by the component
limits ResourceItem Limits specifies the maximum resources allowed for the component
claims ResourceClaim array Claims specifies resource claims for dynamic resource allocation
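
A short sketch of a Resources block using the ResourceItem fields described above. The enclosing `resources` field name and the specific quantities are assumptions chosen for illustration.

```yaml
resources:                      # assumed field name on a service
  requests:
    cpu: "4"
    memory: 16Gi
    gpu: "1"
  limits:
    cpu: "8"
    memory: 32Gi
    gpu: "1"
    gpuType: gpu.intel.com/xe   # omit to use the default nvidia.com/gpu
```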

Restart

Appears in:

Field Description Default Validation
id string ID is an arbitrary string that triggers a restart when changed.
Any modification to this value will initiate a restart of the graph deployment according to the strategy.
MinLength: 1
Required: {}
strategy RestartStrategy Strategy specifies the restart strategy for the graph deployment. Optional: {}

RestartPhase

Underlying type: string

Appears in:

Field Description
Pending
Restarting
Completed
Failed

RestartStatus

RestartStatus contains the status of the restart of the graph deployment.

Appears in:

Field Description Default Validation
observedID string ObservedID is the restart ID that has been observed and is being processed.
Matches the Restart.ID field in the spec.
phase RestartPhase Phase is the phase of the restart.
inProgress string array InProgress contains the names of the services that are currently being restarted. Optional: {}

RestartStrategy

Appears in:

Field Description Default Validation
type RestartStrategyType Type specifies the restart strategy type. Sequential Enum: [Sequential Parallel]
order string array Order specifies the order in which the services should be restarted. Optional: {}
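
As a sketch of how Restart and RestartStrategy combine, the fragment below requests a sequential restart in a given order. The `spec.restart` path, the ID value, and the service names are assumptions for illustration.

```yaml
spec:
  restart:                        # assumed field name for Restart
    id: "2025-01-15-rollout-1"    # any non-empty string; change it to trigger a restart
    strategy:
      type: Sequential            # the default; Parallel is the other allowed value
      order:                      # placeholder service names
        - Frontend
        - YourWorker
```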

RestartStrategyType

Underlying type: string

Appears in:

Field Description
Sequential
Parallel

ScalingAdapter

ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.

Appears in:

Field Description Default Validation
enabled boolean Enabled indicates whether the ScalingAdapter should be enabled for this service.
When true, a DGDSA is created and owns the replicas field.
When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
false Optional: {}
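
A minimal sketch of enabling the adapter for one service. The `scalingAdapter` field name and the service name are assumptions inferred from the type name.

```yaml
services:
  YourWorker:            # hypothetical service name
    replicas: 2          # owned by the DGDSA once the adapter is enabled
    scalingAdapter:      # assumed field name for ScalingAdapter
      enabled: true
```

With this setting, an external autoscaler would adjust replicas through the adapter's Scale subresource rather than by editing the DGD.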

ServiceCheckpointConfig

ServiceCheckpointConfig configures checkpointing for a DGD service

Appears in:

Field Description Default Validation
enabled boolean Enabled indicates whether checkpointing is enabled for this service false Optional: {}
mode CheckpointMode Mode defines how checkpoint creation is handled
- Auto: DGD controller creates Checkpoint CR automatically
- Manual: User must create Checkpoint CR
Auto Enum: [Auto Manual]
Optional: {}
checkpointRef string CheckpointRef references an existing Checkpoint CR to use
If specified, Identity is ignored and this checkpoint is used directly
Optional: {}
identity DynamoCheckpointIdentity Identity defines the checkpoint identity for hash computation
Used when Mode is Auto or when looking up existing checkpoints
Required when checkpointRef is not specified
Optional: {}
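
The sketch below shows the Manual mode with an explicit reference, which is the case where identity is ignored. The `checkpoint` field name, the service name, and the Checkpoint CR name are assumptions for illustration.

```yaml
services:
  YourWorker:                               # hypothetical service name
    checkpoint:                             # assumed field name for ServiceCheckpointConfig
      enabled: true
      mode: Manual                          # the user creates the Checkpoint CR
      checkpointRef: my-worker-checkpoint   # placeholder name of an existing Checkpoint CR
```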

ServiceCheckpointStatus

ServiceCheckpointStatus contains checkpoint information for a single service.

Appears in:

Field Description Default Validation
checkpointName string CheckpointName is the name of the associated Checkpoint CR Optional: {}
identityHash string IdentityHash is the computed hash of the checkpoint identity Optional: {}
ready boolean Ready indicates if the checkpoint is ready for use Optional: {}

ServiceReplicaStatus

ServiceReplicaStatus contains replica information for a single service.

Appears in:

Field Description Default Validation
componentKind ComponentKind ComponentKind is the underlying resource kind (e.g., "PodClique", "PodCliqueScalingGroup", "Deployment", "LeaderWorkerSet"). Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
componentName string ComponentName is the name of the underlying resource.
replicas integer Replicas is the total number of non-terminated replicas.
Required for all component kinds.
Minimum: 0
updatedReplicas integer UpdatedReplicas is the number of replicas at the current/desired revision.
Required for all component kinds.
Minimum: 0
readyReplicas integer ReadyReplicas is the number of ready replicas.
Populated for PodClique, Deployment, and LeaderWorkerSet.
Not available for PodCliqueScalingGroup.
When nil, the field is omitted from the API response.
Minimum: 0
Optional: {}
availableReplicas integer AvailableReplicas is the number of available replicas.
For Deployment: replicas ready for >= minReadySeconds.
For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods.
Not available for PodClique or LeaderWorkerSet.
When nil, the field is omitted from the API response.
Minimum: 0
Optional: {}

SharedMemorySpec

Appears in:

Field Description Default Validation
disabled boolean
size Quantity

VolumeMount

VolumeMount references a PVC defined at the top level for volumes to be mounted by the component

Appears in:

Field Description Default Validation
name string Name references a PVC name defined in the top-level PVCs map Required: {}
mountPoint string MountPoint specifies where to mount the volume.
If useAsCompilationCache is true and mountPoint is not specified,
a backend-specific default will be used.
useAsCompilationCache boolean UseAsCompilationCache indicates this volume should be used as a compilation cache.
When true, backend-specific environment variables will be set and default mount points may be used.
false
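
A sketch of two volume mounts, one with an explicit mount point and one used as a compilation cache. The `volumeMounts` field name, the service name, and the PVC names are assumptions; each `name` would need to match a PVC defined at the top level.

```yaml
services:
  YourWorker:                         # hypothetical service name
    volumeMounts:                     # assumed field name for a list of VolumeMount
      - name: model-cache             # placeholder PVC name
        mountPoint: /models
      - name: compile-cache           # placeholder PVC name
        useAsCompilationCache: true   # mountPoint omitted: a backend-specific default is used
```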

Operator Default Values Injection

The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

  • Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.

  • Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.

  • Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).

  • Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.

  • Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.

  • Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.

  • Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (vLLM, SGLang, or TensorRT-LLM).

Pod Specification Defaults

All components receive the following pod-level defaults unless overridden:

  • terminationGracePeriodSeconds: 60 seconds
  • restartPolicy: Always

Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

  • fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:

  • Model downloads and caching
  • Compilation cache directories
  • Persistent volume claims (PVCs)
  • SSH key generation in multinode deployments

Overriding Security Context

To override the default security context, specify your own securityContext in the extraPodSpec of your component:

services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true

Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).

OpenShift and Security Context Constraints

In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift's admission controllers to assign them dynamically:

services:
  YourWorker:
    extraPodSpec:
      # Explicit empty object: a key followed only by comments parses as null
      securityContext: {}
      # fsGroup is omitted so OpenShift can assign it based on the SCC
      # OpenShift will inject the appropriate UID range

Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you do not need to specify anything; the operator defaults apply.

Shared Memory Configuration

Shared memory is enabled by default for all components:

  • Enabled: true (unless explicitly disabled via sharedMemory.disabled)
  • Size: 8Gi
  • Mount Path: /dev/shm
  • Volume Type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
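
For example, the two SharedMemorySpec fields could be used as follows. The service names are placeholders.

```yaml
services:
  YourWorker:
    sharedMemory:
      size: 16Gi        # replaces the 8Gi default
  YourPlanner:
    sharedMemory:
      disabled: true    # turns off the default shared memory volume
```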

Health Probes by Component Type

The operator applies different default health probes based on the component type.

Frontend Components

Frontend components receive the following probe configurations:

Liveness Probe:

  • Type: HTTP GET
  • Path: /health
  • Port: http (8000)
  • Initial Delay: 60 seconds
  • Period: 60 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 10

Readiness Probe:

  • Type: Exec command
  • Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""
  • Initial Delay: 60 seconds
  • Period: 60 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 10
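
Expressed as Kubernetes probe definitions, the frontend defaults listed above would look roughly like this. The `/bin/sh -c` wrapper around the readiness command is an assumption (the pipe needs a shell); the operator's actual rendering may differ.

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http              # named container port (8000)
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
readinessProbe:
  exec:
    command:                # shell wrapper is assumed
      - /bin/sh
      - -c
      - 'curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""'
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
```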

Worker Components

Worker components receive the following probe configurations:

Liveness Probe:

  • Type: HTTP GET
  • Path: /live
  • Port: system (9090)
  • Period: 5 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 1

Readiness Probe:

  • Type: HTTP GET
  • Path: /health
  • Port: system (9090)
  • Period: 10 seconds
  • Timeout: 30 seconds
  • Failure Threshold: 60

Startup Probe:

  • Type: HTTP GET
  • Path: /live
  • Port: system (9090)
  • Period: 10 seconds
  • Timeout: 5 seconds
  • Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)

Note: For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold from your expected startup time, rounding up: failureThreshold = ceil(expected_startup_seconds / periodSeconds). Override the startup probe in your component specification if the default 2-hour window is insufficient.
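
As a worked example, a model expected to take up to 3 hours to load needs 10800 s / 10 s = 1080 allowed failures. The sketch below overrides the startup probe accordingly while keeping the other default values; the service name is a placeholder, and placing the probe under extraPodSpec.mainContainer is one of the override paths mentioned in the Notes section.

```yaml
services:
  YourWorker:                      # hypothetical service name
    extraPodSpec:
      mainContainer:
        startupProbe:
          httpGet:
            path: /live
            port: system           # named container port (9090)
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 1080   # 3 h = 10800 s; 10800 / 10 = 1080
```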

Multinode Deployment Probe Modifications

For multinode deployments, the operator modifies probes based on the backend framework and node role:

vLLM Backend

The operator automatically selects between two deployment modes based on parallelism configuration:

Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):

  • Uses Ray for distributed execution (--distributed-executor-backend ray)
  • Leader nodes: Start the Ray head and run vLLM; all probes remain active
  • Worker nodes: Run Ray agents only; all probes (liveness, readiness, startup) are removed

Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):

  • Worker nodes: All probes (liveness, readiness, startup) are removed
  • Leader nodes: All probes remain active

SGLang Backend

  • Worker nodes: All probes (liveness, readiness, startup) are removed

TensorRT-LLM Backend

  • Leader nodes: All probes remain unchanged
  • Worker nodes:
    • Liveness and startup probes are removed
    • Readiness probe is replaced with a TCP socket check on SSH port (2222):
      • Initial Delay: 20 seconds
      • Period: 20 seconds
      • Timeout: 5 seconds
      • Failure Threshold: 10
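
In Kubernetes probe syntax, the replacement readiness probe described above corresponds to the following sketch, built only from the values listed:

```yaml
readinessProbe:
  tcpSocket:
    port: 2222              # SSH port used for multinode MPI communication
  initialDelaySeconds: 20
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 10
```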

Environment Variables

The operator automatically injects environment variables based on component type and configuration:

All Components

  • DYN_NAMESPACE: The Dynamo namespace for the component
  • DYN_PARENT_DGD_K8S_NAME: The parent DynamoGraphDeployment Kubernetes resource name
  • DYN_PARENT_DGD_K8S_NAMESPACE: The parent DynamoGraphDeployment Kubernetes namespace

Frontend Components

  • DYNAMO_PORT: 8000
  • DYN_HTTP_PORT: 8000

Worker Components

  • DYN_SYSTEM_PORT: 9090 (automatically enables the system metrics server)
  • DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS: ["generate"]
  • DYN_SYSTEM_ENABLED: true (needed for runtime images 0.6.1 and older)

Planner Components

  • PLANNER_PROMETHEUS_PORT: 9085

vLLM Backend (with compilation cache)

When a volume mount is configured with useAsCompilationCache: true:

  • VLLM_CACHE_ROOT: Set to the mount point of the cache volume

Service Account

Planner components automatically receive the following service account:

  • serviceAccountName: planner-serviceaccount

Image Pull Secrets

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

  1. Scans all Kubernetes secrets of type kubernetes.io/dockerconfigjson in the component's namespace
  2. Extracts the docker registry server URLs from each secret's authentication configuration
  3. Matches the container image's registry host against the discovered registry URLs
  4. Automatically injects matching secrets as imagePullSecrets in the pod specification

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
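
For reference, a Secret of the type the operator scans for might look like the sketch below. The Secret name, namespace, and credential placeholders are assumptions; the registry host `nvcr.io` is chosen to match images such as the profiler image example earlier in this document.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nvcr-pull-secret        # placeholder name
  namespace: dynamo-workloads   # placeholder; must be the component's namespace
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "nvcr.io": {
          "username": "<registry-username>",
          "password": "<registry-token>"
        }
      }
    }
```

Under the matching steps above, a component whose image is hosted on `nvcr.io` would have this Secret injected as an image pull secret.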

To disable automatic image pull secret discovery for a specific component, add the following annotation:

annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"

Autoscaling Defaults

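When autoscaling is enabled but no metrics are specified, the operator applies the defaults below. Note that the Autoscaling type is documented above as deprecated and ignored; for new deployments, use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner.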

  • Default Metric: CPU utilization
  • Target Average Utilization: 80%

Port Configurations

Default container ports are configured based on component type:

Frontend Components

  • Port: 8000
  • Protocol: TCP
  • Name: http

Worker Components

  • Port: 9090
  • Protocol: TCP
  • Name: system

Planner Components

  • Port: 9085
  • Protocol: TCP
  • Name: metrics

Backend-Specific Configurations

vLLM

  • Ray Head Port: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
  • Data Parallel RPC Port: 13445 (for data parallel multinode deployments)

SGLang

  • Distribution Init Port: 29500 (for multinode deployments)

TensorRT-LLM

  • SSH Port: 2222 (for multinode MPI communication)
  • OpenMPI Environment: OMPI_MCA_orte_keep_fqdn_hostnames=1

Implementation Reference

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:

Notes

  • All of these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
  • User-specified probes (via livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
  • For security context, if you provide any securityContext in extraPodSpec, no defaults will be injected, giving you full control
  • For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
  • The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator