@julienmancuso julienmancuso commented Nov 6, 2025

Overview:

Add the DynamoModel CRD.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced DynamoModel custom resource for managing model deployments
    • Added ModelRef field to DynamoComponentDeployment and DynamoGraphDeployment for model linking
    • Automatic headless service creation for model endpoint discovery
    • Support for LoRA model loading and management across endpoints
  • Chores

    • Updated RBAC permissions for model resource management and service discovery
    • Enhanced controller infrastructure with model lifecycle capabilities

Example of a DGD and its associated new DynamoModel CR:

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: sglang-disagg
spec:
  services:
    Frontend:
      dynamoNamespace: sglang-disagg
      componentType: frontend
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
    decode:
      modelRef:
        name: Qwen/Qwen3-0.6B
      envFromSecret: hf-token-secret
      dynamoNamespace: sglang-disagg
      componentType: worker
      subComponentType: decode
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
          workingDir: /workspace/examples/backends/sglang
          command:
          - python3
          - -m
          - dynamo.sglang
          args:
            - --model-path
            - Qwen/Qwen3-0.6B
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --page-size
            - "16"
            - --tp
            - "1"
            - --trust-remote-code
            - --skip-tokenizer-init
            - --disaggregation-mode
            - decode
            - --disaggregation-transfer-backend
            - nixl
            - --disaggregation-bootstrap-port
            - "12345"
            - --host
            - "0.0.0.0"
    prefill:
      modelRef:
        name: Qwen/Qwen3-0.6B
      envFromSecret: hf-token-secret
      dynamoNamespace: sglang-disagg
      componentType: worker
      subComponentType: prefill
      replicas: 1
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
          workingDir: /workspace/examples/backends/sglang
          command:
          - python3
          - -m
          - dynamo.sglang
          args:
            - --model-path
            - Qwen/Qwen3-0.6B
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --page-size
            - "16"
            - --tp
            - "1"
            - --trust-remote-code
            - --skip-tokenizer-init
            - --disaggregation-mode
            - prefill
            - --disaggregation-transfer-backend
            - nixl
            - --disaggregation-bootstrap-port
            - "12345"
            - --host
            - "0.0.0.0"
---
# Example DynamoModel CR - LoRA model
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: sglang-3-0-6b-my-lora
spec:
  modelName: Qwen/Qwen3-0.6B-my-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-bucket/Qwen/Qwen3-0.6B-my-lora

The new controller ensures that the workers of the DGD (both decode and prefill) have the LoRA loaded by calling their POST /v1/loras API.

Internally, we use a headless Service and its associated EndpointSlices to discover the worker endpoints and make sure the LoRAs are loaded.

Signed-off-by: Julien Mancuso <[email protected]>
@julienmancuso julienmancuso requested a review from a team as a code owner November 6, 2025 21:19
@github-actions github-actions bot added the feat label Nov 6, 2025
coderabbitai bot commented Nov 6, 2025

Walkthrough

Introduces a new DynamoModel Kubernetes custom resource definition with associated API types, controller, and infrastructure for managing model endpoint discovery and LoRA loading. Adds ModelRef fields to existing CRDs for model association. Includes endpoint discovery utilities, a bounded-concurrency LoRA client, headless service generation, and updates to existing controllers and RBAC rules.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New DynamoModel CRD Definitions: deploy/cloud/helm/crds/templates/nvidia.com_dynamomodels.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamomodels.yaml | Introduces CustomResourceDefinition for DynamoModel resource with spec fields (baseModelName, modelName, modelType enum, loraPath), status fields (conditions, endpoints, ready/total counters), and printer columns (BaseModel, Type, Ready, Total, Age). |
| ModelRef Field Additions to Existing CRDs: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml, deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml, deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml | Adds optional modelRef field (object with required name and optional revision) to component deployment specs for model association and headless service endpoint discovery. |
| API Type Definitions: deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go, deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go, deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go | Introduces DynamoModel, DynamoModelSpec, DynamoModelStatus, EndpointInfo, ModelSource, ModelReference types with Kubebuilder markers; adds ModelReference to component spec; generates DeepCopy methods and updates imports for core/v1 aliasing. |
| DynamoModel Controller: deploy/cloud/operator/internal/controller/dynamo_model_controller.go | Implements DynamoModelReconciler with reconciliation loop handling finalizers, EndpointSlice discovery, parallel LoRA loading, status/condition updates, and lifecycle management via EndpointClient integration. |
| Existing Controller Updates: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go, deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go, deploy/cloud/operator/internal/dynamo/graph.go | Integrates headless model service reconciliation and base model label generation into component and graph deployment reconciliation paths. |
| Endpoint Management Client: deploy/cloud/operator/internal/modelendpoint/client.go, deploy/cloud/operator/internal/modelendpoint/lora.go, deploy/cloud/operator/internal/modelendpoint/discovery.go, deploy/cloud/operator/internal/modelendpoint/types.go | Introduces Client for bounded-concurrency LoRA loading/unloading with timeout controls; adds endpoint candidate extraction and model discovery query utilities; defines Candidate type. |
| Headless Service Generation: deploy/cloud/operator/internal/dynamo/headless_service.go | Adds ReconcileModelServicesForComponents, GenerateHeadlessServiceForModel, and AddBaseModelLabel helpers for creating and syncing headless services indexed by base model name. |
| Worker Pool Utility: deploy/cloud/operator/internal/workerpool/pool.go | Introduces generic Execute function with parameterized Task and Result types for bounded-concurrency task execution with timeout enforcement and error aggregation. |
| Configuration & Setup: deploy/cloud/operator/PROJECT, deploy/cloud/operator/cmd/main.go, deploy/cloud/operator/config/rbac/role.yaml, deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml, deploy/cloud/operator/config/crd/kustomization.yaml, deploy/cloud/operator/config/samples/kustomization.yaml, deploy/cloud/operator/internal/consts/consts.go | Registers DynamoModel resource in PROJECT file; wires DynamoModelReconciler with EndpointClient in main; adds RBAC rules for dynamomodels and endpointslices; includes CRD and sample in kustomization manifests; adds KubeLabelDynamoBaseModel constant. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • dynamo_model_controller.go: Complex reconciliation logic with finalizers, EndpointSlice discovery, parallel LoRA operations, condition management, and event emission; requires careful review of lifecycle handling and error paths.
  • modelendpoint/client.go: Concurrent LoRA loading/unloading with bounded worker pools, timeout enforcement, and aggregated error handling; verify timeout semantics and edge cases.
  • modelendpoint/discovery.go: Endpoint extraction and model discovery via field indexing; ensure query logic and request mapping are correct.
  • zz_generated.deepcopy.go: Auto-generated DeepCopy methods; verify consistency with new API types and import aliasing changes.
  • Existing controller modifications: Verify integration points in DynamoComponentDeployment and DynamoGraphDeployment controllers align with new reconciliation paths and error handling.
  • workerpool/pool.go: Generic concurrency pattern; verify goroutine lifecycle, result ordering, and timeout propagation.

Poem

🐰 A new model emerges, endpoints take flight,
LoRA loads swiftly through worker pool might,
Services headless guide discovery's way,
Dynamo models shall dance and play!
With conditions and status, the system's delight! 🎉

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add dynamoModel CRD' clearly and concisely summarizes the main change, which is introducing a new DynamoModel CustomResourceDefinition across the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description provides a clear overview of the feature (adding DynamoModel CRD) and includes practical examples showing how the new CR integrates with existing DynamoGraphDeployment configurations. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (5)
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (1)

10005-10018: Tighten modelRef schema (non-empty name, disallow unknown keys).

Looks good overall. To prevent empty strings and catch typos, add minimal validations.

Apply within this block:

               modelRef:
                 description: |-
                   ModelRef references a model that this component serves
                   When specified, a headless service will be created for endpoint discovery
-                properties:
+                additionalProperties: false
+                properties:
                   name:
                     description: Name is the base model identifier (e.g., "llama-3-70b-instruct-v1")
                     type: string
+                    minLength: 1
                   revision:
                     description: Revision is the model revision/version (optional)
                     type: string
+                    minLength: 1
                 required:
                   - name
                 type: object

Optional (nice UX): add an additionalPrinterColumn to surface the model at kubectl get time:

@@
     - additionalPrinterColumns:
       - description: Dynamo component
         jsonPath: .spec.dynamoComponent
         name: DynamoComponent
         type: string
+      - description: Model
+        jsonPath: .spec.modelRef.name
+        name: Model
+        type: string

Please confirm:

  • types.go defines ModelRef with json:"modelRef,omitempty" and string fields, and controller tolerates missing/empty revision.
  • No model names require characters beyond basic DNS-1123 label charset; if they do, keep minLength but skip adding a strict pattern. Based on learnings.
deploy/cloud/operator/internal/modelendpoint/lora.go (1)

66-66: Standardize logging levels for success cases.

Success logging is inconsistent: loadLoRA uses Info level (line 66) while unloadLoRA uses V(1) level (line 96). For consistency and operational visibility, both should log at the same level.

Consider standardizing to Info level for both operations:

-	logs.V(1).Info("Successfully unloaded LoRA", "address", address, "modelName", modelName)
+	logs.Info("Successfully unloaded LoRA", "address", address, "modelName", modelName)

Also applies to: 96-96

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamomodels.yaml (1)

85-87: Consider adding validation to enforce loraPath usage.

The loraPath field is described as "only applicable for lora model type" but there's no schema-level validation to enforce this constraint. Users could accidentally set loraPath on base or adapter models.

Add CEL validation to ensure loraPath is only set when modelType is lora:

                 loraPath:
                   description: LoraPath is the path to the LoRA adapter (only applicable for lora model type)
                   type: string
                 modelName:
                   description: ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora")
                   type: string
                 modelType:
                   default: base
                   description: ModelType specifies the type of model (e.g., "base", "lora", "adapter")
                   enum:
                     - base
                     - lora
                     - adapter
                   type: string
               required:
                 - baseModelName
                 - modelName
               type: object
+              x-kubernetes-validations:
+                - rule: "self.modelType != 'lora' || has(self.loraPath)"
+                  message: "loraPath is required when modelType is 'lora'"
+                - rule: "self.modelType == 'lora' || !has(self.loraPath)"
+                  message: "loraPath should only be set when modelType is 'lora'"

Also applies to: 91-98
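Taken together, the two suggested CEL rules say that loraPath must be present exactly when modelType is lora. A small Go sketch of the same logic, useful as a mental model or as a unit-test oracle. The `spec` struct and `validateLoraPath` are hypothetical stand-ins, not the PR's API types; a nil pointer plays the role of CEL's `has()` returning false.

```go
package main

import (
	"errors"
	"fmt"
)

// spec is a reduced, hypothetical stand-in for DynamoModelSpec.
type spec struct {
	ModelType string
	LoraPath  *string // nil means the field is absent
}

// validateLoraPath mirrors the two suggested CEL rules:
//   modelType != 'lora' || has(loraPath)
//   modelType == 'lora' || !has(loraPath)
func validateLoraPath(s spec) error {
	isLoRA := s.ModelType == "lora"
	hasPath := s.LoraPath != nil
	switch {
	case isLoRA && !hasPath:
		return errors.New("loraPath is required when modelType is 'lora'")
	case !isLoRA && hasPath:
		return errors.New("loraPath should only be set when modelType is 'lora'")
	}
	return nil
}

func main() {
	path := "s3://my-bucket/Qwen/Qwen3-0.6B-my-lora"
	cases := []spec{
		{ModelType: "lora", LoraPath: &path},
		{ModelType: "lora"},
		{ModelType: "base", LoraPath: &path},
		{ModelType: "base"},
	}
	for _, c := range cases {
		fmt.Printf("modelType=%s hasPath=%t -> %v\n", c.ModelType, c.LoraPath != nil, validateLoraPath(c))
	}
}
```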

deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (2)

10005-10018: Validate and sanitize modelRef for Service/labels; document empty revision semantics

Good addition. Please confirm:

  • Reconcile sanitizes modelRef.name (and revision if used) into valid DNS-1123 Service names (lowercase, [a-z0-9-], <=63), with truncation+hash to avoid collisions when names exceed 63 or contain dots/uppercases.
  • If modelRef is used in labels/selector values, ensure label constraints (<=63; allowed charset) or apply normalization similarly.
  • Clarify behavior when revision is empty (e.g., treated as “latest”, or excluded from identity). Add this to the Go type docstring so controller-gen propagates it.

Optionally, enforce constraints at the API by adding kubebuilder validation on the Go types (e.g., Patterns and MaxLength for name/revision) instead of manual YAML edits.

Based on learnings.


10005-10018: Improve kubectl UX with printer columns

Consider adding print columns on the Go type for:

  • Model (.spec.modelRef.name)
  • Revision (.spec.modelRef.revision)

Use +kubebuilder:printcolumn annotations so controller-gen emits them here (don’t hand-edit this YAML). This makes kubectl get dcd more informative.

Based on learnings.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c0763f and 885d792.

📒 Files selected for processing (26)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (1 hunks)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (1 hunks)
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamomodels.yaml (1 hunks)
  • deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (4 hunks)
  • deploy/cloud/operator/PROJECT (1 hunks)
  • deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go (1 hunks)
  • deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2 hunks)
  • deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go (11 hunks)
  • deploy/cloud/operator/cmd/main.go (2 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (1 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (1 hunks)
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamomodels.yaml (1 hunks)
  • deploy/cloud/operator/config/crd/kustomization.yaml (1 hunks)
  • deploy/cloud/operator/config/rbac/role.yaml (4 hunks)
  • deploy/cloud/operator/config/samples/kustomization.yaml (1 hunks)
  • deploy/cloud/operator/internal/consts/consts.go (1 hunks)
  • deploy/cloud/operator/internal/controller/dynamo_model_controller.go (1 hunks)
  • deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/graph.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/headless_service.go (1 hunks)
  • deploy/cloud/operator/internal/modelendpoint/client.go (1 hunks)
  • deploy/cloud/operator/internal/modelendpoint/discovery.go (1 hunks)
  • deploy/cloud/operator/internal/modelendpoint/lora.go (1 hunks)
  • deploy/cloud/operator/internal/modelendpoint/types.go (1 hunks)
  • deploy/cloud/operator/internal/workerpool/pool.go (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
📚 Learning: 2025-07-18T16:05:05.534Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Applied to files:

  • deploy/cloud/operator/PROJECT
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/operator/internal/consts/consts.go
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
📚 Learning: 2025-07-18T16:04:31.771Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Applied to files:

  • deploy/cloud/operator/PROJECT
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamomodels.yaml
  • deploy/cloud/operator/config/crd/kustomization.yaml
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

  • deploy/cloud/operator/PROJECT
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamomodels.yaml
  • deploy/cloud/operator/internal/consts/consts.go
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamomodels.yaml
  • deploy/cloud/operator/config/samples/kustomization.yaml
  • deploy/cloud/operator/config/crd/kustomization.yaml
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
📚 Learning: 2025-07-18T16:04:47.465Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The `stopSignal` field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Applied to files:

  • deploy/cloud/operator/PROJECT
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
  • deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
  • deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

  • deploy/cloud/operator/config/crd/kustomization.yaml
🧬 Code graph analysis (12)
deploy/cloud/operator/internal/modelendpoint/discovery.go (2)
deploy/cloud/operator/internal/modelendpoint/types.go (1)
  • Candidate (21-24)
deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go (1)
  • DynamoModelList (110-114)
deploy/cloud/operator/cmd/main.go (2)
deploy/cloud/operator/internal/controller/dynamo_model_controller.go (1)
  • DynamoModelReconciler (63-67)
deploy/cloud/operator/internal/modelendpoint/client.go (2)
  • Client (43-45)
  • NewClient (48-54)
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
deploy/cloud/operator/internal/dynamo/headless_service.go (1)
  • ReconcileModelServicesForComponents (37-97)
deploy/cloud/operator/internal/workerpool/pool.go (1)
deploy/cloud/operator/api/dynamo/schemas/schemas.go (1)
  • Duration (38-38)
deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go (1)
deploy/cloud/operator/api/v1alpha1/groupversion_info.go (1)
  • SchemeBuilder (35-35)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
deploy/cloud/operator/internal/dynamo/headless_service.go (2)
  • ReconcileModelServicesForComponents (37-97)
  • AddBaseModelLabel (143-147)
deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go (2)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
  • ModelReference (279-287)
deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go (6)
  • DynamoModel (99-105)
  • DynamoModelList (110-114)
  • DynamoModelSpec (25-44)
  • ModelSource (47-54)
  • DynamoModelStatus (72-86)
  • EndpointInfo (57-69)
deploy/cloud/operator/internal/controller/dynamo_model_controller.go (5)
deploy/cloud/operator/internal/modelendpoint/client.go (2)
  • Client (43-45)
  • NewClient (48-54)
deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go (2)
  • DynamoModel (99-105)
  • EndpointInfo (57-69)
deploy/cloud/operator/internal/modelendpoint/discovery.go (2)
  • FindModelsForBaseModel (77-112)
  • ExtractCandidates (35-73)
deploy/cloud/operator/internal/modelendpoint/types.go (1)
  • Candidate (21-24)
deploy/cloud/operator/internal/consts/consts.go (1)
  • DynamoSystemPort (22-22)
deploy/cloud/operator/internal/dynamo/headless_service.go (3)
deploy/cloud/operator/internal/controller_common/resource.go (2)
  • Reconciler (49-52)
  • SyncResource (60-195)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)
  • DynamoComponentDeploymentSharedSpec (48-111)
  • ModelReference (279-287)
deploy/cloud/operator/internal/consts/consts.go (3)
  • KubeLabelDynamoBaseModel (41-41)
  • DynamoSystemPortName (23-23)
  • DynamoSystemPort (22-22)
deploy/cloud/operator/internal/dynamo/graph.go (1)
deploy/cloud/operator/internal/dynamo/headless_service.go (1)
  • AddBaseModelLabel (143-147)
deploy/cloud/operator/internal/modelendpoint/client.go (3)
deploy/cloud/operator/internal/modelendpoint/types.go (1)
  • Candidate (21-24)
deploy/cloud/operator/api/v1alpha1/dynamo_model_types.go (2)
  • DynamoModel (99-105)
  • EndpointInfo (57-69)
deploy/cloud/operator/internal/workerpool/pool.go (2)
  • Task (28-31)
  • Execute (43-102)
deploy/cloud/operator/internal/modelendpoint/lora.go (1)
deploy/cloud/operator/internal/modelendpoint/client.go (1)
  • Client (43-45)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: sglang (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (17)
deploy/cloud/operator/PROJECT (1)

27-34: LGTM!

The DynamoModel resource configuration follows the same structure and conventions as the existing DynamoComponentDeployment and DynamoGraphDeployment resources.

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)

344-354: LGTM!

The model service reconciliation is appropriately placed after Grove scaling, ensuring that workload resources are created before setting up endpoint discovery services. Error handling follows the established pattern in this controller.

deploy/cloud/operator/internal/consts/consts.go (1)

41-41: LGTM!

The new constant follows the established naming convention and is appropriately positioned with other Kubernetes label constants.

deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)

86-89: LGTM!

The optional ModelRef field is well-documented and designed for backward compatibility. The documentation clearly explains its purpose for endpoint discovery via headless services.


278-287: LGTM!

The ModelReference type is well-designed with appropriate validation markers. The required Name field and optional Revision field provide flexibility while ensuring essential information is present.

deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2)

330-343: LGTM!

The model service reconciliation is correctly implemented for component-level reconciliation. The componentMap contains only the current component, which is appropriate for this controller's scope.


943-955: Improved label handling.

The function now properly initializes and populates labels instead of returning an empty map. This ensures that:

  1. Existing component labels are preserved
  2. Base model labels are added when a ModelRef is specified

This is a positive change that enables proper label propagation throughout the resource hierarchy.

deploy/cloud/operator/cmd/main.go (2)

63-63: LGTM!

The modelendpoint import is correctly added and used for creating the EndpointClient.


564-571: LGTM!

The DynamoModelReconciler setup follows the established pattern for controller initialization. The EndpointClient is appropriately created once and injected into the reconciler.

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (1)

10139-10152: Go types are properly defined and aligned with the CRD schema.

The verification confirms that the ModelReference struct is correctly defined in deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (lines 278–287) with proper kubebuilder annotations (+kubebuilder:validation:Required for name, +optional for revision). The modelRef field in DynamoComponentDeploymentSharedSpec is correctly typed as *ModelReference with the +optional tag and proper JSON marshaling hints. The autogenerated CRD schema accurately reflects these Go types, and the structure aligns with how modelRef is used in the controller code (e.g., AddBaseModelLabel function). No issues found.

deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (1)

65-72: LGTM: RBAC permissions properly scoped for DynamoModel CRD.

The added permissions for EndpointSlices discovery and DynamoModel lifecycle management (including finalizers and status updates) are appropriate and follow standard Kubernetes controller patterns.

Also applies to: 372-372, 387-387, 396-396

deploy/cloud/operator/config/rbac/role.yaml (1)

89-96: LGTM: RBAC permissions consistent with Helm template.

The RBAC additions mirror those in the Helm template and are properly scoped for the DynamoModel controller's operational needs.

Also applies to: 173-173, 188-188, 197-197

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamomodels.yaml (2)

166-182: Verify whether podName should be required in EndpointInfo.

The podName field is not in the required list (lines 179-181), suggesting it may be optional. However, in a Kubernetes environment with endpoint discovery via EndpointSlices, the pod name should typically always be known and valuable for debugging and observability.

Please confirm whether podName can legitimately be absent in any scenario. If not, consider adding it to the required fields:

                     required:
                       - address
+                      - podName
                       - ready

174-178: Clarify the design intent for base model endpoint tracking.

The comment states "For base models: always false (no probing performed)," which suggests base model endpoints are tracked but never marked ready. This raises questions about the utility of endpoint tracking for base models and whether the status structure optimally serves both base and LoRA model use cases.

Please clarify the design rationale:

  • Why track endpoints for base models if ready is always false?
  • Is there a future plan to probe base model readiness?
  • Would separate status structures for base vs LoRA models improve clarity?
deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (1)

10005-10018: RBAC check for headless Service creation

Since modelRef triggers headless service creation for endpoint discovery, verify the PR includes RBAC for Services and EndpointSlices (get/list/watch/create/update/patch) in the operator’s ClusterRole.

Based on learnings.

deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (1)

10139-10152: Verify that base CRD file has been updated to generate this template change.

Per prior learnings on this codebase, CRD schemas in deploy/cloud/helm/crds/templates/*.yaml are auto-generated from base CRD files in deploy/cloud/operator/config/crd/bases/ and should not be manually edited, as manual changes would be overwritten during regeneration.

The modelRef field definition itself appears structurally sound as OpenAPI v3 schema. However, ensure that the corresponding base CRD file (deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml) has been updated with this field, and that this template change was auto-generated from it rather than manually added.

deploy/cloud/operator/internal/dynamo/headless_service.go (1)

91-105: Review comment is incorrect for Go 1.24.0

The repository declares go 1.24.0 in deploy/cloud/operator/go.mod, which is well after Go 1.22. Starting with Go 1.22, loop variables are scoped per iteration rather than reused across iterations, so closures over loop variables are safe. The problematic code pattern the review describes is not an issue in this codebase.

Additionally, the review references the wrong file. The actual loops capturing candidate are in deploy/cloud/operator/internal/modelendpoint/client.go (LoadLoRA at lines 91–102, UnloadLoRA at lines 152–163), not in headless_service.go.

The code requires no changes.

Likely an incorrect or invalid review comment.

Signed-off-by: Julien Mancuso <[email protected]>
logs.Info("Finalizing DynamoModel", "modelType", model.Spec.ModelType)

// Only perform cleanup for LoRA models
if model.Spec.ModelType == "lora" {
Contributor

Convert the left-hand side to lowercase before comparing.

Contributor Author

fixed in 2b9d6c2
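
For reference, a minimal sketch of what a case-insensitive check could look like. The helper name `isLoRA` is hypothetical and the actual change in 2b9d6c2 may differ; the only assumption carried over from the diff is that the model type is a plain string.

```go
package main

import (
	"fmt"
	"strings"
)

// isLoRA reports whether modelType names a LoRA model, ignoring case.
// Hypothetical helper for illustration; not taken from the PR.
func isLoRA(modelType string) bool {
	// Lowercase the left-hand side so "LoRA", "LORA" and "lora" all match.
	return strings.ToLower(modelType) == "lora"
}

func main() {
	for _, t := range []string{"lora", "LoRA", "base"} {
		fmt.Printf("%q is LoRA: %v\n", t, isLoRA(t))
	}
}
```

`strings.EqualFold(modelType, "lora")` would be an equivalent alternative that avoids allocating a lowercased copy.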

logs.Info("Unloading LoRA from endpoints", "endpointCount", len(candidates))

// Initialize endpoint client if needed
if r.EndpointClient == nil {
Contributor

This check happens in both Reconcile() and FinalizeResource(). Could a race condition occur?

Contributor Author

fixed in 2b9d6c2
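
One common way to make this kind of lazy initialization safe under concurrent callers is `sync.Once`. The sketch below is an assumption about how it could be done, not the fix applied in 2b9d6c2; the `reconciler` and `endpointClient` types are stand-ins that mirror only the field discussed here.

```go
package main

import (
	"fmt"
	"sync"
)

// endpointClient is a stand-in for the real client type (hypothetical).
type endpointClient struct{ id int }

// reconciler mirrors only the field relevant to this thread; it is not the PR's struct.
type reconciler struct {
	once           sync.Once
	endpointClient *endpointClient
}

// created counts constructor calls, for demonstration only.
var created int

// client returns the shared endpoint client, constructing it at most once
// even when several goroutines call it at the same time.
func (r *reconciler) client() *endpointClient {
	r.once.Do(func() {
		created++
		r.endpointClient = &endpointClient{id: created}
	})
	return r.endpointClient
}

func main() {
	r := &reconciler{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = r.client()
		}()
	}
	wg.Wait()
	fmt.Println("constructor calls:", created) // prints "constructor calls: 1"
}
```

Constructing the client once at setup time, before the controller starts, would remove the nil check from both code paths altogether.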

candidates, serviceNames, err := r.getEndpointCandidates(ctx, model)
if err != nil {
// Error already logged and status updated in helper
return ctrl.Result{RequeueAfter: 30 * time.Second}, err
Contributor

It looks like 30 is used more than once. Make it a constant?

Contributor Author

fixed in 2b9d6c2
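
A minimal sketch of the suggestion, assuming a package-level constant. The name `requeueAfterEndpointError` and the `result` type (a stand-in for `ctrl.Result`) are hypothetical and chosen only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"time"
)

// requeueAfterEndpointError is a hypothetical name for the shared retry delay.
const requeueAfterEndpointError = 30 * time.Second

// result is a stand-in for ctrl.Result so the sketch compiles on its own.
type result struct {
	RequeueAfter time.Duration
}

// requeueOnError shows a call site using the constant instead of a literal.
func requeueOnError() result {
	return result{RequeueAfter: requeueAfterEndpointError}
}

func main() {
	fmt.Println(requeueOnError().RequeueAfter) // prints "30s"
}
```

With the delay named in one place, every requeue path changes together if the interval is tuned later.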

ctx context.Context,
reconciler commonController.Reconciler,
owner client.Object,
services map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec,
Contributor

nit: these are components, not services. Rename?

Contributor Author

fixed in 2b9d6c2

// Uses a hash of the model name to avoid label length/character restrictions
func AddBaseModelLabel(labels map[string]string, modelRef *v1alpha1.ModelReference) {
if modelRef != nil && modelRef.Name != "" {
labels[commonconsts.KubeLabelDynamoBaseModelHash] = HashModelName(modelRef.Name)
Contributor

Could labels be nil?

Contributor Author

fixed in 2b9d6c2
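
For context on why the nil case matters: assigning into a nil map panics in Go, and a function that receives the map by value cannot allocate one on the caller's behalf unless it returns it. The sketch below shows one possible guard together with a hashed label value. Everything in it is an assumption for illustration: the label key, the lowercase helper names, and the choice of a truncated SHA-256 are not taken from the PR, whose `HashModelName` may work differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// baseModelHashLabel is a hypothetical label key, not the PR's constant.
const baseModelHashLabel = "example.com/base-model-hash"

// hashModelName returns a short hex digest of the model name.
// Assumption: a 16-character SHA-256 prefix; the real implementation may differ.
func hashModelName(name string) string {
	sum := sha256.Sum256([]byte(name))
	return hex.EncodeToString(sum[:])[:16]
}

// addBaseModelLabel sets the hash label and returns the map, allocating one
// when the caller passed nil so the write cannot panic.
func addBaseModelLabel(labels map[string]string, modelName string) map[string]string {
	if modelName == "" {
		return labels
	}
	if labels == nil {
		labels = make(map[string]string)
	}
	labels[baseModelHashLabel] = hashModelName(modelName)
	return labels
}

func main() {
	// A name such as "Qwen/Qwen3-0.6B" contains "/", which is not allowed in a
	// label value, so the hash is stored instead of the raw name.
	labels := addBaseModelLabel(nil, "Qwen/Qwen3-0.6B")
	fmt.Println(labels)
}
```

Returning the map changes the call sites slightly; an early return on a nil map would keep the original signature at the cost of silently skipping the label.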

"sigs.k8s.io/controller-runtime/pkg/log"
)

// ReconcileModelServicesForComponents creates headless services for components with modelRef
Contributor

I think the name "headless_service" exposes implementation detail.

Contributor Author

fixed in 2b9d6c2
