@zhongdaor-nv zhongdaor-nv commented Nov 6, 2025

Overview:

Extend add_tensor_model so that ModelDeploymentCard can be correctly picked up

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added support for optional runtime configuration when deploying tensor models, enabling more granular control over deployment settings.
  • Bug Fixes

    • Improved model removal process to ensure complete cleanup of associated deployment metadata during tensor model deletion.

@zhongdaor-nv zhongdaor-nv marked this pull request as ready for review November 6, 2025 23:46
@zhongdaor-nv zhongdaor-nv requested a review from a team as a code owner November 6, 2025 23:46
@zhongdaor-nv zhongdaor-nv changed the title from "Zhongdaor/dis 982 support modelconfig with kserve grpc python bindings" to "fix: Extend add_tensor_model so that ModelDeploymentCard can be correctly picked up" Nov 6, 2025
@github-actions github-actions bot added the fix label Nov 6, 2025
coderabbitai bot commented Nov 6, 2025

Walkthrough

The KServe gRPC Rust bindings are extended to support optional runtime configuration for tensor models. The add_tensor_model method now accepts an optional ModelRuntimeConfig parameter to conditionally create and persist model deployment metadata via a ModelDeploymentCard. The remove_tensor_model method adds best-effort cleanup of associated model cards.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Imports and types (lib/bindings/python/rust/kserve_grpc.rs) | Introduces llm_rs alias and imports ModelDeploymentCard, ModelInput, ModelType for model deployment metadata, and ModelRuntimeConfig from llm::local_model. |
| API signature updates (lib/bindings/python/rust/kserve_grpc.rs) | add_tensor_model extended to accept optional runtime_config: Option<ModelRuntimeConfig> parameter; pyo3 signature annotation updated to (model, checksum, engine, runtime_config=None) for Python binding. |
| Conditional persistence logic (lib/bindings/python/rust/kserve_grpc.rs) | add_tensor_model now conditionally creates a ModelDeploymentCard with Tensor-based type when runtime_config is provided, assigns the configuration, and persists via model_manager; errors propagated to Python via to_pyerr. |
| Cleanup enhancement (lib/bindings/python/rust/kserve_grpc.rs) | remove_tensor_model now attempts to remove associated model card (best-effort, silently ignores if absent) and returns Ok(()) after cleanup. |
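
The conditional persistence and best-effort cleanup summarized above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `ModelManager`, `ModelDeploymentCard`, and `ModelRuntimeConfig` types here are simplified in-memory stand-ins, the `max_batch_size` field and the `card` accessor are hypothetical, and the order in which the engine and the card are registered is an assumption.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the runtime configuration passed from Python.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ModelRuntimeConfig {
    pub max_batch_size: Option<u32>, // illustrative field, not taken from the PR
}

// Hypothetical stand-in for the card that the ModelConfig endpoint would read.
#[derive(Clone, Debug, PartialEq)]
pub struct ModelDeploymentCard {
    pub name: String,
    pub runtime_config: ModelRuntimeConfig,
}

// Hypothetical in-memory stand-in for the model manager.
#[derive(Default)]
pub struct ModelManager {
    engines: HashMap<String, String>, // model name -> checksum
    cards: HashMap<String, ModelDeploymentCard>,
}

impl ModelManager {
    pub fn add_tensor_model(
        &mut self,
        model: &str,
        checksum: &str,
        runtime_config: Option<ModelRuntimeConfig>,
    ) -> Result<(), String> {
        if self.engines.contains_key(model) {
            return Err(format!("model '{model}' is already registered"));
        }
        self.engines.insert(model.to_string(), checksum.to_string());

        // Persist a card only when the caller supplied a runtime config.
        if let Some(runtime_config) = runtime_config {
            let card = ModelDeploymentCard {
                name: model.to_string(),
                runtime_config,
            };
            self.cards.insert(model.to_string(), card);
        }
        Ok(())
    }

    pub fn remove_tensor_model(&mut self, model: &str) -> Result<(), String> {
        self.engines
            .remove(model)
            .ok_or_else(|| format!("model '{model}' not found"))?;

        // Best-effort cleanup: a missing card is expected when no runtime
        // config was given, so the returned Option is deliberately ignored.
        let _ = self.cards.remove(model);
        Ok(())
    }

    // Hypothetical accessor used only to inspect the sketch's state.
    pub fn card(&self, model: &str) -> Option<&ModelDeploymentCard> {
        self.cards.get(model)
    }
}
```

In this sketch, a model added with `None` has no card, a model added with `Some(config)` has one, and removing either succeeds as long as the engine itself was registered.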

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify conditional logic in add_tensor_model correctly creates and persists ModelDeploymentCard only when runtime_config is Some
  • Confirm error handling and propagation to Python layer via to_pyerr is appropriate for card persistence failures
  • Validate that remove_tensor_model cleanup gracefully handles missing model cards without affecting primary removal logic
  • Check that new imports and type usage (ModelDeploymentCard, ModelType, ModelInput) align with broader codebase patterns

Poem

🐰 A config now flows where tensors play,
Deployment cards filed away each day,
When models depart, we clean with care,
Optional settings floating through air! ✨

Pre-merge checks

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is largely incomplete with most template sections either empty or containing only placeholder text (e.g., '#xxx' for issue number, no file pointers or change details). | Fill in the Details and Where should the reviewer start sections with specifics, complete the issue number reference, and describe the actual changes made. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: extending add_tensor_model to support ModelDeploymentCard pickup, which is the core modification evident in the changeset. |

runtime_config: Option<ModelRuntimeConfig>,
) -> PyResult<()> {
// If runtime_config is provided, create and save a ModelDeploymentCard
// so the ModelConfig endpoint can return model configuration

Can you add or append to an existing test so that we can verify the ModelConfig endpoint works when runtime_config is provided?
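
One possible shape for such a test, sketched against a mock rather than the real frontend: it covers the case where a runtime config is provided and the case where it is not. Everything here is hypothetical (`MockRegistry`, `RuntimeConfig`, `ConfigError`, and the `max_batch_size` field), and returning a not-found error for a model without a config is an assumption about the endpoint, not something stated in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical runtime configuration; the field is illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct RuntimeConfig {
    pub max_batch_size: u32,
}

/// Hypothetical error returned by the mock ModelConfig lookup.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    NotFound(String),
}

/// Mock registry: every model is registered, but only some carry a config.
#[derive(Default)]
pub struct MockRegistry {
    models: HashMap<String, Option<RuntimeConfig>>,
}

impl MockRegistry {
    pub fn add(&mut self, name: &str, cfg: Option<RuntimeConfig>) {
        self.models.insert(name.to_string(), cfg);
    }

    /// Mimics what a ModelConfig request would see for `name`.
    pub fn model_config(&self, name: &str) -> Result<&RuntimeConfig, ConfigError> {
        self.models
            .get(name)
            .and_then(|cfg| cfg.as_ref())
            .ok_or_else(|| ConfigError::NotFound(name.to_string()))
    }
}

/// Exercises the two cases the requested test would need to cover.
pub fn check_model_config_cases() -> Result<(), String> {
    let mut reg = MockRegistry::default();
    reg.add("with_cfg", Some(RuntimeConfig { max_batch_size: 4 }));
    reg.add("without_cfg", None);

    // A model registered with a config should return that same config.
    match reg.model_config("with_cfg") {
        Ok(cfg) if cfg.max_batch_size == 4 => {}
        other => return Err(format!("unexpected result for with_cfg: {other:?}")),
    }

    // A model registered without a config should not yield one.
    if reg.model_config("without_cfg").is_ok() {
        return Err("expected no config for without_cfg".to_string());
    }
    Ok(())
}
```

A real test would replace the mock with a running KServe gRPC frontend and an actual ModelConfig request; the sketch only fixes the two assertions worth making.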

@rmccorm4 rmccorm4 added the frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` label Nov 7, 2025
Comment on lines +112 to +114
// Also remove the model card if it exists
// (It's ok if it doesn't exist since runtime_config is optional, we just ignore the None return)
let _ = self.inner.model_manager().remove_model_card(&model);
@KrishnanPrash KrishnanPrash Nov 7, 2025

In dynamo, when a non-tensor model is unloaded, the MDC persists in etcd. From my understanding, with these changes, tensor model MDCs are explicitly removed from etcd. Does this create inconsistent cleanup behavior between model types?

> In dynamo, when a non-tensor model is unloaded, the MDC persists in etcd.

@grahamking changed this behavior in the past month or two. MDCs are now associated with model instances and should get cleaned up when the instance goes away.

Signed-off-by: zhongdaor <[email protected]>
Signed-off-by: zhongdaor <[email protected]>
Labels

fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L
