Need a smooth upgrade method for HAMI 2.3.13 to 2.5.0+ (avoid Helm uninstall/install to prevent service disruption) #1668

### Description
We are currently running HAMI 2.3.13 in a production Kubernetes cluster (v1.30.2) and have hit a critical issue: this version does not recognize the NVIDIA 5090 GPU, so the card cannot be used. It has been confirmed that HAMI 2.5.0 and above support the NVIDIA 5090, so we plan to upgrade to 2.5.0 or later.

However, the official upgrade documentation only describes running `helm uninstall` followed by `helm install`, which stops and redeploys the entire HAMI service. This would cause **unacceptable downtime and disruption to our 24/7 online business**: we rely on HAMI for GPU-accelerated production workloads, and any downtime directly impacts core business operations.

### Expected Solution
We are looking for a **smooth/rolling upgrade method** for HAMI (e.g., in-place update, DaemonSet rolling update, or Helm upgrade with zero downtime) that:
1. Avoids full uninstall/install of the Helm release (to prevent complete service outage)
2. Minimizes or eliminates downtime for online GPU-dependent services
3. Ensures compatibility with NVIDIA 5090 GPU after upgrading to 2.5.0+
4. Preserves existing configurations (e.g., GPU scheduling rules, node labels, runtime settings, resource quotas) as much as possible
5. Works stably on Kubernetes v1.30.2
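
To make the request concrete, here is a rough, untested sketch of the kind of in-place flow we have in mind, written as a small Python helper that only builds and prints the commands (it does not execute anything). Everything specific in it is an assumption on our side, not something confirmed by the HAMI docs: the release name `hami`, the chart reference `hami-charts/hami`, the `kube-system` namespace, the DaemonSet name `hami-device-plugin`, and that chart version `2.5.0` exists. Whether `helm upgrade` across these versions is actually safe is exactly what we are asking.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One command in the plan; stdout may be redirected to a file."""
    argv: List[str]
    save_stdout_to: Optional[str] = None


def build_upgrade_plan(
    release: str = "hami",                 # assumed release name
    chart: str = "hami-charts/hami",       # assumed repo/chart reference
    namespace: str = "kube-system",        # assumed install namespace
    target_version: str = "2.5.0",         # assumed chart version
    values_file: str = "hami-values-backup.yaml",
    daemonset: str = "hami-device-plugin",  # assumed DaemonSet name
) -> List[Step]:
    upgrade = [
        "helm", "upgrade", release, chart,
        "-n", namespace,
        "--version", target_version,
        "-f", values_file,  # re-apply the user-supplied values we backed up
    ]
    return [
        # Refresh the local chart index so the target version is visible.
        Step(["helm", "repo", "update"]),
        # Back up the user-supplied values of the running release.
        Step(["helm", "get", "values", release, "-n", namespace, "-o", "yaml"],
             save_stdout_to=values_file),
        # Render the upgrade without applying it, to review the changes first.
        Step(upgrade + ["--dry-run"]),
        # Apply the upgrade and wait for resources to become ready.
        Step(upgrade + ["--wait", "--timeout", "10m"]),
        # Watch the device plugin DaemonSet roll out node by node.
        Step(["kubectl", "rollout", "status", f"daemonset/{daemonset}",
              "-n", namespace, "--timeout=10m"]),
    ]


def render(plan: List[Step]) -> List[str]:
    """Turn the plan into copy-pastable shell lines."""
    lines = []
    for step in plan:
        line = shlex.join(step.argv)
        if step.save_stdout_to:
            line += f" > {shlex.quote(step.save_stdout_to)}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in render(build_upgrade_plan()):
        print(line)
```

The values backup is passed back with `-f` rather than relying on `--reuse-values`, on the assumption that this keeps our existing configuration explicit while still picking up new chart defaults. If the maintainers recommend a different sequence (for example, upgrading the scheduler and device plugin separately), we would follow that instead.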

### Environment
- HAMI Current Version: 2.3.13
- HAMI Target Version: 2.5.0+
- Kubernetes Version: v1.30.2
- GPU Model: NVIDIA 5090
