Device allocation issue for multi-container pods with init containers #1667

@haitwang-cloud

What happened:

Based on PR #1650, device allocation does not properly support multi-container pods that include init containers. This issue tracks fixing device allocation behavior for such cases.

What you expected to happen:

Multi-container pods, including those with init containers, should be able to allocate devices correctly through HAMi, without misassignment or failures.

How to reproduce it (as minimally and precisely as possible):

  1. Create a pod spec with multiple containers, including at least one init container that requires device allocation.
  2. Deploy to a HAMi-enabled cluster.
  3. Observe device allocation results and logs.
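
The steps above can be sketched as a small script that generates a reproduction manifest. This is only an illustration, not the exact spec that triggered the bug: the pod name, container names, and image are hypothetical placeholders, and the `nvidia.com/gpu` resource name is an assumption about how the cluster exposes devices. Adjust all of them to match your environment.

```python
import json

# Assumption: devices are requested under this extended resource name.
GPU_RESOURCE = "nvidia.com/gpu"
# Hypothetical placeholder image; replace with a CUDA-capable image you use.
IMAGE = "example.com/cuda-base:latest"


def container(name, command, gpus="1"):
    """Return a container spec that requests `gpus` devices via limits."""
    return {
        "name": name,
        "image": IMAGE,
        "command": command,
        "resources": {"limits": {GPU_RESOURCE: gpus}},
    }


def build_repro_pod(name="hami-init-repro"):
    """Build a pod with one device-requesting init container and two
    device-requesting regular containers (the multi-container case)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "initContainers": [
                # The init container lists visible devices, then exits.
                container("init-gpu", ["nvidia-smi", "-L"]),
            ],
            "containers": [
                container("main-a", ["sleep", "3600"]),
                container("main-b", ["sleep", "3600"]),
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_repro_pod(), indent=2))
```

Assuming the script is saved as `repro_pod.py` (a hypothetical filename), the manifest can be applied with `python repro_pod.py | kubectl apply -f -`, after which the device assigned to each container can be compared against the scheduler and device-plugin logs.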

Anything else we need to know?:

See #1650 for related fixes and context.

  • The output of nvidia-smi -a on your host
  • Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
  • The hami-device-plugin container logs
  • The hami-scheduler container logs
  • The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)

Metadata

Labels

kind/bug: Something isn't working
