Support hami-cli #1638

@ouyangluwei163

What would you like to be added:
hami-cli

What type of PR is this?

/kind feature

HAMi CLI Command Line Tool

Background

HAMi currently exposes GPU information through the WebUI and through metrics. We propose adding a CLI as a third way to query it. Compared with the WebUI and metrics, a CLI is more lightweight and easier for AI agents to use.

Goals

  • Provide a hami-cli node list command that displays an overview of all GPU nodes
  • Provide a hami-cli node detail <node-name> command that displays a node's details and its running GPU Pods
  • Provide a hami-cli device list command that displays an overview of all GPU devices
  • Provide a hami-cli device detail <uuid> command that displays a device's details and its running GPU Pods
  • Provide a hami-cli quota list command that displays all GPU resource quotas
  • Provide a hami-cli quota detail <namespace>/<name> command that displays a quota's details

CLI Command Usage

# Global Options
hami-cli [command] [flags]

Flags:
  --scheduler-url string   HAMi scheduler API URL (default "http://localhost:9090")
  --output string          Output format: table|json|yaml (default "table")
  --kubeconfig string      Path to kubeconfig file
  -h, --help               Help for hami-cli

Commands:
  node      Manage GPU nodes
  device    Manage GPU devices
  quota     Manage GPU quotas
  version   Print version information
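
To make the command tree above concrete, here is a minimal sketch of the same layout using Python's standard argparse module. It is only an illustration of the proposed interface, not the HAMi implementation; the function name build_parser and the attribute names command, action, and target are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the proposed hami-cli command tree."""
    parser = argparse.ArgumentParser(prog="hami-cli")
    # Global flags, with the defaults listed in the proposal.
    parser.add_argument("--scheduler-url", default="http://localhost:9090",
                        help="HAMi scheduler API URL")
    parser.add_argument("--output", choices=["table", "json", "yaml"],
                        default="table", help="Output format")
    parser.add_argument("--kubeconfig", default=None,
                        help="Path to kubeconfig file")

    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("version", help="Print version information")

    # Every resource gets the same two actions: list and detail <id>.
    resources = {
        "node": ("Manage GPU nodes", "node-name"),
        "device": ("Manage GPU devices", "uuid"),
        "quota": ("Manage GPU quotas", "namespace/name"),
    }
    for name, (help_text, metavar) in resources.items():
        actions = commands.add_parser(name, help=help_text).add_subparsers(
            dest="action", required=True)
        actions.add_parser("list")
        detail = actions.add_parser("detail")
        # The identifier is stored as args.target for all three resources.
        detail.add_argument("target", metavar=metavar)
    return parser
```

With this sketch, build_parser().parse_args(["--output", "json", "node", "detail", "node-1"]) yields command "node", action "detail", and target "node-1". One difference from the usage line above: argparse only accepts the global flags before the subcommand, so a real implementation that wants hami-cli [command] [flags] ordering would need a library that supports flags inherited by subcommands.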

Output Format Examples

hami-cli node list

NAME       VENDOR   MODE        SPLIT   DEVICES   CORE(Used/Total)   MEMORY(Used/Total)
node-1     NVIDIA   hami-core   1:4     8         200/800 (25%)      32Gi/128Gi (25%)
node-2     NVIDIA   mig         -       4         100/400 (25%)      16Gi/64Gi (25%)
node-3     Iluvatar hami-core   1:2     2         50/200 (25%)       8Gi/32Gi (25%)
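
The Used/Total cells and the table|json output switch could be produced along the following lines. This is a sketch under assumptions: the helper names usage_cell and render are hypothetical, rows are plain dictionaries keyed by column header, and YAML is left out because it needs a third-party library.

```python
import json


def usage_cell(used, total, unit=""):
    """Render a Used/Total cell such as '200/800 (25%)' or '32Gi/128Gi (25%)'."""
    # Guard against a zero total so an empty node does not raise ZeroDivisionError.
    percent = round(100 * used / total) if total else 0
    return f"{used}{unit}/{total}{unit} ({percent}%)"


def render(rows, output="table"):
    """Render a list of dicts as JSON or as a left-aligned text table."""
    if output == "json":
        return json.dumps(rows, indent=2)
    if output != "table":
        raise ValueError(f"unsupported output format: {output}")
    if not rows:
        return ""
    headers = list(rows[0])
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = ["   ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for r in rows:
        lines.append("   ".join(str(r[h]).ljust(w) for h, w in zip(headers, widths)))
    # Drop the padding after the last column.
    return "\n".join(line.rstrip() for line in lines)
```

For example, usage_cell(32, 128, "Gi") gives the memory cell shown for node-1. Emitting the same row dictionaries as JSON keeps the machine-readable output in step with the table, which matters for the AI-agent use case mentioned in the background.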

hami-cli node detail node-1

Name:         node-1
Vendor:       NVIDIA
Mode:         hami-core
Split Ratio:  1:4
Device Count: 8
Core:         200/800 (25%)
Memory:       32Gi/128Gi (25%)

Devices:
  UUID                                   INDEX   TYPE          CORE(Used/Total)   MEMORY(Used/Total)
  GPU-abc123-def456-789...               0       Tesla V100    25/100 (25%)       4Gi/16Gi (25%)
  GPU-abc123-def456-790...               1       Tesla V100    25/100 (25%)       4Gi/16Gi (25%)
  GPU-abc123-def456-791...               2       Tesla V100    25/100 (25%)       4Gi/16Gi (25%)
  ...

GPU Pods:
  NAMESPACE   NAME              GPU   CORE   MEMORY   AGE
  default     training-pod-1    1     25     4Gi      2h30m
  default     inference-pod-1   1     25     4Gi      1h15m
  ml-team     research-pod-1    2     50     8Gi      45m

hami-cli device list

UUID                                   NODE      VENDOR   TYPE          CORE(Used/Total)   MEMORY(Used/Total)
GPU-abc123-def456-789...               node-1    NVIDIA   Tesla V100    25/100 (25%)       4Gi/16Gi (25%)
GPU-abc123-def456-790...               node-1    NVIDIA   Tesla V100    25/100 (25%)       4Gi/16Gi (25%)
GPU-xyz789-abc123-456...               node-2    NVIDIA   Tesla A100    50/100 (50%)       20Gi/40Gi (50%)

hami-cli device detail GPU-abc123-def456-789

UUID:         GPU-abc123-def456-789012345678
Node:         node-1
Vendor:       NVIDIA
Type:         Tesla V100
Index:        0
Mode:         hami-core
Health:       Healthy
Core:         25/100 (25%)
Memory:       4Gi/16Gi (25%)

GPU Pods:
  NAMESPACE   NAME              GPU   CORE   MEMORY   AGE
  default     training-pod-1    1     25     4Gi      2h30m
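
The example above passes an identifier that is shorter than the UUID it prints. The proposal does not say whether device detail accepts a UUID prefix. If it does (an assumption), resolution could work like the sketch below, where resolve_uuid is a hypothetical helper that rejects ambiguous prefixes instead of picking one.

```python
def resolve_uuid(query, known_uuids):
    """Return the single UUID that equals or starts with query, else raise ValueError."""
    if query in known_uuids:
        return query  # an exact match always wins
    matches = [u for u in known_uuids if u.startswith(query)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"no device matches {query!r}")
    raise ValueError(f"{query!r} is ambiguous: {', '.join(sorted(matches))}")
```

Failing on ambiguity keeps the command safe to script: a prefix that later matches two devices produces an error instead of silently showing a different device.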

hami-cli quota list

NAMESPACE   NAME        GPU(Used/Total)   CORE(Used/Total)   MEMORY(Used/Total)
team-a      gpu-quota   4/10 (40%)        100/400 (25%)      16Gi/64Gi (25%)
team-b      gpu-quota   2/5 (40%)         50/200 (25%)       8Gi/32Gi (25%)

hami-cli quota detail team-a/gpu-quota

Namespace:    team-a
Name:         gpu-quota
GPU:          4/10 (40%)
Core:         100/400 (25%)
Memory:       16Gi/64Gi (25%)

GPU Pods:
  NAMESPACE   NAME              GPU   CORE   MEMORY   AGE
  team-a      training-pod-1    2     50     8Gi      2h30m
  team-a      inference-pod-1   2     50     8Gi      1h15m
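
The quota detail argument combines namespace and name in one <namespace>/<name> string. A minimal sketch of splitting and validating it follows; parse_quota_ref is a hypothetical name, and rejecting extra slashes is an assumption about the intended behavior.

```python
from typing import Tuple


def parse_quota_ref(ref: str) -> Tuple[str, str]:
    """Split 'team-a/gpu-quota' into ('team-a', 'gpu-quota')."""
    namespace, sep, name = ref.partition("/")
    # Require exactly one slash with non-empty text on both sides.
    if not sep or not namespace or not name or "/" in name:
        raise ValueError(f"expected <namespace>/<name>, got {ref!r}")
    return namespace, name
```

Validating the argument before calling the scheduler API lets the CLI report a malformed reference with a clear message instead of a failed lookup.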
