Labels: kind/feature, new function
What would you like to be added:
hami-cli
What type of PR is this?
/kind feature
HAMi Command-Line Tool (hami-cli)
Background
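Currently, HAMi exposes GPU information through its WebUI and metrics. We propose adding a command-line interface as well. Compared with the WebUI and metrics, a CLI is more lightweight and friendlier to AI agents: an agent or script can run a single command and parse the result (for example with `--output json`) instead of driving a browser UI or querying a metrics endpoint.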
Goals
- Provide a `hami-cli node list` command to display an overview of all GPU nodes
- Provide a `hami-cli node detail <node-name>` command to display node details and running GPU Pods
- Provide a `hami-cli device list` command to display an overview of all GPU devices
- Provide a `hami-cli device detail <uuid>` command to display device details and running GPU Pods
- Provide a `hami-cli quota list` command to display all GPU resource quotas
- Provide a `hami-cli quota detail <namespace>/<name>` command to display quota details
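The goals describe a noun-verb command tree (`node`, `device`, `quota`, each with `list` and `detail`). A minimal sketch of how that tree could be parsed with Python's standard `argparse`, assuming the argument names shown in the goals; `build_parser` and `split_quota_ref` are hypothetical helpers, not part of HAMi.

```python
import argparse

# Hypothetical sketch: parse the proposed noun-verb command tree with the
# standard library. Command and argument names mirror the Goals list; the
# parser layout itself is an assumption, not HAMi code.


def split_quota_ref(ref: str) -> tuple[str, str]:
    """Split a '<namespace>/<name>' quota reference into its two parts."""
    namespace, sep, name = ref.partition("/")
    if not sep or not namespace or not name or "/" in name:
        raise argparse.ArgumentTypeError(
            f"quota must be given as <namespace>/<name>, got {ref!r}")
    return namespace, name


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hami-cli")
    resources = parser.add_subparsers(dest="resource", required=True)

    # Each resource gets a 'list' verb and a 'detail' verb taking one key.
    detail_args = {
        "node": ("node_name", str),
        "device": ("uuid", str),
        "quota": ("quota", split_quota_ref),
    }
    for resource, (arg_name, arg_type) in detail_args.items():
        res = resources.add_parser(resource)
        verbs = res.add_subparsers(dest="verb", required=True)
        verbs.add_parser("list")
        detail = verbs.add_parser("detail")
        detail.add_argument(arg_name, type=arg_type)
    return parser
```

For example, `build_parser().parse_args(["quota", "detail", "team-a/gpu-quota"])` yields a namespace whose `quota` attribute is `("team-a", "gpu-quota")`; a malformed reference such as `gpu-quota` is rejected by argparse with a usage error.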
CLI Command Usage
# Global Options
hami-cli [command] [flags]

Flags:
      --scheduler-url string   HAMi scheduler API URL (default "http://localhost:9090")
      --output string          Output format: table|json|yaml (default "table")
      --kubeconfig string      Path to kubeconfig file
  -h, --help                   Help for hami-cli

Commands:
  node        Manage GPU nodes
  device      Manage GPU devices
  quota       Manage GPU quotas
  version     Print version information

Output Format Examples
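The `--output` flag (table|json|yaml) implies that every command renders the same records in one of three formats. A minimal sketch, assuming records arrive as flat Python dicts; `render` is a hypothetical helper, and its YAML branch is a hand-rolled emitter for flat records only, used here to stay within the standard library (a real implementation would more likely use a YAML library).

```python
import json

# Hypothetical sketch of --output: one list of flat records rendered as an
# aligned table, JSON, or YAML. Field names are illustrative only.


def render(records: list[dict], fmt: str = "table") -> str:
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "yaml":
        # Minimal YAML for flat records: each value is written as a JSON
        # scalar, which YAML also accepts. Quoting strings keeps values such
        # as "1:4" from being read as numbers by YAML 1.1 parsers.
        lines = []
        for rec in records:
            for i, (key, value) in enumerate(rec.items()):
                prefix = "- " if i == 0 else "  "
                lines.append(f"{prefix}{key}: {json.dumps(value)}")
        return "\n".join(lines)
    if fmt == "table":
        if not records:
            return ""
        headers = list(records[0])
        rows = [[str(rec[h]) for h in headers] for rec in records]
        # Each column is as wide as its longest cell (header included).
        widths = [max(len(h), *(len(r[i]) for r in rows))
                  for i, h in enumerate(headers)]
        out = []
        for cells in [[h.upper() for h in headers], *rows]:
            line = "  ".join(c.ljust(w) for c, w in zip(cells, widths))
            out.append(line.rstrip())
        return "\n".join(out)
    raise ValueError(f"unsupported output format: {fmt!r} (want table|json|yaml)")
```

The table path is meant for humans; the JSON and YAML paths carry the same fields unformatted, which is what an AI agent or script would parse.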
hami-cli node list
NAME     VENDOR    MODE       SPLIT  DEVICES  CORE(Used/Total)  MEMORY(Used/Total)
node-1   NVIDIA    hami-core  1:4    8        200/800 (25%)     32Gi/128Gi (25%)
node-2   NVIDIA    mig        -      4        100/400 (25%)     16Gi/64Gi (25%)
node-3   Iluvatar  hami-core  1:2    2        50/200 (25%)      8Gi/32Gi (25%)
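In the node list, node-1's cells are consistent with summing its eight devices (8 × 100 core units = 800; 8 × 16Gi = 128Gi). A minimal sketch of that aggregation and of the `used/total (pct%)` cell format, assuming per-device core is a 0-100 share and memory is tracked in MiB; the `Device` type and the helper names are hypothetical, not HAMi's data model.

```python
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical sketch: derive a node row's CORE and MEMORY cells from
# per-device usage. Field meanings and units (core share 0-100 per device,
# memory in MiB) are assumptions.


@dataclass
class Device:
    core_used: int
    core_total: int
    mem_used_mib: int
    mem_total_mib: int


def fmt_mem(mib: int) -> str:
    """Render MiB as whole Gi when evenly divisible, otherwise as Mi."""
    return f"{mib // 1024}Gi" if mib % 1024 == 0 else f"{mib}Mi"


def usage_cell(used: int, total: int, unit: Callable[[int], str] = str) -> str:
    """Format 'used/total (pct%)', treating an empty total as 0%."""
    pct = round(100 * used / total) if total else 0
    return f"{unit(used)}/{unit(total)} ({pct}%)"


def node_cells(devices: list[Device]) -> tuple[str, str]:
    # Node totals are plain sums over the node's devices.
    core = usage_cell(sum(d.core_used for d in devices),
                      sum(d.core_total for d in devices))
    mem = usage_cell(sum(d.mem_used_mib for d in devices),
                     sum(d.mem_total_mib for d in devices), unit=fmt_mem)
    return core, mem
```

With eight devices each at 25/100 core and 4Gi/16Gi memory, `node_cells` returns `("200/800 (25%)", "32Gi/128Gi (25%)")`, matching the node-1 row. The same `usage_cell` also covers the quota columns, e.g. `usage_cell(4, 10)` gives `4/10 (40%)`.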
hami-cli node detail node-1
Name: node-1
Vendor: NVIDIA
Mode: hami-core
Split Ratio: 1:4
Device Count: 8
Core: 200/800 (25%)
Memory: 32Gi/128Gi (25%)
Devices:
UUID                      INDEX  TYPE        CORE(Used/Total)  MEMORY(Used/Total)
GPU-abc123-def456-789...  0      Tesla V100  25/100 (25%)      4Gi/16Gi (25%)
GPU-abc123-def456-790...  1      Tesla V100  25/100 (25%)      4Gi/16Gi (25%)
GPU-abc123-def456-791...  2      Tesla V100  25/100 (25%)      4Gi/16Gi (25%)
...
GPU Pods:
NAMESPACE  NAME             GPU  CORE  MEMORY  AGE
default    training-pod-1   1    25    4Gi     2h30m
default    inference-pod-1  1    25    4Gi     1h15m
ml-team    research-pod-1   2    50    8Gi     45m
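The AGE column uses a compact form (`2h30m`, `1h15m`, `45m`). A minimal sketch of producing it from a pod's start time; `format_age` is a hypothetical helper, and its rules for seconds, exact hours, and days are assumptions chosen to match the examples rather than a confirmed HAMi format.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of the AGE column: time elapsed since a pod started,
# in the compact style of the sample output. The seconds/days rules are
# assumptions.


def format_age(start: datetime, now: datetime) -> str:
    """Render the time since start compactly, e.g. '2h30m' or '45m'."""
    elapsed = max(now - start, timedelta(0))  # clamp clock skew to zero
    seconds = int(elapsed.total_seconds())
    if seconds < 60:
        return f"{seconds}s"
    days, rem = divmod(seconds // 60, 24 * 60)
    hours, minutes = divmod(rem, 60)
    if days:
        return f"{days}d{hours}h" if hours else f"{days}d"
    if hours:
        return f"{hours}h{minutes}m" if minutes else f"{hours}h"
    return f"{minutes}m"
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the formatting deterministic and easy to test.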
hami-cli device list
UUID                      NODE    VENDOR  TYPE        CORE(Used/Total)  MEMORY(Used/Total)
GPU-abc123-def456-789...  node-1  NVIDIA  Tesla V100  25/100 (25%)      4Gi/16Gi (25%)
GPU-abc123-def456-790...  node-1  NVIDIA  Tesla V100  25/100 (25%)      4Gi/16Gi (25%)
GPU-xyz789-abc123-456...  node-2  NVIDIA  Tesla A100  50/100 (50%)      20Gi/40Gi (50%)
hami-cli device detail GPU-abc123-def456-789
UUID: GPU-abc123-def456-789012345678
Node: node-1
Vendor: NVIDIA
Type: Tesla V100
Index: 0
Mode: hami-core
Health: Healthy
Core: 25/100 (25%)
Memory: 4Gi/16Gi (25%)
GPU Pods:
NAMESPACE  NAME             GPU  CORE  MEMORY  AGE
default    training-pod-1   1    25    4Gi     2h30m
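In the examples, `device list` shortens UUIDs to `GPU-abc123-def456-789...`, and `device detail GPU-abc123-def456-789` resolves to the full `GPU-abc123-def456-789012345678`. A minimal sketch of both behaviours, assuming a 24-character display width and unique-prefix matching; both are inferred from the sample output, and `truncate_uuid` / `resolve_uuid` are hypothetical helpers.

```python
# Hypothetical sketch: 'device list' shortens long UUIDs for display, and
# 'device detail' accepts any unambiguous UUID prefix. Both behaviours are
# inferred from the sample output, not confirmed by the proposal.


def truncate_uuid(uuid: str, width: int = 24) -> str:
    """Shorten a UUID for table display, marking the cut with '...'."""
    return uuid if len(uuid) <= width else uuid[: width - 3] + "..."


def resolve_uuid(prefix: str, known: list[str]) -> str:
    """Return the single known UUID that starts with prefix, else raise."""
    prefix = prefix.removesuffix("...")  # tolerate a pasted truncated cell
    if prefix in known:
        return prefix  # an exact match always wins
    matches = [u for u in known if u.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no device matches {prefix!r}")
    raise LookupError(f"{prefix!r} is ambiguous: {len(matches)} devices match")
```

Refusing ambiguous prefixes, rather than picking the first match, avoids silently showing the wrong device when several UUIDs share a stem (as `GPU-abc123-def456-...` does on node-1).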
hami-cli quota list
NAMESPACE  NAME       GPU(Used/Total)  CORE(Used/Total)  MEMORY(Used/Total)
team-a     gpu-quota  4/10 (40%)       100/400 (25%)     16Gi/64Gi (25%)
team-b     gpu-quota  2/5 (40%)        50/200 (25%)      8Gi/32Gi (25%)
hami-cli quota detail team-a/gpu-quota
Namespace: team-a
Name: gpu-quota
GPU: 4/10 (40%)
Core: 100/400 (25%)
Memory: 16Gi/64Gi (25%)
GPU Pods:
NAMESPACE  NAME             GPU  CORE  MEMORY  AGE
team-a     training-pod-1   2    50    8Gi     2h30m
team-a     inference-pod-1  2    50    8Gi     1h15m
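In the team-a example, the quota's Used figures equal the sum of the two listed pods (2 + 2 = 4 GPUs, 50 + 50 = 100 core, 8Gi + 8Gi = 16Gi). A minimal sketch of that accounting, assuming a quota's usage is simply the sum of GPU Pod requests in its namespace with memory in Gi; the `Pod` type, `quota_used`, and `headroom` are hypothetical and not HAMi's quota implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch: a quota's "Used" values as the sum of the GPU pods in
# its namespace, plus the remaining headroom. Fields and units (memory in
# Gi) are assumptions.


@dataclass
class Pod:
    namespace: str
    name: str
    gpu: int
    core: int
    mem_gi: int


def quota_used(pods: list[Pod], namespace: str) -> tuple[int, int, int]:
    """Sum (gpu, core, memory) requests over the namespace's pods."""
    selected = [p for p in pods if p.namespace == namespace]
    return (sum(p.gpu for p in selected),
            sum(p.core for p in selected),
            sum(p.mem_gi for p in selected))


def headroom(used: tuple[int, ...], hard: tuple[int, ...]) -> tuple[int, ...]:
    """Remaining capacity per resource (hard limit minus current usage)."""
    return tuple(h - u for u, h in zip(used, hard))
```

For team-a, `quota_used` gives `(4, 100, 16)`, and `headroom((4, 100, 16), (10, 400, 64))` gives `(6, 300, 48)`: six more GPUs, 300 core units, and 48Gi before the quota is exhausted.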