Releases: aws/sagemaker-hyperpod-cli
Releases · aws/sagemaker-hyperpod-cli
Elastic Training Support for ReInvent Keynote 3
- Adding new command line arguments to the HyperPodTrainingOperator to support elastic training capabailities
- --elastic-replica-increment-step, --max-node-count, --elastic-graceful-shutdown-timeout-in-seconds, --elastic-scaling-timeout-in-seconds, --elastic-scale-up-snooze-time-in-seconds, --elastic-replica-discrete-values
- Enables dynamic scaling of compute resources during training operations
Hyperpod CLI V2 with Nova recipe support
Hyperpod CLI V2 with Nova recipe support
Parker CLI, Fractional GPU Feature
-Added hp-devspace command set for ML dev environments
-New commands: create, list, get, update, delete, geturl for dev space management
Support for namespace-based auth and resource isolation
Added auth and get-config commands to check permissions and view default settings
Users can request partial GPU resources using MIG profiles instead of full GPUs
Added --accelerator-partition-type, --accelerator-partition-count, accelerator-partition-limit
New list-accelerator-partition-type command to view available GPU partitions for instance types
v3.3.1
Features
- Describe cluster command
- User can use hyp describe cluster to learn more info about hp clusters
- Jinja template handling logic for inference and training
- User can modify jinja template to add parameters supported by CRD through init experience of inference and training, for further CLI customization
- Cluster creation template versioning
- User can choose cloudformation template version through cluster creation expeirence
- KVCache and intelligent routing for HyperPod Inference
- InferenceEndpointConfig CRD supported is updated to v1
- KVCache and Intelligent Routing support is added in template version 1.1
init experience Launch
Features
- Init Experience
- Init, Validate, and Create JumpStart endpoint, Custom endpoint, and PyTorch Training Job with local configuration
- Cluster management
- Bug fixes for cluster creation
Bug fixes
Bug Fixes
- Bug Fixes in cluster creation
v3.2.0 - Cluster Management and Init Experience
Features
Cluster management
Creation of cluster stack
Describing and listing a cluster stack
Updating a cluster
Init Experience
Init, Validate, Create with local configurations
v3.1.0: Release tg (#209)
v3.1.0 (2025-08-13)
Features
- Task Governance feature for training jobs.
v.3.0.2
Features
- Update volume flag to support hostPath and PVC
- Add an option to disable the deployment of KubeFlow TrainingOperator
- Enable telemetry for CLI