Skip to content

Releases: aws/sagemaker-hyperpod-cli

Elastic Training Support for ReInvent Keynote 3

03 Dec 18:07
c64811d

Choose a tag to compare

  • Adding new command line arguments to the HyperPodTrainingOperator to support elastic training capabailities
    • --elastic-replica-increment-step, --max-node-count, --elastic-graceful-shutdown-timeout-in-seconds, --elastic-scaling-timeout-in-seconds, --elastic-scale-up-snooze-time-in-seconds, --elastic-replica-discrete-values
  • Enables dynamic scaling of compute resources during training operations

Hyperpod CLI V2 with Nova recipe support

02 Dec 12:36
9774c58

Choose a tag to compare

Hyperpod CLI V2 with Nova recipe support

Parker CLI, Fractional GPU Feature

21 Nov 09:11
0eba08f

Choose a tag to compare

-Added hp-devspace command set for ML dev environments

-New commands: create, list, get, update, delete, geturl for dev space management
Support for namespace-based auth and resource isolation
Added auth and get-config commands to check permissions and view default settings

Users can request partial GPU resources using MIG profiles instead of full GPUs

Added --accelerator-partition-type, --accelerator-partition-count, accelerator-partition-limit
New list-accelerator-partition-type command to view available GPU partitions for instance types

v3.3.1

30 Oct 20:47
7233490

Choose a tag to compare

Features

  • Describe cluster command
    • User can use hyp describe cluster to learn more info about hp clusters
  • Jinja template handling logic for inference and training
    • User can modify jinja template to add parameters supported by CRD through init experience of inference and training, for further CLI customization
  • Cluster creation template versioning
    • User can choose cloudformation template version through cluster creation expeirence
  • KVCache and intelligent routing for HyperPod Inference
    • InferenceEndpointConfig CRD supported is updated to v1
    • KVCache and Intelligent Routing support is added in template version 1.1

init experience Launch

24 Sep 19:51

Choose a tag to compare

Features

  • Init Experience
    • Init, Validate, and Create JumpStart endpoint, Custom endpoint, and PyTorch Training Job with local configuration
  • Cluster management
    • Bug fixes for cluster creation

Bug fixes

10 Sep 19:23
162fb79

Choose a tag to compare

Features

  • Fix for production canary failures caused by bad training job template.
  • New version for Health Monitoring Agent (1.0.790.0_1.0.266.0) with minor improvements and bug fixes.

Bug Fixes

28 Aug 00:14
5a346e8

Choose a tag to compare

  • Bug Fixes in cluster creation

v3.2.0 - Cluster Management and Init Experience

26 Aug 22:56
12730ca

Choose a tag to compare

Features

Cluster management

    Creation of cluster stack
    Describing and listing a cluster stack
    Updating a cluster

Init Experience

    Init, Validate, Create with local configurations

v3.1.0: Release tg (#209)

14 Aug 20:35
0fd2bef

Choose a tag to compare

v3.1.0 (2025-08-13)

Features

  • Task Governance feature for training jobs.

v.3.0.2

01 Aug 18:16
36fac66

Choose a tag to compare

Features

  • Update volume flag to support hostPath and PVC
  • Add an option to disable the deployment of KubeFlow TrainingOperator
  • Enable telemetry for CLI