This documentation serves as a reference for the available HyperPod CLI commands…

## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Platform Support](#platform-support)
- [ML Framework Support](#ml-framework-support)
- [Installation](#installation)
- [Usage](#usage)
- [Listing Clusters](#listing-clusters)

The SageMaker HyperPod CLI is a tool that helps submit training jobs to the Amazon…

- Alternatively, follow the [README in the helm_chart folder](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/helm_chart/readme.md) to install the Kubeflow Training Operator.
- Configure the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to use the region where your HyperPod clusters are located.

## Platform Support

The SageMaker HyperPod CLI currently supports Linux and macOS. Windows is not supported.
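
To check a machine before installing, Python's standard `platform` module reports the operating system name (macOS shows up as `Darwin`). The following is a minimal sketch under that assumption and is not part of the CLI; the helper `is_supported_platform` is a hypothetical name.

```python
import platform
from typing import Optional

# Operating systems listed as supported in "Platform Support".
# platform.system() returns "Darwin" on macOS.
SUPPORTED_SYSTEMS = {"Linux": "Linux", "Darwin": "macOS"}


def is_supported_platform(system_name: Optional[str] = None) -> bool:
    """Return True if the given OS name (default: the current OS) is Linux or macOS."""
    if system_name is None:
        system_name = platform.system()
    return system_name in SUPPORTED_SYSTEMS


if __name__ == "__main__":
    current = platform.system()
    if is_supported_platform(current):
        print(f"{SUPPORTED_SYSTEMS[current]} is supported by the HyperPod CLI.")
    else:
        print(f"{current or 'This platform'} is not supported by the HyperPod CLI.")
```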
## ML Framework Support

The SageMaker HyperPod CLI currently supports starting training jobs with:
- The PyTorch ML framework. Version requirement: PyTorch >= 1.10.
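
If you want to verify the PyTorch requirement ahead of time, compare `(major, minor)` tuples rather than raw strings, since a plain string comparison would wrongly rank `1.9` above `1.10`. This is a hypothetical helper that is not part of the CLI, and it assumes the version that matters is the one installed in your training environment or image.

```python
import re
from typing import Tuple

# Minimum PyTorch version from "ML Framework Support".
MIN_PYTORCH: Tuple[int, int] = (1, 10)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.1.0' or '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_pytorch_requirement(version: str) -> bool:
    # Tuple comparison orders (1, 9) < (1, 10) correctly.
    return parse_major_minor(version) >= MIN_PYTORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        ok = meets_pytorch_requirement(torch.__version__)
        print(f"PyTorch {torch.__version__}: {'OK' if ok else 'requires >= 1.10'}")
```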
## Installation

1. Make sure that your local Python version is 3.8, 3.9, 3.10, or 3.11.
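
Step 1 can be checked with `sys.version_info`. The sketch below is a hypothetical helper (`is_supported_python` is not part of the CLI) that encodes exactly the Python versions listed in that step.

```python
import sys
from typing import Optional, Tuple

# Python versions listed in Installation step 1.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) (default: the running interpreter) is supported."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return tuple(version[:2]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {major}.{minor} is {status} by the HyperPod CLI.")
```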
* `region` (string) - Optional. The region where the SageMaker HyperPod and EKS clusters are located. If not specified, it is set to the region from the current AWS account credentials.
* `clusters` (list[string]) - Optional. A list of SageMaker HyperPod cluster names to check capacity for. This is useful if you know your most commonly used clusters and want to check their capacity status in the AWS account.
* `orchestrator` (enum) - Optional. The orchestrator type for the cluster. Currently, `'eks'` is the only available option.
* `output` (enum) - Optional. The output format. Available values are `table` and `json`. The default value is `json`.
* `job-name` (string) - Required. The name of the job.
* `job-kind` (string) - Optional. The training job kind. The job type currently supported is `kubeflow/PyTorchJob`.
* `namespace` (string) - Optional. The namespace to use. If not specified, this command uses the [Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) of the Amazon EKS cluster associated with the SageMaker HyperPod cluster in your AWS account.
* `image` (string) - Required. The image used when creating the training job.
* `pull-policy` (enum) - Optional. The policy for pulling the container image. Valid values are `Always`, `IfNotPresent`, and `Never`, as supported by PyTorchJob. The default is `Always`.
* `command` (string) - Optional. The command to run the entrypoint script. Currently, only `torchrun` is supported.
* `entry-script` (string) - Required. The path to the training script.
* `script-args` (list[string]) - Optional. The list of arguments for entry scripts.
* `environment` (dict[string, string]) - Optional. The environment variables (key-value pairs) to set in the containers.
* `node-count` (int) - Required. The number of nodes (instances) to launch the job on.
* `instance-type` (string) - Required. The instance type to launch the job on. You can use only instance types prefixed with `ml` that are available within your SageMaker quotas.
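
The constraints described for the job parameters above can be summarized as a client-side check. This sketch is hypothetical: `validate_start_job_params` is not part of the HyperPod CLI, the dictionary keys simply reuse the parameter names above, and requiring `node-count` to be at least 1 and treating `ml.` as the instance-type prefix are assumptions. It cannot check your SageMaker quotas, which only the service can verify.

```python
from typing import Any, Dict, List

# Hypothetical checks mirroring the documented job parameters.
# The real CLI performs its own validation; this is only an illustration.
REQUIRED = ("job-name", "image", "entry-script", "node-count", "instance-type")
SUPPORTED_JOB_KIND = "kubeflow/PyTorchJob"
PULL_POLICIES = ("Always", "IfNotPresent", "Never")
SUPPORTED_COMMAND = "torchrun"


def validate_start_job_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in params (empty if none)."""
    errors: List[str] = []

    for name in REQUIRED:
        if params.get(name) in (None, ""):
            errors.append(f"missing required parameter: {name}")

    job_kind = params.get("job-kind")
    if job_kind is not None and job_kind != SUPPORTED_JOB_KIND:
        errors.append(f"job-kind must be {SUPPORTED_JOB_KIND!r}")

    # pull-policy defaults to Always when omitted.
    pull_policy = params.get("pull-policy", "Always")
    if pull_policy not in PULL_POLICIES:
        errors.append(f"pull-policy must be one of {', '.join(PULL_POLICIES)}")

    command = params.get("command")
    if command is not None and command != SUPPORTED_COMMAND:
        errors.append(f"command must be {SUPPORTED_COMMAND!r}")

    node_count = params.get("node-count")
    # Assumption: at least one node is required. bool is rejected explicitly
    # because it is a subclass of int in Python.
    if node_count is not None and (
        isinstance(node_count, bool) or not isinstance(node_count, int) or node_count < 1
    ):
        errors.append("node-count must be a positive integer")

    instance_type = params.get("instance-type")
    if instance_type and not str(instance_type).startswith("ml."):
        errors.append("instance-type must be an ml.* instance type")

    script_args = params.get("script-args", [])
    if not isinstance(script_args, list) or not all(isinstance(a, str) for a in script_args):
        errors.append("script-args must be a list of strings")

    environment = params.get("environment", {})
    if not isinstance(environment, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in environment.items()
    ):
        errors.append("environment must map strings to strings")

    return errors


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    example = {
        "job-name": "my-training-job",
        "image": "<your-training-image-uri>",
        "entry-script": "/opt/train/train.py",
        "node-count": 2,
        "instance-type": "ml.p4d.24xlarge",
        "script-args": ["--epochs", "3"],
        "environment": {"NCCL_DEBUG": "INFO"},
    }
    print(validate_start_job_params(example))  # prints []
```

Returning a list of messages, rather than raising on the first problem, lets you report every invalid parameter at once.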