Commit 9172015

fix: adding Platform Support and ML Framework Support sections in the README; fixing some typos in the README (#30)
1 parent 66a1934 commit 9172015

3 files changed: 19 additions and 6 deletions

README.md

Lines changed: 15 additions & 3 deletions
@@ -7,6 +7,9 @@ This documentation serves as a reference for the available HyperPod CLI commands
 
 ## Table of Contents
 - [Overview](#overview)
+- [Prerequisites](#prerequisites)
+- [Platform Support](#platform-support)
+- [ML Framework Support](#ml-framework-support)
 - [Installation](#installation)
 - [Usage](#usage)
 - [Listing Clusters](#listing-clusters)
@@ -30,6 +33,15 @@ The SageMaker HyperPod CLI is a tool that helps submit training jobs to the Amaz
 - Or you can follow the [Readme under helm_chart folder](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/helm_chart/readme.md) to install Kubeflow Training Operator.
 - Configure [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to point to the correct region where your HyperPod clusters are located.
 
+## Platform Support
+
+SageMaker HyperPod CLI currently supports Linux and MacOS platforms. Windows platform is not supported now.
+
+## ML Framework Support
+
+SageMaker HyperPod CLI currently supports start training job with:
+- PyTorch ML Framework. Version requirements: PyTorch >= 1.10
+
 ## Installation
 
 1. Make sure that your local python version is 3.8, 3.9, 3.10 or 3.11.
@@ -98,7 +110,7 @@ hyperpod get-clusters [--region <region>] [--clusters <cluster1,cluster2>] [--or
 * `region` (string) - Optional. The region that the SageMaker HyperPod and EKS clusters are located. If not specified, it will be set to the region from the current AWS account credentials.
 * `clusters` (list[string]) - Optional. A list of SageMaker HyperPod cluster names that users want to check the capacity for. This is useful for users who know some of their most commonly used clusters and want to check the capacity status of the clusters in the AWS account.
 * `orchestrator` (enum) - Optional. The orchestrator type for the cluster. Currently, `'eks'` is the only available option.
-* `output` (enum) - Optional. The output format. Available values are `TABLE` and `JSON`. The default value is `JSON`.
+* `output` (enum) - Optional. The output format. Available values are `table` and `json`. The default value is `json`.
 
 ### Connecting to a Cluster
 
@@ -121,13 +133,13 @@ hyperpod start-job --job-name <job-name> [--namespace <namespace>] [--job-kind <
 ```
 
 * `job-name` (string) - Required. The name of the job.
-* `job-kind` (string) - Optional. The training job kind. The job types currently supported are `kubeflow` and `PyTorchJob`.
+* `job-kind` (string) - Optional. The training job kind. The job type currently supported is `kubeflow/PyTorchJob`.
 * `namespace` (string) - Optional. The namespace to use. If not specified, this command uses the [Kubernetes namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) of the Amazon EKS cluster associated with the SageMaker HyperPod cluster in your AWS account.
 * `image` (string) - Required. The image used when creating the training job.
 * `pull-policy` (enum) - Optional. The policy to pull the container image. Valid values are `Always`, `IfNotPresent`, and `Never`, as available from the PyTorchJob. The default is `Always`.
 * `command` (string) - Optional. The command to run the entrypoint script. Currently, only `torchrun` is supported.
 * `entry-script` (string) - Required. The path to the training script.
-* `script-args` (list[string]) - Optional. The list of arguments for entryscripts.
+* `script-args` (list[string]) - Optional. The list of arguments for entry scripts.
 * `environment` (dict[string, string]) - Optional. The environment variables (key-value pairs) to set in the containers.
 * `node-count` (int) - Required. The number of nodes (instances) to launch the jobs on.
 * `instance-type` (string) - Required. The instance type to launch the job on. Note that the instance types you can use are the available instances within your SageMaker quotas for instances prefixed with `ml`.
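
The README change to the `output` option lowercases the accepted values (`table`, `json`) and the default (`json`). A minimal sketch of how an option with those documented values could be validated follows. It uses `argparse` as a stand-in; the parser, the name `build_parser`, and the `--output` flag spelling are assumptions for illustration, not the HyperPod CLI's actual implementation.

```python
import argparse

# Accepted values and default, as documented in the updated README.
OUTPUT_FORMATS = ("table", "json")
DEFAULT_OUTPUT = "json"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `output` option."""
    parser = argparse.ArgumentParser(prog="get-clusters-sketch")
    parser.add_argument(
        "--output",
        choices=OUTPUT_FORMATS,  # argparse compares choices case-sensitively
        default=DEFAULT_OUTPUT,
        help="Output format: table or json.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--output", "table"])
    print(args.output)
```

Because the comparison is case-sensitive in this sketch, the old uppercase spellings would be rejected, which is one reason documentation and accepted values need to agree.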

src/hyperpod_cli/clients/kubernetes_client.py

Lines changed: 3 additions & 1 deletion
@@ -46,7 +46,9 @@ def __new__(cls, is_get_capacity: bool = False) -> "KubernetesClient":
         if cls._instance is None:
             cls._instance = super(KubernetesClient, cls).__new__(cls)
             config.load_kube_config(
-                config_file=KUBE_CONFIG_PATH if not is_get_capacity else TEMP_KUBE_CONFIG_FILE
+                config_file=KUBE_CONFIG_PATH
+                if not is_get_capacity
+                else TEMP_KUBE_CONFIG_FILE
             )  # or config.load_incluster_config() for in-cluster config
             cls._instance._kube_client = client.ApiClient()
         return cls._instance
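
The hunk above only re-wraps the conditional expression; behavior is unchanged. As a rough illustration of the pattern, here is a dependency-free sketch of a `__new__`-based singleton that picks a config path on first construction. The class name `SingletonClient`, the `config_file` attribute, and both path values are placeholders for illustration, not the real `hyperpod_cli` constants.

```python
from typing import Optional

# Placeholder paths; the real constants are defined in the hyperpod_cli package.
KUBE_CONFIG_PATH = "~/.kube/config"
TEMP_KUBE_CONFIG_FILE = "/tmp/kubeconfig"


class SingletonClient:
    """Illustrative stand-in for KubernetesClient, without the kubernetes dependency."""

    _instance: Optional["SingletonClient"] = None

    def __new__(cls, is_get_capacity: bool = False) -> "SingletonClient":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # The config path is chosen only when the instance is first created.
            cls._instance.config_file = (
                KUBE_CONFIG_PATH if not is_get_capacity else TEMP_KUBE_CONFIG_FILE
            )
        return cls._instance


if __name__ == "__main__":
    first = SingletonClient()
    second = SingletonClient(is_get_capacity=True)
    print(first is second, second.config_file)
```

In this sketch the `is_get_capacity` flag matters only on the first call, since later calls return the cached instance with whichever path was chosen first.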

src/hyperpod_cli/validators/job_validator.py

Lines changed: 1 addition & 2 deletions
@@ -20,7 +20,7 @@
     RestartPolicy,
     KUEUE_QUEUE_NAME_LABEL_KEY,
     HYPERPOD_AUTO_RESUME_ANNOTATION_KEY,
-    HYPERPOD_MAX_RETRY_ANNOTATION_KEY
+    HYPERPOD_MAX_RETRY_ANNOTATION_KEY,
 )
 from hyperpod_cli.constants.hyperpod_instance_types import (
     HyperpodInstanceType,
@@ -275,4 +275,3 @@ def _validate_json_str(
         # Catch any other exceptions
         logger.error(f"An unexpected error occurred: {e}")
         return False
-
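
Only the tail of `_validate_json_str` is visible in this diff. The following is a minimal sketch of what a validator with that ending could look like; the signature, the `json.JSONDecodeError` branch, and its log message are assumptions, not the repository's actual code.

```python
import json
import logging

logger = logging.getLogger(__name__)


def validate_json_str(json_str: str) -> bool:
    """Return True if json_str parses as JSON; log the reason and return False otherwise."""
    try:
        json.loads(json_str)
        return True
    except json.JSONDecodeError as e:
        # Assumed branch: malformed JSON text.
        logger.error(f"Invalid JSON string: {e}")
        return False
    except Exception as e:
        # Catch any other exceptions, e.g. a TypeError for a non-string argument.
        logger.error(f"An unexpected error occurred: {e}")
        return False
```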
