Description
Feature Description & Motivation
The update_neuron_sdk.sh lifecycle script pins to Neuron SDK Release 2.21.0, which is significantly outdated (current release is 2.27.1+). The pinned package versions are:
- aws-neuronx-dkms=2.19.64.0
- aws-neuronx-oci-hook=2.6.36.0
- aws-neuronx-runtime-lib=2.23.110.0
- aws-neuronx-collectives=2.23.133.0
- aws-neuronx-tools=2.20.204.0
This script is unnecessary because HyperPod Slurm AMIs already ship with the Neuron SDK preinstalled and the SDK is automatically updated when users call the UpdateClusterSoftware API. See the HyperPod Slurm AMI release notes for the versions included in each AMI release.
Rather than continuously updating the pinned versions, we should deprecate and remove this script to simplify the lifecycle scripts. Users who need a specific Neuron SDK version should rely on the version preinstalled in the AMI or on the UpdateClusterSoftware API.
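For reference, a minimal sketch of triggering the software update from Python instead of relying on the lifecycle script. It assumes a boto3 SageMaker client is passed in, and that the response contains a ClusterArn field as described in the boto3 documentation for update_cluster_software. The helper name request_software_update is hypothetical.

```python
def request_software_update(sagemaker_client, cluster_name):
    """Call UpdateClusterSoftware for one cluster and return the cluster ARN.

    sagemaker_client is assumed to behave like boto3.client("sagemaker").
    """
    if not cluster_name:
        raise ValueError("cluster_name must be a non-empty string")
    # Assumption: the response carries the cluster ARN under "ClusterArn".
    response = sagemaker_client.update_cluster_software(ClusterName=cluster_name)
    return response["ClusterArn"]
```

With boto3 installed and AWS credentials configured, this could be called as request_software_update(boto3.client("sagemaker"), "my-cluster"), where "my-cluster" is a placeholder name. Taking the client as a parameter keeps the helper easy to exercise with a stub.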
Related: #875
Files to Change
- Delete 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/utils/update_neuron_sdk.sh
- Remove the enable_update_neuron_sdk config flag in 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/config.py (line 20)
- Remove the conditional block that calls update_neuron_sdk.sh in 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/lifecycle_script.py (lines 262-265)
- Remove the sed command that enables the flag in 1.architectures/5.sagemaker-hyperpod/automate-smhp-slurm/automate-cluster-creation.sh (line 355)
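To confirm that nothing still references the script or its flag after these removals, a check along the following lines may help. This is only a sketch: the function name find_stale_references is hypothetical, and the two search strings are taken from the identifiers named in this issue.

```python
from pathlib import Path

# Identifiers named in this issue; any match after the cleanup is a leftover.
STALE_TOKENS = ("update_neuron_sdk.sh", "enable_update_neuron_sdk")


def find_stale_references(root, tokens=STALE_TOKENS):
    """Return (relative_path, line_number, token) for each line mentioning a token."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything under .git
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in tokens:
                if token in line:
                    hits.append((path.relative_to(root).as_posix(), lineno, token))
    return hits
```

Running find_stale_references(".") from the repository root should return an empty list once all four changes are in place.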
Category
Enhancement to existing test case
Additional Context
Reviewer requirement: Because this change touches SageMaker HyperPod lifecycle scripts, the fix PR will require SageMaker service team review. Contributors should assign the PR to hyperpod-lcs-dev for review.