[Feature]: Deprecate and remove update_neuron_sdk.sh lifecycle script #973

@KeitaW

Feature Description & Motivation

The update_neuron_sdk.sh lifecycle script pins the Neuron SDK to release 2.21.0, which is significantly outdated (the current release is 2.27.1 or later). The pinned package versions are:

  • aws-neuronx-dkms=2.19.64.0
  • aws-neuronx-oci-hook=2.6.36.0
  • aws-neuronx-runtime-lib=2.23.110.0
  • aws-neuronx-collectives=2.23.133.0
  • aws-neuronx-tools=2.20.204.0

This script is unnecessary because HyperPod Slurm AMIs already ship with the Neuron SDK preinstalled, and the SDK is updated automatically when users call the UpdateClusterSoftware API. See the HyperPod Slurm AMI release notes for the versions included in each AMI release.

Rather than continuously updating pinned versions in this script, it should be deprecated and removed to simplify the lifecycle scripts. Users who need a specific Neuron SDK version should rely on the preinstalled AMI version or the UpdateClusterSoftware API.
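For reference, a minimal sketch of what calling the UpdateClusterSoftware API could look like from Python. It assumes boto3's SageMaker client and its `update_cluster_software` operation; the helper name `request_cluster_software_update` and the cluster name in the usage comment are hypothetical and not part of this repository.

```python
def request_cluster_software_update(sagemaker_client, cluster_name):
    """Ask HyperPod to update the platform software on a cluster.

    `sagemaker_client` is assumed to be a boto3 SageMaker client, e.g.
    boto3.client("sagemaker"). It is passed in rather than created here
    so the call can be stubbed out in tests.
    """
    response = sagemaker_client.update_cluster_software(ClusterName=cluster_name)
    # The response is expected to carry the ARN of the cluster being updated.
    return response["ClusterArn"]


# Example usage (needs AWS credentials; the cluster name is a placeholder):
#
#   import boto3
#   arn = request_cluster_software_update(
#       boto3.client("sagemaker"), "my-hyperpod-cluster"
#   )
#   print(f"Software update requested for {arn}")
```

The update runs asynchronously on the service side, so a caller would likely poll the cluster status afterwards instead of assuming the new SDK is available as soon as the call returns.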

Related: #875

Files to Change

  1. Delete 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/utils/update_neuron_sdk.sh
  2. Remove the enable_update_neuron_sdk config flag in 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/config.py (line 20)
  3. Remove the conditional block that calls update_neuron_sdk.sh in 1.architectures/5.sagemaker-hyperpod/LifecycleScripts/base-config/lifecycle_script.py (lines 262-265)
  4. Remove the sed command that enables the flag in 1.architectures/5.sagemaker-hyperpod/automate-smhp-slurm/automate-cluster-creation.sh (line 355)
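Once the four changes above are made, a quick scan can help confirm that nothing still mentions the removed script or flag. The following is a rough sketch only: the helper `find_stale_references` is hypothetical, and it checks file contents for the two identifiers named in this issue (it does not check file names).

```python
from pathlib import Path

# Identifiers named in this issue; any remaining mention suggests the
# removal is incomplete.
STALE_NAMES = ("update_neuron_sdk.sh", "enable_update_neuron_sdk")


def find_stale_references(root, names=STALE_NAMES):
    """Return (path, line_number, line) for each line under `root` that
    still mentions one of `names`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the Git metadata folder.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if any(name in line for name in names):
                hits.append((str(path), number, line.strip()))
    return hits
```

For example, `find_stale_references("1.architectures/5.sagemaker-hyperpod")` run from the repository root should return an empty list after the change. Any hit in documentation or other scripts would need the same cleanup.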

Category

Enhancement to existing test case

Additional Context

Reviewer requirement: Because this change touches SageMaker HyperPod lifecycle scripts, the fix PR will require SageMaker service team review. Contributors should assign the PR to hyperpod-lcs-dev for review.
