Master albatross smd #5141

Closed
wants to merge 64 commits into from
60732d3
change: update image_uri_configs 01-27-2025 06:18:13 PST
sagemaker-bot Jan 27, 2025
eb3a774
fix: skip TF tests for unsupported versions (#5007)
benieric Jan 28, 2025
ebcd26f
change: update image_uri_configs 01-29-2025 06:18:08 PST
sagemaker-bot Jan 29, 2025
0772ecd
chore: add new images for HF TGI (#5005)
varunmoris Jan 29, 2025
ae03c31
feat: use jumpstart deployment config image as default optimization i…
gwang111 Jan 29, 2025
401fc81
prepare release v2.238.0
Jan 29, 2025
71f6d22
update development version to v2.238.1.dev0
Jan 29, 2025
1328e69
Fix ssh host policy (#4966)
sage-maker Jan 30, 2025
138a2e9
change: Allow telemetry only in supported regions (#5009)
rsareddy0329 Jan 31, 2025
caaf47e
mpirun protocol - distributed training with @remote decorator (#4998)
brunopistone Jan 31, 2025
4533790
feat: Add support for deepseek recipes (#5011)
benieric Jan 31, 2025
75f3295
prepare release v2.239.0
Feb 1, 2025
352b922
update development version to v2.239.1.dev0
Feb 1, 2025
8910496
change: update image_uri_configs 02-04-2025 06:18:00 PST
sagemaker-bot Feb 4, 2025
9cb2415
Create GitHub action to trigger canaries (#5008)
nileshvd Feb 4, 2025
87e25c0
Add docstring for image_uris.retrieve
Feb 5, 2025
7f5439f
fix: fix ValueError when updating a data quality monitoring schedule …
luke-gerschwitz Feb 7, 2025
80d3c02
Fixed pagination failing while listing collections (#5020)
keshav-chandak Feb 7, 2025
4073bce
Add cleanup logic to model builder integ tests for endpoints (#5022)
sage-maker Feb 10, 2025
a8e225e
fix: bug in get latest version was getting the max sorted alphabetica…
e-davidson Feb 10, 2025
c7b4b72
Fix documentation for local mode (#5026)
pintaoz-aws Feb 10, 2025
86dd6ae
Fix sourcedir.tar.gz filenames in docstrings (#5019)
pintaoz-aws Feb 10, 2025
2948ae3
Add type hint for ProcessingOutput (#5030)
pintaoz-aws Feb 11, 2025
31c5e0a
Fix FeatureGroup docstring (#5028)
pintaoz-aws Feb 11, 2025
fc0e7d0
Fix Tensorflow doc link (#5029)
pintaoz-aws Feb 11, 2025
b6e15be
Fix the workshop link for Step Functions (#5034)
pintaoz-aws Feb 13, 2025
6c5e222
Fix all type hint and docstrings for callable (#5035)
pintaoz-aws Feb 13, 2025
0f5054e
fix: keep sagemaker_session from being overridden to None (#5021)
Narrohag Feb 13, 2025
f7cd6d1
prepare release v2.239.1
Feb 14, 2025
b0e17cf
update development version to v2.239.2.dev0
Feb 14, 2025
35ddf9c
Move RecordSerializer and RecordDeserializer to sagemaker.serializers…
pintaoz-aws Feb 17, 2025
b485f94
Add framework_version to all TensorFlowModel examples (#5038)
pintaoz-aws Feb 17, 2025
4da15cd
Fix hyperparameter strategy docs (#5045)
sage-maker Feb 18, 2025
01c72c7
fix: pass in inference_ami_version to model_based endpoint type (#5043)
timkuo-amazon Feb 18, 2025
35acb3a
Add warning about not supporting torch.nn.SyncBatchNorm (#5046)
pintaoz-aws Feb 18, 2025
45263fe
prepare release v2.239.2
Feb 18, 2025
b6ddeee
update development version to v2.239.3.dev0
Feb 18, 2025
642a2ca
change: update image_uri_configs 02-19-2025 06:18:15 PST
sagemaker-bot Feb 19, 2025
dc8d350
change: added ap-southeast-7 and mx-central-1 for Jumpstart (#5049)
IshaChid76 Feb 19, 2025
0757e9d
prepare release v2.239.3
Feb 19, 2025
a9583d5
update development version to v2.239.4.dev0
Feb 19, 2025
dfad50d
change: update image_uri_configs 02-20-2025 06:18:08 PST
sagemaker-bot Feb 20, 2025
122ea28
feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PyS…
malav-shastri Feb 20, 2025
7a8635b
Add backward compatbility for RecordSerializer and RecordDeserializer…
pintaoz-aws Feb 21, 2025
dc8c305
py_version doc fixes (#5048)
sage-maker Feb 23, 2025
01dff74
change: update image_uri_configs 02-21-2025 06:18:10 PST
sagemaker-bot Feb 21, 2025
945db32
fix: altconfig hubcontent and reenable integ test (#5051)
bencrabtree Feb 24, 2025
f3dab1e
fix: forbid extras in Configs (#5042)
benieric Feb 24, 2025
7829a7e
Remove main function entrypoint in ModelBuilder dependency manager. (…
cj-zhang Feb 25, 2025
eec6e15
documentation: Removed a line about python version requirements of tr…
rsareddy0329 Feb 25, 2025
6f09793
prepare release v2.240.0
Feb 25, 2025
a125d17
update development version to v2.240.1.dev0
Feb 25, 2025
f5330fc
Fix key error in _send_metrics() (#5068)
pintaoz-aws Feb 28, 2025
9c5b657
fix: Added check for the presence of model package group before creat…
keshav-chandak Feb 28, 2025
85c66d7
Use sagemaker session's s3_resource in download_folder (#5064)
pintaoz-aws Mar 3, 2025
28bd3c4
Fix error when there is no session to call _create_model_request() (#…
pintaoz-aws Mar 5, 2025
4a02f6c
Ensure Model.is_repack() returns a boolean (#5060)
pintaoz-aws Mar 5, 2025
b9ff4ad
feat: Allow ModelTrainer to accept hyperparameters file (#5059)
benieric Mar 5, 2025
0c73ce0
feature: support training for JumpStart model references as part of C…
Narrohag Mar 5, 2025
24d8ef1
feat: Make DistributedConfig Extensible (#5039)
benieric Mar 5, 2025
35484d2
Skip tests with deprecated instance type (#5077)
pintaoz-aws Mar 6, 2025
db42dc5
feature:support custom workflow deployment in ModelBuilder using SMD …
cj-zhang Mar 12, 2025
60dd883
Cache client as instance attribute in property@ decorator. (#1668)
cj-zhang Apr 3, 2025
4c3ac4c
Bugfixes from e2e testing. (#1670)
cj-zhang Apr 16, 2025
24 changes: 24 additions & 0 deletions .github/workflows/codebuild-canaries.yml
@@ -0,0 +1,24 @@
name: Canaries
on:
  schedule:
    - cron: "0 */3 * * *"
  workflow_dispatch:

permissions:
  id-token: write # This is required for requesting the JWT

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.CI_AWS_ROLE_ARN }}
          aws-region: us-west-2
          role-duration-seconds: 10800
      - name: Run Integ Tests
        uses: aws-actions/aws-codebuild-run-build@v1
        id: codebuild
        with:
          project-name: sagemaker-python-sdk-canaries
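The schedule `"0 */3 * * *"` is a standard five-field cron expression (minute, hour, day of month, month, day of week), and GitHub Actions evaluates scheduled workflows in UTC. As a rough illustration, the hypothetical helper below (not part of this repository or of GitHub Actions, and covering only the common field forms) expands the minute and hour fields; it also shows that `role-duration-seconds: 10800` corresponds to the same 3-hour interval.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field ('*', '*/N', 'A-B', 'A-B/N', 'A,B,C') into sorted values.

    Hypothetical helper for illustration only; it does not handle names like 'MON'.
    """
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


minute, hour, *_ = "0 */3 * * *".split()
print(expand_cron_field(minute, 0, 59))  # [0]
print(expand_cron_field(hour, 0, 23))    # [0, 3, 6, 9, 12, 15, 18, 21]
print(10800 / 3600)                      # 3.0 (hours of assumed-role credentials)
```

In other words, the canaries start at the top of every third hour, eight times a day.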
93 changes: 93 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,98 @@
# Changelog

## v2.240.0 (2025-02-25)

### Features

* Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK

### Bug Fixes and Other Changes

* Remove main function entrypoint in ModelBuilder dependency manager.
* forbid extras in Configs
* altconfig hubcontent and reenable integ test
* Merge branch 'master-rba' into local_merge
* py_version doc fixes
* Add backward compatibility for RecordSerializer and RecordDeserializer
* update image_uri_configs 02-21-2025 06:18:10 PST
* update image_uri_configs 02-20-2025 06:18:08 PST

### Documentation Changes

* Removed a line about python version requirements of training script which can misguide users.

## v2.239.3 (2025-02-19)

### Bug Fixes and Other Changes

* added ap-southeast-7 and mx-central-1 for Jumpstart
* update image_uri_configs 02-19-2025 06:18:15 PST

## v2.239.2 (2025-02-18)

### Bug Fixes and Other Changes

* Add warning about not supporting torch.nn.SyncBatchNorm
* pass in inference_ami_version to model_based endpoint type
* Fix hyperparameter strategy docs
* Add framework_version to all TensorFlowModel examples
* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers

## v2.239.1 (2025-02-14)

### Bug Fixes and Other Changes

* keep sagemaker_session from being overridden to None
* Fix all type hint and docstrings for callable
* Fix the workshop link for Step Functions
* Fix Tensorflow doc link
* Fix FeatureGroup docstring
* Add type hint for ProcessingOutput
* Fix sourcedir.tar.gz filenames in docstrings
* Fix documentation for local mode
* bug in get latest version was getting the max sorted alphabetically
* Add cleanup logic to model builder integ tests for endpoints
* Fixed pagination failing while listing collections
* fix ValueError when updating a data quality monitoring schedule
* Add docstring for image_uris.retrieve
* Create GitHub action to trigger canaries
* update image_uri_configs 02-04-2025 06:18:00 PST
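The v2.239.1 entry "bug in get latest version was getting the max sorted alphabetically" refers to a classic pitfall: comparing dotted version strings as text. The sketch below is a generic illustration with made-up version strings and a hypothetical `version_key` helper; it does not reproduce the SDK's actual fix.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.10.0' into a tuple of ints for numeric comparison.

    Hypothetical helper: assumes purely numeric components (no 'rc', 'dev', etc.).
    """
    return tuple(int(part) for part in version.split("."))


versions = ["1.9.0", "1.10.0", "1.2.3"]

# String comparison works character by character: '9' > '1', so "1.9.0" wins.
print(max(versions))                   # 1.9.0  (wrong "latest")

# Tuple comparison works numerically: (1, 10, 0) > (1, 9, 0).
print(max(versions, key=version_key))  # 1.10.0
```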

## v2.239.0 (2025-02-01)

### Features

* Add support for deepseek recipes

### Bug Fixes and Other Changes

* mpirun protocol - distributed training with @remote decorator
* Allow telemetry only in supported regions
* Fix ssh host policy

## v2.238.0 (2025-01-29)

### Features

* use jumpstart deployment config image as default optimization image

### Bug Fixes and Other Changes

* chore: add new images for HF TGI
* update image_uri_configs 01-29-2025 06:18:08 PST
* skip TF tests for unsupported versions
* Merge branch 'master-rba' into local_merge
* Add missing attributes to local resourceconfig
* update image_uri_configs 01-27-2025 06:18:13 PST
* update image_uri_configs 01-24-2025 06:18:11 PST
* add missing schema definition in docs
* Omegaconf upgrade
* SageMaker @remote function: Added multi-node functionality
* remove option
* fix typo
* fix tests
* Add an option for user to remove inputs and container artifacts when using local model trainer

## v2.237.3 (2025-01-09)

### Bug Fixes and Other Changes
8 changes: 6 additions & 2 deletions CONTRIBUTING.md
@@ -61,6 +61,10 @@ Before sending us a pull request, please ensure that:
1. Follow the instructions at [Modifying an EBS Volume Using Elastic Volumes (Console)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/requesting-ebs-volume-modifications.html#modify-ebs-volume) to increase the EBS volume size associated with the newly created EC2 instance.
1. Wait 5-10min for the new EBS volume increase to finalize.
1. Allow EC2 to claim the additional space by stopping and then starting your EC2 host.
2. Set up a venv to manage dependencies:
1. `python -m venv ~/.venv/myproject-env` to create the venv
2. `source ~/.venv/myproject-env/bin/activate` to activate the venv
3. `deactivate` to exit the venv
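If you prefer to script this step, Python's standard-library `venv` module can create the same kind of environment programmatically. This is a sketch: `create_project_venv` is a hypothetical helper name, and activation still happens in your shell with `source <env>/bin/activate`.

```python
import sys
import venv
from pathlib import Path


def create_project_venv(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return the path of its interpreter.

    Mirrors `python -m venv`, which uses symlinks on non-Windows platforms.
    """
    builder = venv.EnvBuilder(
        with_pip=with_pip,
        symlinks=sys.platform != "win32",
    )
    builder.create(str(env_dir))
    # POSIX environments keep executables in bin/, Windows ones in Scripts/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

For example, `create_project_venv(Path.home() / ".venv" / "myproject-env")` corresponds to the first command in the list above.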


### Pull Down the Code
@@ -74,8 +78,8 @@ Before sending us a pull request, please ensure that:
### Run the Unit Tests

1. Install tox using `pip install tox`
1. Install coverage using `pip install .[test]`
1. cd into the sagemaker-python-sdk folder: `cd sagemaker-python-sdk` or `cd /environment/sagemaker-python-sdk`
1. cd into the github project sagemaker-python-sdk folder: `cd sagemaker-python-sdk` or `cd /environment/sagemaker-python-sdk`
1. Install coverage using `pip install '.[test]'`
1. Run the following tox command and verify that all code checks and unit tests pass: `tox tests/unit`
1. You can also run a single test with the following command: `tox -e py310 -- -s -vv <path_to_file><file_name>::<test_function_name>`
1. You can run coverage via the runcoverage env: `tox -e runcoverage -- tests/unit` or `tox -e py310 -- tests/unit --cov=sagemaker --cov-append --cov-report xml`
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
2.237.4.dev0
2.240.1.dev0
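The commit log follows a consistent pattern: every "prepare release vX.Y.Z" commit is followed by "update development version to vX.Y.(Z+1).dev0", which is why VERSION moves to `2.240.1.dev0` right after the v2.240.0 release. A minimal sketch of that bump rule, inferred from the commits above rather than taken from the project's release tooling:

```python
def next_dev_version(release: str) -> str:
    """Return the development version that follows a MAJOR.MINOR.PATCH release.

    Assumes a plain numeric release string; this mirrors the pattern in the
    commit history, not the SDK's actual release scripts.
    """
    major, minor, patch = (int(p) for p in release.split("."))
    return f"{major}.{minor}.{patch + 1}.dev0"


print(next_dev_version("2.240.0"))  # 2.240.1.dev0
print(next_dev_version("2.239.3"))  # 2.239.4.dev0
```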
5 changes: 3 additions & 2 deletions doc/frameworks/pytorch/using_pytorch.rst
@@ -28,8 +28,6 @@ To train a PyTorch model by using the SageMaker Python SDK:
Prepare a PyTorch Training Script
=================================

Your PyTorch training script must be a Python 3.6 compatible source file.

Prepare your script in a separate source file from the notebook, terminal session, or source file you're
using to submit the script to SageMaker via a ``PyTorch`` Estimator. This will be discussed in further detail below.

@@ -375,6 +373,9 @@ To initialize distributed training in your script, call
`torch.distributed.init_process_group
<https://pytorch.org/docs/master/distributed.html#torch.distributed.init_process_group>`_
with the desired backend and the rank of the current host.
Warning: Some torch features, such as (and likely not limited to) ``torch.nn.SyncBatchNorm``,
are not supported, and using them with a process group created by ``init_process_group`` will
cause an exception during distributed training.

.. code:: python

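On SageMaker, the training script can find the cluster's host names in the `SM_HOSTS` environment variable (a JSON list) and its own host name in `SM_CURRENT_HOST`, so a host's rank can be taken from its position in that list. The sketch below assumes one training process per host; `host_rank_and_world_size` is a hypothetical helper, and the commented-out lines show where the values would feed `torch.distributed.init_process_group`.

```python
import json
import os
from typing import Mapping, Optional


def host_rank_and_world_size(env: Optional[Mapping[str, str]] = None) -> tuple[int, int]:
    """Derive (rank, world_size) for one-process-per-host training.

    Assumes SM_HOSTS holds a JSON list of host names and SM_CURRENT_HOST is one of them.
    """
    env = os.environ if env is None else env
    hosts = json.loads(env["SM_HOSTS"])
    current_host = env["SM_CURRENT_HOST"]
    return hosts.index(current_host), len(hosts)


# Example values as they might appear on the second of two training instances.
example_env = {"SM_HOSTS": '["algo-1", "algo-2"]', "SM_CURRENT_HOST": "algo-2"}
rank, world_size = host_rank_and_world_size(example_env)
print(rank, world_size)  # 1 2

# Inside a real training job (requires torch, and assumes MASTER_ADDR/MASTER_PORT
# are set for the default env:// initialization):
#   import torch.distributed as dist
#   dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
```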
4 changes: 2 additions & 2 deletions doc/frameworks/tensorflow/deploying_tensorflow_serving.rst
@@ -64,7 +64,7 @@ If you already have existing model artifacts in S3, you can skip training and de

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')

predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')

@@ -74,7 +74,7 @@ Python-based TensorFlow serving on SageMaker has support for `Elastic Inference

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')

predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', accelerator_type='ml.eia1.medium')

15 changes: 9 additions & 6 deletions doc/frameworks/tensorflow/using_tf.rst
@@ -246,7 +246,7 @@ Training with parameter servers

If you specify parameter_server as the value of the distribution parameter, the container launches a parameter server
thread on each instance in the training cluster, and then executes your training code. You can find more information on
TensorFlow distributed training at `TensorFlow docs <https://www.tensorflow.org/deploy/distributed>`__.
TensorFlow distributed training at `TensorFlow docs <https://www.tensorflow.org/guide/distributed_training>`__.
To enable parameter server training:

.. code:: python
@@ -468,7 +468,7 @@ If you already have existing model artifacts in S3, you can skip training and de

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')

predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')

@@ -478,7 +478,7 @@ Python-based TensorFlow serving on SageMaker has support for `Elastic Inference

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')

predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', accelerator_type='ml.eia1.medium')

@@ -767,7 +767,8 @@ This customized Python code must be named ``inference.py`` and is specified thro

model = TensorFlowModel(entry_point='inference.py',
model_data='s3://mybucket/model.tar.gz',
role='MySageMakerRole')
role='MySageMakerRole',
framework_version='x.x.x')

In the example above, ``inference.py`` is assumed to be a file inside ``model.tar.gz``. If you want to use a local file instead, you must add the ``source_dir`` argument. See the documentation on `TensorFlowModel <https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#sagemaker.tensorflow.model.TensorFlowModel>`_.

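For reference, the TensorFlow Serving container's pre- and post-processing hooks are plain functions named `input_handler(data, context)` and `output_handler(data, context)` defined in `inference.py`. The following is a minimal sketch of such a file, assuming JSON or single-row CSV requests; the `SimpleNamespace` objects at the bottom are stand-ins for the container's real request and response objects, used only so the handlers can be exercised locally.

```python
import io
import json
from types import SimpleNamespace


def input_handler(data, context):
    """Pre-process a request before it is sent to the TensorFlow Serving REST API."""
    body = data.read().decode("utf-8")
    if context.request_content_type == "application/json":
        # Pass JSON through unchanged (assumes it is already in TFS request format).
        return body
    if context.request_content_type == "text/csv":
        # Very simple CSV handling: one row of floats becomes one instance.
        return json.dumps({"instances": [[float(x) for x in body.split(",")]]})
    raise ValueError(f"unsupported content type {context.request_content_type or 'unknown'}")


def output_handler(data, context):
    """Post-process the TensorFlow Serving response before returning it to the client."""
    if data.status_code != 200:
        raise ValueError(data.content.decode("utf-8"))
    # Return the raw prediction bytes and the content type the client asked for.
    return data.content, context.accept_header


# Local stand-ins for the container's objects (illustration only).
ctx = SimpleNamespace(request_content_type="text/csv", accept_header="application/json")
print(input_handler(io.BytesIO(b"1.0,2.5,3"), ctx))  # {"instances": [[1.0, 2.5, 3.0]]}
fake_response = SimpleNamespace(status_code=200, content=b'{"predictions": [[0.9]]}')
print(output_handler(fake_response, ctx))
```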
@@ -923,7 +924,8 @@ processing. There are 2 ways to do this:
model = TensorFlowModel(entry_point='inference.py',
dependencies=['requirements.txt'],
model_data='s3://mybucket/model.tar.gz',
role='MySageMakerRole')
role='MySageMakerRole',
framework_version='x.x.x')


2. If you are working in a network-isolation situation or if you don't
@@ -941,7 +943,8 @@ processing. There are 2 ways to do this:
model = TensorFlowModel(entry_point='inference.py',
dependencies=['/path/to/folder/named/lib'],
model_data='s3://mybucket/model.tar.gz',
role='MySageMakerRole')
role='MySageMakerRole',
framework_version='x.x.x')

For more information, see: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing

4 changes: 2 additions & 2 deletions doc/overview.rst
@@ -1958,15 +1958,15 @@ Make sure to have a Compose Version compatible with your Docker Engine installat
Local mode configuration
========================

The local mode uses a YAML configuration file located at ``~/.sagemaker/config.yaml`` to define the default values that are automatically passed to the ``config`` attribute of ``LocalSession``. This is an example of the configuration, for the full schema, see `sagemaker.config.config_schema.SAGEMAKER_PYTHON_SDK_LOCAL_MODE_CONFIG_SCHEMA <https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/config/config_schema.py>`_.
The local mode uses a YAML configuration file located at ``${user_config_directory}/sagemaker/config.yaml`` to define the default values that are automatically passed to the ``config`` attribute of ``LocalSession``. The following is an example configuration; for the full schema, see `sagemaker.config.config_schema.SAGEMAKER_PYTHON_SDK_LOCAL_MODE_CONFIG_SCHEMA <https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/config/config_schema.py>`_.

.. code:: yaml

local:
local_code: true # Using everything locally
region_name: "us-west-2" # Name of the region
container_config: # Additional docker container config
shm_size: "128M
shm_size: "128M"

If you want to keep everything local, and not use Amazon S3 either, you can enable "local code" in one of two ways:

4 changes: 2 additions & 2 deletions doc/v2.rst
@@ -324,9 +324,9 @@ The follow serializer/deserializer classes have been renamed and/or moved:
+--------------------------------------------------------+-------------------------------------------------------+
| ``sagemaker.predictor._NPYSerializer`` | ``sagemaker.serializers.NumpySerializer`` |
+--------------------------------------------------------+-------------------------------------------------------+
| ``sagemaker.amazon.common.numpy_to_record_serializer`` | ``sagemaker.amazon.common.RecordSerializer`` |
| ``sagemaker.amazon.common.numpy_to_record_serializer`` | ``sagemaker.serializers.RecordSerializer`` |
+--------------------------------------------------------+-------------------------------------------------------+
| ``sagemaker.amazon.common.record_deserializer`` | ``sagemaker.amazon.common.RecordDeserializer`` |
| ``sagemaker.amazon.common.record_deserializer`` | ``sagemaker.deserializers.RecordDeserializer`` |
+--------------------------------------------------------+-------------------------------------------------------+
| ``sagemaker.predictor._JsonDeserializer`` | ``sagemaker.deserializers.JSONDeserializer`` |
+--------------------------------------------------------+-------------------------------------------------------+
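When migrating older code, the renames in this table can be applied mechanically. The sketch below is a hypothetical helper (not the SDK's own upgrade tooling) that rewrites fully qualified names in source text, using only rows visible in the table above; extend `V2_RENAMES` with other rows as needed.

```python
import re

# Old name -> new name, copied from rows of the table above.
V2_RENAMES = {
    "sagemaker.predictor._NPYSerializer": "sagemaker.serializers.NumpySerializer",
    "sagemaker.amazon.common.numpy_to_record_serializer": "sagemaker.serializers.RecordSerializer",
    "sagemaker.amazon.common.record_deserializer": "sagemaker.deserializers.RecordDeserializer",
    "sagemaker.predictor._JsonDeserializer": "sagemaker.deserializers.JSONDeserializer",
}


def apply_renames(source: str, renames: dict[str, str] = V2_RENAMES) -> str:
    """Replace whole fully qualified old names with their new equivalents.

    The lookarounds ensure a longer identifier that merely starts or ends with
    an old name is left untouched.
    """
    for old, new in renames.items():
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?!\w)"
        source = re.sub(pattern, new, source)
    return source


print(apply_renames("s = sagemaker.amazon.common.numpy_to_record_serializer()"))
# s = sagemaker.serializers.RecordSerializer()
```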
2 changes: 1 addition & 1 deletion doc/workflows/step_functions/index.rst
@@ -11,5 +11,5 @@ without having to provision and integrate the AWS services separately.
The AWS Step Functions Python SDK uses the SageMaker Python SDK as a dependency.
To get started with step functions, try the workshop or visit the SDK's website:

* `Workshop on using AWS Step Functions with SageMaker <https://www.sagemakerworkshop.com/step/>`__
* `Create and manage Amazon SageMaker AI jobs with Step Functions <https://docs.aws.amazon.com/step-functions/latest/dg/connect-sagemaker.html>`__
* `AWS Step Functions Python SDK website <https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/>`__