
Commit 99cd278

Merge remote-tracking branch 'origin/master'
2 parents b793419 + 65482fa commit 99cd278


179 files changed (5791 additions, 1345 deletions)

Some content is hidden: large commits have some content hidden by default.

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+name: Canaries
+on:
+  schedule:
+    - cron: "0 */3 * * *"
+  workflow_dispatch:
+
+permissions:
+  id-token: write # This is required for requesting the JWT
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Configure AWS Credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          role-to-assume: ${{ secrets.CI_AWS_ROLE_ARN }}
+          aws-region: us-west-2
+          role-duration-seconds: 10800
+      - name: Run Integ Tests
+        uses: aws-actions/aws-codebuild-run-build@v1
+        id: codebuild
+        with:
+          project-name: sagemaker-python-sdk-canaries
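The schedule above uses the cron expression `0 */3 * * *`: minute 0 of every third hour, evaluated in UTC as GitHub Actions does for scheduled workflows. Separately, `role-duration-seconds: 10800` is 3 hours. The following minimal stdlib sketch is not part of the workflow. It expands one cron field into the values it matches and assumes only the `*`, `*/N`, single-number and comma-list forms. The helper name `expand_cron_field` is hypothetical.

```python
from typing import List, Set


def expand_cron_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field into the sorted values it matches.

    Minimal sketch: handles '*', '*/N', 'N' and comma-separated lists of
    those forms only. Ranges such as '1-5' are not supported.
    """
    values: Set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif part.startswith("*/"):
            step = int(part[2:])
            if step <= 0:
                raise ValueError(f"invalid step in {part!r}")
            values.update(range(low, high + 1, step))
        else:
            value = int(part)  # raises ValueError for unsupported syntax
            if not low <= value <= high:
                raise ValueError(f"{value} is outside [{low}, {high}]")
            values.add(value)
    return sorted(values)


# Fields of the workflow's schedule: minute, hour, day-of-month, month, day-of-week.
minute_field, hour_field, *_ = "0 */3 * * *".split()
hours = expand_cron_field(hour_field, 0, 23)
runs_per_day = len(expand_cron_field(minute_field, 0, 59)) * len(hours)
role_duration_hours = 10800 / 3600  # role-duration-seconds from the workflow

print(hours)                # [0, 3, 6, 9, 12, 15, 18, 21]
print(runs_per_day)         # 8
print(role_duration_hours)  # 3.0
```

With these values the canaries run 8 times a day. The assumed-role session can last up to the full 3-hour gap between runs.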

CHANGELOG.md

Lines changed: 126 additions & 0 deletions
@@ -1,5 +1,131 @@
 # Changelog
 
+## v2.242.0 (2025-03-14)
+
+### Features
+
+* add integ tests for training JumpStart models in private hub
+
+### Bug Fixes and Other Changes
+
+* Torch upgrade
+* Prevent RunContext overlap between test_run tests
+* remove s3 output location requirement from hub class init
+* Fixing Pytorch training python version in tests
+* update image_uri_configs 03-11-2025 07:18:09 PST
+* resolve infinite loop in _find_config on Windows systems
+* pipeline definition function doc update
+
+## v2.241.0 (2025-03-06)
+
+### Features
+
+* Make DistributedConfig Extensible
+* support training for JumpStart model references as part of Curated Hub Phase 2
+* Allow ModelTrainer to accept hyperparameters file
+
+### Bug Fixes and Other Changes
+
+* Skip tests with deprecated instance type
+* Ensure Model.is_repack() returns a boolean
+* Fix error when there is no session to call _create_model_request()
+* Use sagemaker session's s3_resource in download_folder
+* Added check for the presence of model package group before creating one
+* Fix key error in _send_metrics()
+
+## v2.240.0 (2025-02-25)
+
+### Features
+
+* Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK
+
+### Bug Fixes and Other Changes
+
+* Remove main function entrypoint in ModelBuilder dependency manager.
+* forbid extras in Configs
+* altconfig hubcontent and reenable integ test
+* Merge branch 'master-rba' into local_merge
+* py_version doc fixes
+* Add backward compatibility for RecordSerializer and RecordDeserializer
+* update image_uri_configs 02-21-2025 06:18:10 PST
+* update image_uri_configs 02-20-2025 06:18:08 PST
+
+### Documentation Changes
+
+* Removed a line about python version requirements of training script which can misguide users.
+
+## v2.239.3 (2025-02-19)
+
+### Bug Fixes and Other Changes
+
+* added ap-southeast-7 and mx-central-1 for Jumpstart
+* update image_uri_configs 02-19-2025 06:18:15 PST
+
+## v2.239.2 (2025-02-18)
+
+### Bug Fixes and Other Changes
+
+* Add warning about not supporting torch.nn.SyncBatchNorm
+* pass in inference_ami_version to model_based endpoint type
+* Fix hyperparameter strategy docs
+* Add framework_version to all TensorFlowModel examples
+* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers
+
+## v2.239.1 (2025-02-14)
+
+### Bug Fixes and Other Changes
+
+* keep sagemaker_session from being overridden to None
+* Fix all type hint and docstrings for callable
+* Fix the workshop link for Step Functions
+* Fix Tensorflow doc link
+* Fix FeatureGroup docstring
+* Add type hint for ProcessingOutput
+* Fix sourcedir.tar.gz filenames in docstrings
+* Fix documentation for local mode
+* bug in get latest version was getting the max sorted alphabetically
+* Add cleanup logic to model builder integ tests for endpoints
+* Fixed pagination failing while listing collections
+* fix ValueError when updating a data quality monitoring schedule
+* Add docstring for image_uris.retrieve
+* Create GitHub action to trigger canaries
+* update image_uri_configs 02-04-2025 06:18:00 PST
+
+## v2.239.0 (2025-02-01)
+
+### Features
+
+* Add support for deepseek recipes
+
+### Bug Fixes and Other Changes
+
+* mpirun protocol - distributed training with @remote decorator
+* Allow telemetry only in supported regions
+* Fix ssh host policy
+
+## v2.238.0 (2025-01-29)
+
+### Features
+
+* use jumpstart deployment config image as default optimization image
+
+### Bug Fixes and Other Changes
+
+* chore: add new images for HF TGI
+* update image_uri_configs 01-29-2025 06:18:08 PST
+* skip TF tests for unsupported versions
+* Merge branch 'master-rba' into local_merge
+* Add missing attributes to local resourceconfig
+* update image_uri_configs 01-27-2025 06:18:13 PST
+* update image_uri_configs 01-24-2025 06:18:11 PST
+* add missing schema definition in docs
+* Omegaconf upgrade
+* SageMaker @remote function: Added multi-node functionality
+* remove option
+* fix typo
+* fix tests
+* Add an option for user to remove inputs and container artifacts when using local model trainer
+
 ## v2.237.3 (2025-01-09)
 
 ### Bug Fixes and Other Changes
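The v2.239.1 entry "bug in get latest version was getting the max sorted alphabetically" describes a common pitfall: compared as strings, "2.9.0" sorts above "2.10.0". The following minimal sketch shows the usual remedy, comparing integer tuples. It is not the SDK's actual patch. It assumes plain dotted numeric versions and rejects suffixes such as `.dev0`, which the `packaging` library's `Version` class handles properly. The helper names are hypothetical.

```python
from typing import List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically.

    Assumes purely numeric dot-separated parts; '2.242.1.dev0' raises ValueError.
    """
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: List[str]) -> str:
    """Return the highest version using numeric, not alphabetical, order."""
    return max(versions, key=version_key)


versions = ["2.9.0", "2.10.0", "2.239.3", "2.242.0", "2.24.1"]
print(max(versions))             # 2.9.0   (alphabetical: wrong)
print(latest_version(versions))  # 2.242.0 (numeric: correct)
```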

CONTRIBUTING.md

Lines changed: 6 additions & 2 deletions
@@ -61,6 +61,10 @@ Before sending us a pull request, please ensure that:
     1. Follow the instructions at [Modifying an EBS Volume Using Elastic Volumes (Console)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/requesting-ebs-volume-modifications.html#modify-ebs-volume) to increase the EBS volume size associated with the newly created EC2 instance.
     1. Wait 5-10min for the new EBS volume increase to finalize.
     1. Allow EC2 to claim the additional space by stopping and then starting your EC2 host.
+2. Set up a venv to manage dependencies:
+    1. `python -m venv ~/.venv/myproject-env` to create the venv
+    2. `source ~/.venv/myproject-env/bin/activate` to activate the venv
+    3. `deactivate` to exit the venv
 
 
 ### Pull Down the Code
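The venv steps above can also be scripted with Python's standard-library `venv` module. The following minimal sketch uses a hypothetical helper, `create_project_venv`. Activation is a shell feature, so the sketch returns the environment's interpreter path instead; you can invoke that interpreter directly without activating.

```python
import sys
import venv
from pathlib import Path


def create_project_venv(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return its Python interpreter path.

    Hypothetical helper; roughly equivalent to `python -m venv <env_dir>`.
    """
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts executables in Scripts\, POSIX systems in bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe
```

For example, `create_project_venv(Path.home() / ".venv" / "myproject-env")` creates the environment from the step above and returns `~/.venv/myproject-env/bin/python` on Linux and macOS. Running that interpreter with `-m pip install tox` installs into the venv without `source`.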
@@ -74,8 +78,8 @@ Before sending us a pull request, please ensure that:
 ### Run the Unit Tests
 
 1. Install tox using `pip install tox`
-1. Install coverage using `pip install .[test]`
-1. cd into the sagemaker-python-sdk folder: `cd sagemaker-python-sdk` or `cd /environment/sagemaker-python-sdk`
+1. cd into the sagemaker-python-sdk folder of the GitHub project: `cd sagemaker-python-sdk` or `cd /environment/sagemaker-python-sdk`
+1. Install coverage using `pip install '.[test]'`
 1. Run the following tox command and verify that all code checks and unit tests pass: `tox tests/unit`
 1. You can also run a single test with the following command: `tox -e py310 -- -s -vv <path_to_file><file_name>::<test_function_name>`
 1. You can run coverage via the runcoverage env: `tox -e runcoverage -- tests/unit` or `tox -e py310 -- tests/unit --cov=sagemaker --cov-append --cov-report xml`
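The revised install step quotes `'.[test]'` because `[test]` is glob syntax. Unquoted, `.[test]` is a pattern that matches `.` followed by one of `t`, `e` or `s`. zsh aborts with "no matches found" when such a pattern matches nothing. bash, by default, passes it through unchanged, which is why the unquoted form often works there. The following sketch uses the standard-library `fnmatch` and `shlex` modules as stand-ins for glob matching and shell word splitting. It illustrates the idea rather than exactly modeling any particular shell.

```python
import fnmatch
import shlex

pattern = ".[test]"

# Unquoted, the shell sees a glob: "[test]" is a character class, so the
# pattern matches "." followed by exactly one of t, e, or s.
print(fnmatch.fnmatchcase(".t", pattern))       # True
print(fnmatch.fnmatchcase(".[test]", pattern))  # False: the literal text is not a match

# Quoted, the brackets survive word splitting and reach pip unchanged.
print(shlex.split("pip install '.[test]'"))     # ['pip', 'install', '.[test]']
```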

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.237.4.dev0
+2.242.1.dev0

doc/frameworks/pytorch/using_pytorch.rst

Lines changed: 3 additions & 2 deletions
@@ -28,8 +28,6 @@ To train a PyTorch model by using the SageMaker Python SDK:
 Prepare a PyTorch Training Script
 =================================
 
-Your PyTorch training script must be a Python 3.6 compatible source file.
-
 Prepare your script in a separate source file from the notebook, terminal session, or source file you're
 using to submit the script to SageMaker via a ``PyTorch`` Estimator. This will be discussed in further detail below.
 
@@ -375,6 +373,9 @@ To initialize distributed training in your script, call
 `torch.distributed.init_process_group
 <https://pytorch.org/docs/master/distributed.html#torch.distributed.init_process_group>`_
 with the desired backend and the rank of the current host.
+Warning: Some torch features, such as (and likely not limited to) ``torch.nn.SyncBatchNorm``,
+are not supported, and using them with ``init_process_group`` will cause an exception during
+distributed training.
 
 .. code:: python
 
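Given the warning above, you may want to check a model for `torch.nn.SyncBatchNorm` layers before launching a distributed job. The following minimal sketch assumes only that the model exposes PyTorch's `Module.named_modules()` iterator. It compares class names so that it runs without torch installed. When torch is available, `isinstance(module, torch.nn.SyncBatchNorm)` is the more robust test. The helper names are hypothetical.

```python
from typing import Any, List


def find_sync_batchnorm(model: Any) -> List[str]:
    """Return the qualified names of SyncBatchNorm submodules in `model`.

    Hypothetical helper. Relies on `named_modules()`, which on torch.nn.Module
    yields (name, module) pairs for the model itself ("") and every submodule.
    Class names are compared so the sketch runs without torch installed.
    """
    offending = []
    for name, module in model.named_modules():
        if type(module).__name__ == "SyncBatchNorm":
            offending.append(name or "<root>")
    return offending


def assert_no_sync_batchnorm(model: Any) -> None:
    """Raise before distributed training starts if SyncBatchNorm is present."""
    offending = find_sync_batchnorm(model)
    if offending:
        raise RuntimeError(
            "torch.nn.SyncBatchNorm is not supported in this setup; found at: "
            + ", ".join(offending)
        )
```

With torch installed, calling `assert_no_sync_batchnorm(model)` on your `torch.nn.Module` before `init_process_group` surfaces the problem early. Note that `torch.nn.SyncBatchNorm.convert_sync_batchnorm` converts in the other direction (BatchNorm to SyncBatchNorm), so it does not remove these layers.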
doc/frameworks/tensorflow/deploying_tensorflow_serving.rst

Lines changed: 2 additions & 2 deletions
@@ -64,7 +64,7 @@ If you already have existing model artifacts in S3, you can skip training and de
 
     from sagemaker.tensorflow import TensorFlowModel
 
-    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
+    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')
 
     predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
 
@@ -74,7 +74,7 @@ Python-based TensorFlow serving on SageMaker has support for `Elastic Inference
 
     from sagemaker.tensorflow import TensorFlowModel
 
-    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
+    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')
 
     predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', accelerator_type='ml.eia1.medium')
 

doc/frameworks/tensorflow/using_tf.rst

Lines changed: 9 additions & 6 deletions
@@ -246,7 +246,7 @@ Training with parameter servers
 
 If you specify parameter_server as the value of the distribution parameter, the container launches a parameter server
 thread on each instance in the training cluster, and then executes your training code. You can find more information on
-TensorFlow distributed training at `TensorFlow docs <https://www.tensorflow.org/deploy/distributed>`__.
+TensorFlow distributed training at `TensorFlow docs <https://www.tensorflow.org/guide/distributed_training>`__.
 To enable parameter server training:
 
 .. code:: python
@@ -468,7 +468,7 @@ If you already have existing model artifacts in S3, you can skip training and de
 
     from sagemaker.tensorflow import TensorFlowModel
 
-    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
+    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')
 
     predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
 
@@ -478,7 +478,7 @@ Python-based TensorFlow serving on SageMaker has support for `Elastic Inference
 
     from sagemaker.tensorflow import TensorFlowModel
 
-    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
+    model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='x.x.x')
 
     predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', accelerator_type='ml.eia1.medium')
 
@@ -767,7 +767,8 @@ This customized Python code must be named ``inference.py`` and is specified thro
 
     model = TensorFlowModel(entry_point='inference.py',
                             model_data='s3://mybucket/model.tar.gz',
-                            role='MySageMakerRole')
+                            role='MySageMakerRole',
+                            framework_version='x.x.x')
 
 In the example above, ``inference.py`` is assumed to be a file inside ``model.tar.gz``. If you want to use a local file instead, you must add the ``source_dir`` argument. See the documentation on `TensorFlowModel <https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#sagemaker.tensorflow.model.TensorFlowModel>`_.
 
@@ -923,7 +924,8 @@ processing. There are 2 ways to do this:
        model = TensorFlowModel(entry_point='inference.py',
                                dependencies=['requirements.txt'],
                                model_data='s3://mybucket/model.tar.gz',
-                               role='MySageMakerRole')
+                               role='MySageMakerRole',
+                               framework_version='x.x.x')
 
 
 2. If you are working in a network-isolation situation or if you don't
@@ -941,7 +943,8 @@ processing. There are 2 ways to do this:
        model = TensorFlowModel(entry_point='inference.py',
                                dependencies=['/path/to/folder/named/lib'],
                                model_data='s3://mybucket/model.tar.gz',
-                               role='MySageMakerRole')
+                               role='MySageMakerRole',
+                               framework_version='x.x.x')
 
    For more information, see: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing
 

doc/overview.rst

Lines changed: 10 additions & 2 deletions
@@ -30,6 +30,11 @@ To train a model by using the SageMaker Python SDK, you:
 
 After you train a model, you can save it, and then serve the model as an endpoint to get real-time inferences or get inferences for an entire dataset by using batch transform.
 
+
+Important Note:
+
+* When using torch to load models, we recommend using torch>=2.6.0 and torchvision>=0.17.0.
+
 Prepare a Training script
 =========================
 
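The note above recommends torch>=2.6.0 and torchvision>=0.17.0. The following minimal stdlib sketch checks installed versions with `importlib.metadata`. It assumes release strings start with dotted integers, and it ignores pre-release and local suffixes such as `rc1` or `+cu124`. For full PEP 440 handling, the `packaging` library's `Version` class is the safer choice. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Minimums from the note above.
MINIMUM_VERSIONS = {"torch": "2.6.0", "torchvision": "0.17.0"}


def release_tuple(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.6.0+cu124' -> (2, 6, 0).

    Pre-release and local suffixes (rc1, .dev0, +cu124) are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare releases numerically, padding with zeros so '2.6' equals '2.6.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_minimums(minimums: Dict[str, str]) -> Dict[str, Optional[bool]]:
    """Map each package to True/False (meets minimum) or None (not installed)."""
    results: Dict[str, Optional[bool]] = {}
    for package, minimum in minimums.items():
        try:
            results[package] = meets_minimum(version(package), minimum)
        except PackageNotFoundError:
            results[package] = None
    return results
```

Calling `check_minimums(MINIMUM_VERSIONS)` reports `None` for a package that is not installed, so a missing torchvision is distinguishable from an outdated one.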
@@ -1958,15 +1963,15 @@ Make sure to have a Compose Version compatible with your Docker Engine installat
 Local mode configuration
 ========================
 
-The local mode uses a YAML configuration file located at ``~/.sagemaker/config.yaml`` to define the default values that are automatically passed to the ``config`` attribute of ``LocalSession``. This is an example of the configuration, for the full schema, see `sagemaker.config.config_schema.SAGEMAKER_PYTHON_SDK_LOCAL_MODE_CONFIG_SCHEMA <https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/config/config_schema.py>`_.
+The local mode uses a YAML configuration file located at ``${user_config_directory}/sagemaker/config.yaml`` to define the default values that are automatically passed to the ``config`` attribute of ``LocalSession``. The following is an example configuration. For the full schema, see `sagemaker.config.config_schema.SAGEMAKER_PYTHON_SDK_LOCAL_MODE_CONFIG_SCHEMA <https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/config/config_schema.py>`_.
 
 .. code:: yaml
 
     local:
       local_code: true # Using everything locally
       region_name: "us-west-2" # Name of the region
       container_config: # Additional docker container config
-        shm_size: "128M
+        shm_size: "128M"
 
 If you want to keep everything local, and not use Amazon S3 either, you can enable "local code" in one of two ways:
 
@@ -2565,6 +2570,9 @@ set default values for. For the full schema, see `sagemaker.config.config_schema
       KmsKeyId: 'kmskeyid10'
     TransformResources:
       VolumeKmsKeyId: 'volumekmskeyid4'
+    Tags:
+    - Key: 'tag_key'
+      Value: 'tag_value'
   CompilationJob:
   # https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateCompilationJob.html
     OutputConfig:

doc/v2.rst

Lines changed: 2 additions & 2 deletions
@@ -324,9 +324,9 @@ The follow serializer/deserializer classes have been renamed and/or moved:
 +--------------------------------------------------------+-------------------------------------------------------+
 | ``sagemaker.predictor._NPYSerializer`` | ``sagemaker.serializers.NumpySerializer`` |
 +--------------------------------------------------------+-------------------------------------------------------+
-| ``sagemaker.amazon.common.numpy_to_record_serializer`` | ``sagemaker.amazon.common.RecordSerializer`` |
+| ``sagemaker.amazon.common.numpy_to_record_serializer`` | ``sagemaker.serializers.RecordSerializer`` |
 +--------------------------------------------------------+-------------------------------------------------------+
-| ``sagemaker.amazon.common.record_deserializer`` | ``sagemaker.amazon.common.RecordDeserializer`` |
+| ``sagemaker.amazon.common.record_deserializer`` | ``sagemaker.deserializers.RecordDeserializer`` |
 +--------------------------------------------------------+-------------------------------------------------------+
 | ``sagemaker.predictor._JsonDeserializer`` | ``sagemaker.deserializers.JSONDeserializer`` |
 +--------------------------------------------------------+-------------------------------------------------------+
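For code that still uses the old names, a minimal lookup built only from the rows visible in this hunk (the full table in doc/v2.rst has more entries) can point to each replacement. The helper `suggest_v2_names` is hypothetical. It only reports fully qualified dotted references, not `from module import name` forms. It also does not check whether the new classes accept the same arguments, so each call site still needs review.

```python
import re
from typing import Dict, List, Tuple

# Old -> new names, taken only from the rows shown in this hunk of doc/v2.rst.
V2_RENAMES: Dict[str, str] = {
    "sagemaker.predictor._NPYSerializer": "sagemaker.serializers.NumpySerializer",
    "sagemaker.amazon.common.numpy_to_record_serializer": "sagemaker.serializers.RecordSerializer",
    "sagemaker.amazon.common.record_deserializer": "sagemaker.deserializers.RecordDeserializer",
    "sagemaker.predictor._JsonDeserializer": "sagemaker.deserializers.JSONDeserializer",
}


def suggest_v2_names(source: str) -> List[Tuple[str, str]]:
    """Return (old, new) pairs for each fully qualified old name found in `source`.

    Hypothetical helper: it reports matches but does not rewrite anything.
    """
    found = []
    for old, new in V2_RENAMES.items():
        # Require that the match is not part of a longer dotted name or identifier.
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?!\w)"
        if re.search(pattern, source):
            found.append((old, new))
    return found
```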

doc/workflows/step_functions/index.rst

Lines changed: 1 addition & 1 deletion
@@ -11,5 +11,5 @@ without having to provision and integrate the AWS services separately.
 The AWS Step Functions Python SDK uses the SageMaker Python SDK as a dependency.
 To get started with Step Functions, read the developer guide or visit the SDK's website:
 
-* `Workshop on using AWS Step Functions with SageMaker <https://www.sagemakerworkshop.com/step/>`__
+* `Create and manage Amazon SageMaker AI jobs with Step Functions <https://docs.aws.amazon.com/step-functions/latest/dg/connect-sagemaker.html>`__
 * `AWS Step Functions Python SDK website <https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/>`__

0 commit comments
