Commit cd837e4

Merge branch 'master' into processing-job-codeartifact-support
2 parents 077dcdf + 8462f1a commit cd837e4

105 files changed: +8021 -1859 lines changed

CHANGELOG.md

Lines changed: 84 additions & 0 deletions
@@ -1,5 +1,89 @@
 # Changelog
 
+## v2.197.0 (2023-11-07)
+
+### Features
+
+* PT2.1 SM Training/Inference DLC Release
+
+### Bug Fixes and Other Changes
+
+* Release HuggingFace PT Neuronx training image 1.13.1
+* HuggingFace PT Neuronx release in SDK
+
+## v2.196.0 (2023-10-27)
+
+### Features
+
+* inference instance type conditioned on training instance type
+
+### Bug Fixes and Other Changes
+
+* improved jumpstart tagging
+
+## v2.195.1 (2023-10-26)
+
+### Bug Fixes and Other Changes
+
+* Allow either instance_type or instance_group to be defined in…
+* enhance image_uris unit tests
+
+## v2.195.0 (2023-10-25)
+
+### Features
+
+* jumpstart gated model artifacts
+* jumpstart extract generated text from response
+* jumpstart contruct payload utility
+
+### Bug Fixes and Other Changes
+
+* relax upper bound on urllib in local mode requirements
+* bump urllib3 version
+* allow smdistributed to be enabled with torch_distributed.
+* fix URL links
+
+### Documentation Changes
+
+* remove python 2 reference
+* update framework version links
+
+## v2.194.0 (2023-10-19)
+
+### Features
+
+* Added register step in Jumpstart model
+* jumpstart instance specific metric definitions
+
+### Bug Fixes and Other Changes
+
+* Updates for DJL 0.24.0 Release
+* use getter for resource-metadata dict
+* add method to Model class to check if repack is needed
+
+## v2.193.0 (2023-10-18)
+
+### Features
+
+* jumpstart model artifact instance type variants
+* jumpstart instance specific hyperparameters
+* Feature Processor event based triggers (#1132)
+* Support job checkpoint in remote function
+* jumpstart model package arn instance type variants
+
+### Bug Fixes and Other Changes
+
+* Fix hyperlinks in feature_processor.scheduler parameter descriptions
+* add image_uris_unit_test pytest mark
+* bump apache-airflow to `v2.7.2`
+* clone distribution in validate_distribution
+* fix flaky Inference Recommender integration tests
+
+### Documentation Changes
+
+* Update PipelineModel.register documentation
+* specify that input_shape in no longer required for torch 2.0 mod…
+
 ## v2.192.1 (2023-10-13)
 
 ### Bug Fixes and Other Changes
VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.192.2.dev0
+2.197.1.dev0

doc/amazon_sagemaker_featurestore.rst

Lines changed: 8 additions & 4 deletions
@@ -230,9 +230,11 @@ The following code from the fraud detection example shows a minimal
     enable_online_store=True
 )
 
-Creating a feature group takes time as the data is loaded. You will need
-to wait until it is created before you can use it. You can check status
-using the following method.
+Creating a feature group takes time as the data is loaded. You will
+need to wait until it is created before you can use it. You can
+check status using the following method. Note that it can take
+approximately 10-15 minutes to provision an online ``FeatureGroup``
+with the ``InMemory`` ``StorageType``.
 
 .. code:: python
 
@@ -480,7 +482,9 @@ Feature Store `DatasetBuilder API Reference
 .. rubric:: Delete a feature group
 :name: bCe9CA61b78
 
-You can delete a feature group with the ``delete`` function.
+You can delete a feature group with the ``delete`` function. Note that it
+can take approximately 10-15 minutes to delete an online ``FeatureGroup``
+with the ``InMemory`` ``StorageType``.
 
 .. code:: python
 
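
The first hunk above says the caller must wait for the feature group to be created and that an online ``InMemory`` group can take roughly 10-15 minutes to provision. One way to wait is a polling loop with a timeout, sketched below. `wait_for_status` and its parameters are hypothetical names for illustration and are not part of the SDK. The status strings `Creating`, `Created`, and `CreateFailed` are assumed to match the `FeatureGroupStatus` values reported by the service.

```python
import time
from typing import Callable, Tuple


def wait_for_status(
    get_status: Callable[[], str],
    target: str = "Created",
    failed: Tuple[str, ...] = ("CreateFailed",),
    poll_seconds: float = 30.0,
    timeout_seconds: float = 20 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns target, a failure status, or times out.

    Hypothetical helper: get_status is any zero-argument callable that
    returns the current status string of the resource being provisioned.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Provisioning failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_seconds} seconds")
        # Back off before asking again; the default timeout leaves headroom
        # over the 10-15 minute figure quoted in the documentation change.
        sleep(poll_seconds)
```

With the SDK, the callable would presumably be something like `lambda: feature_group.describe().get("FeatureGroupStatus")`; treat that as an assumption and check it against the Feature Store documentation.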

doc/overview.rst

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ After you train a model, you can save it, and then serve the model as an endpoin
 Prepare a Training script
 =========================
 
-Your training script must be a Python 2.7 or 3.6 compatible source file.
+Your training script must be a 3.6 compatible source file.
 
 The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, including the following:
 

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-urllib3>=1.26.8,<1.26.15
+urllib3>=1.26.8,<3.0.0
 docker>=5.0.2,<7.0.0
 PyYAML>=5.4.1,<7
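
The effect of relaxing the upper bound can be checked with a small range comparison. The sketch below is a deliberate simplification: it handles only plain dotted numeric versions, not the full version-specifier rules pip applies, and `parse` and `in_range` are hypothetical helper names.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.26.18' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper (numeric release versions only)."""
    return parse(lower) <= parse(version) < parse(upper)


# Old constraint: urllib3>=1.26.8,<1.26.15
# New constraint: urllib3>=1.26.8,<3.0.0
for candidate in ("1.26.8", "1.26.18", "2.0.7"):
    old_ok = in_range(candidate, "1.26.8", "1.26.15")
    new_ok = in_range(candidate, "1.26.8", "3.0.0")
    print(f"urllib3 {candidate}: old={old_ok} new={new_ok}")
```

Under the new bound, later 1.26.x patch releases and the 2.x series fall inside the range, while anything below 1.26.8 is still rejected.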

requirements/extras/test_requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ awslogs==0.14.0
 black==22.3.0
 stopit==1.1.2
 # Update tox.ini to have correct version of airflow constraints file
-apache-airflow==2.7.1
+apache-airflow==2.7.2
 apache-airflow-providers-amazon==7.2.1
 attrs>=23.1.0,<24
 fabric==2.6.0
@@ -24,7 +24,7 @@ pandas>=1.3.5,<1.5
 scikit-learn==1.3.0
 cloudpickle==2.2.1
 scipy==1.10.1
-urllib3>=1.26.8,<1.26.15
+urllib3>=1.26.8,<3.0.0
 docker>=5.0.2,<7.0.0
 PyYAML==6.0
 pyspark==3.3.1

src/sagemaker/chainer/estimator.py

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ def __init__(
             framework_version (str): Chainer version you want to use for
                 executing your model training code. Defaults to ``None``. Required unless
                 ``image_uri`` is provided. List of supported versions:
-                https://github.com/aws/sagemaker-python-sdk#chainer-sagemaker-estimators.
+                https://sagemaker.readthedocs.io/en/stable/frameworks/chainer/using_chainer.html#using-chainer-with-the-sagemaker-python-sdk.
             image_uri (str): If specified, the estimator will use this image
                 for training and hosting, instead of selecting the appropriate
                 SageMaker official image based on framework_version and

src/sagemaker/djl_inference/model.py

Lines changed: 1 addition & 1 deletion
@@ -781,7 +781,7 @@ def serving_image_uri(self, region_name):
             str: The appropriate image URI based on the given parameters.
         """
         if not self.djl_version:
-            self.djl_version = "0.23.0"
+            self.djl_version = "0.24.0"
 
         return image_uris.retrieve(
             self._framework(),

src/sagemaker/estimator.py

Lines changed: 48 additions & 10 deletions
@@ -71,7 +71,7 @@
 from sagemaker.utils import instance_supports_kms
 from sagemaker.job import _Job
 from sagemaker.jumpstart.utils import (
-    add_jumpstart_tags,
+    add_jumpstart_uri_tags,
     get_jumpstart_base_name_if_jumpstart_model,
     update_inference_tags_with_jumpstart_training_tags,
 )
@@ -101,6 +101,7 @@
 )
 from sagemaker.workflow import is_pipeline_variable
 from sagemaker.workflow.entities import PipelineVariable
+from sagemaker.workflow.parameters import ParameterString
 from sagemaker.workflow.pipeline_context import PipelineSession, runnable_by_pipeline
 
 logger = logging.getLogger(__name__)
@@ -576,9 +577,7 @@ def __init__(
         self.entry_point = entry_point
         self.dependencies = dependencies or []
         self.uploaded_code: Optional[UploadedCode] = None
-        self.tags = add_jumpstart_tags(
-            tags=tags, training_model_uri=self.model_uri, training_script_uri=self.source_dir
-        )
+
         if self.instance_type in ("local", "local_gpu"):
             if self.instance_type == "local_gpu" and self.instance_count > 1:
                 raise RuntimeError("Distributed Training in Local GPU is not supported")
@@ -591,6 +590,15 @@ def __init__(
         else:
             self.sagemaker_session = sagemaker_session or Session()
 
+        self.tags = (
+            add_jumpstart_uri_tags(
+                tags=tags, training_model_uri=self.model_uri, training_script_uri=self.source_dir
+            )
+            if getattr(self.sagemaker_session, "settings", None) is not None
+            and self.sagemaker_session.settings.include_jumpstart_tags
+            else tags
+        )
+
         self.base_job_name = base_job_name
         self._current_job_name = None
         if (
@@ -3198,6 +3206,7 @@ class Framework(EstimatorBase):
     """
 
     _framework_name = None
+    UNSUPPORTED_DLC_IMAGE_FOR_SM_PARALLELISM = ("2.0.1-gpu-py310-cu121", "2.0-gpu-py310-cu121")
 
     def __init__(
         self,
@@ -3816,6 +3825,7 @@ def _distribution_configuration(self, distribution):
 
         mpi_enabled = False
         smdataparallel_enabled = False
+        p5_enabled = False
         if "instance_groups" in distribution:
             distribution_config["sagemaker_distribution_instance_groups"] = distribution[
                 "instance_groups"
@@ -3843,16 +3853,44 @@ def _distribution_configuration(self, distribution):
                     "custom_mpi_options", ""
                 )
 
-                if get_mp_parameters(distribution):
-                    distribution_config["mp_parameters"] = get_mp_parameters(distribution)
-
-            elif "modelparallel" in distribution.get("smdistributed", {}):
-                raise ValueError("Cannot use Model Parallelism without MPI enabled!")
-
         if "smdistributed" in distribution:
             # smdistributed strategy selected
+            if get_mp_parameters(distribution):
+                distribution_config["mp_parameters"] = get_mp_parameters(distribution)
+            # first make sure torch_distributed is enabled if instance type is p5
+            torch_distributed_enabled = False
+            if "torch_distributed" in distribution:
+                torch_distributed_enabled = distribution.get("torch_distributed").get(
+                    "enabled", False
+                )
             smdistributed = distribution["smdistributed"]
             smdataparallel_enabled = smdistributed.get("dataparallel", {}).get("enabled", False)
+            if isinstance(self.instance_type, ParameterString):
+                p5_enabled = "p5.48xlarge" in self.instance_type.default_value
+            elif isinstance(self.instance_type, str):
+                p5_enabled = "p5.48xlarge" in self.instance_type
+            else:
+                for instance in self.instance_groups:
+                    if "p5.48xlarge" in instance._to_request_dict().get("InstanceType", ()):
+                        p5_enabled = True
+                        break
+
+            img_uri = "" if self.image_uri is None else self.image_uri
+            for unsupported_image in Framework.UNSUPPORTED_DLC_IMAGE_FOR_SM_PARALLELISM:
+                if (
+                    unsupported_image in img_uri and not torch_distributed_enabled
+                ): # disabling DLC images with CUDA12
+                    raise ValueError(
+                        f"SMDistributed is currently incompatible with DLC image: {img_uri}. "
+                        "(Could be due to CUDA version being greater than 11.)"
+                    )
+            if (
+                not torch_distributed_enabled and p5_enabled
+            ): # disabling p5 when torch distributed is disabled
+                raise ValueError(
+                    "SMModelParallel and SMDataParallel currently do not support p5 instances."
+                )
+            # smdistributed strategy selected with supported instance type
             distribution_config[self.LAUNCH_SM_DDP_ENV_NAME] = smdataparallel_enabled
             distribution_config[self.INSTANCE_TYPE] = self.instance_type
             if smdataparallel_enabled:
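
The validation added to `_distribution_configuration` can be read in isolation as two guards that both depend on whether `torch_distributed` is enabled. The sketch below restates that logic as a standalone function so the rules are easier to follow. It is an illustration, not the SDK's code: `validate_smdistributed` and its parameters are hypothetical, and instance types are passed as plain strings instead of being resolved from `ParameterString` or instance groups.

```python
from typing import Iterable, Optional

# Image tags copied from the constant added to Framework in this commit.
UNSUPPORTED_DLC_IMAGE_FOR_SM_PARALLELISM = ("2.0.1-gpu-py310-cu121", "2.0-gpu-py310-cu121")


def validate_smdistributed(
    distribution: dict,
    instance_types: Iterable[str],
    image_uri: Optional[str] = None,
) -> None:
    """Raise ValueError for smdistributed setups the diff above rejects.

    Hypothetical helper: mirrors the two checks in the diff, nothing more.
    """
    if "smdistributed" not in distribution:
        return  # the checks only apply when an smdistributed strategy is selected

    torch_distributed_enabled = distribution.get("torch_distributed", {}).get("enabled", False)
    p5_enabled = any("p5.48xlarge" in instance_type for instance_type in instance_types)
    img_uri = image_uri or ""

    # Guard 1: the listed CUDA 12 images are rejected unless torch_distributed is on.
    for unsupported_image in UNSUPPORTED_DLC_IMAGE_FOR_SM_PARALLELISM:
        if unsupported_image in img_uri and not torch_distributed_enabled:
            raise ValueError(f"SMDistributed is currently incompatible with DLC image: {img_uri}.")

    # Guard 2: p5 instances are rejected unless torch_distributed is on.
    if p5_enabled and not torch_distributed_enabled:
        raise ValueError("SMModelParallel and SMDataParallel currently do not support p5 instances.")
```

For example, `validate_smdistributed({"smdistributed": {"dataparallel": {"enabled": True}}}, ["ml.p5.48xlarge"])` raises, while adding `"torch_distributed": {"enabled": True}` to the same dictionary lets it pass.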

src/sagemaker/feature_store/feature_processor/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -30,8 +30,16 @@
     to_pipeline,
     schedule,
     describe,
+    put_trigger,
+    delete_trigger,
+    enable_trigger,
+    disable_trigger,
     delete_schedule,
     list_pipelines,
     execute,
     TransformationCode,
+    FeatureProcessorPipelineEvents,
+)
+from sagemaker.feature_store.feature_processor._enums import ( # noqa: F401
+    FeatureProcessorPipelineExecutionStatus,
 )
