to configure or manage the underlying infrastructure. After you train a model, you can deploy it to a SageMaker
Serverless endpoint and then invoke the endpoint with the model to get inference results back. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.

To use SageMaker Serverless Inference with a SageMaker-provided container or a Bring Your Own Container (BYOC)
model, you need to pass ``image_uri``. An example of using ``image_uri`` to create an MXNet model:

.. code:: python

    from sagemaker.mxnet import MXNetModel
    import sagemaker

    role = sagemaker.get_execution_role()

    # Create an MXNet Model object
    mxnet_model = MXNetModel(
        model_data="s3://my_bucket/pretrained_model/model.tar.gz",  # path to your trained SageMaker model
        role=role,  # IAM role with permissions to create an endpoint
        entry_point="inference.py",
        image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/mxnet-inference:1.4.1-cpu-py3",  # the inference image to use
    )

For more Amazon SageMaker-provided algorithm and container image paths, see `Amazon SageMaker provided
algorithms and Deep Learning Containers <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`_.
After creating a model with ``image_uri``, you can follow the steps below to create a serverless endpoint.

To deploy a serverless endpoint, you need to create a ``ServerlessInferenceConfig``.
If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
the default ``MaxConcurrency`` will be **5**:
Then use the ``ServerlessInferenceConfig`` in the estimator's ``deploy()`` method:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
    serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)

Alternatively, deploy a serverless endpoint directly with the model's ``deploy()`` method:

.. code:: python

    # Deploys the model to a SageMaker serverless endpoint
    serverless_predictor = model.deploy(serverless_inference_config=serverless_config)

After deployment is complete, you can use the predictor's ``predict()`` method to invoke the serverless endpoint just like
a real-time endpoint:
