- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

*******************************
SageMaker Serverless Inference
*******************************
Amazon SageMaker Serverless Inference enables you to deploy machine learning models for inference without having
to configure or manage the underlying infrastructure. After you train a model, you can deploy it to an Amazon SageMaker
serverless endpoint and then invoke the endpoint to get inference results back. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.

To deploy a serverless endpoint, you will need to create a ``ServerlessInferenceConfig``.
If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
the default ``MaxConcurrency`` will be **5**:

.. code:: python

    from sagemaker.serverless import ServerlessInferenceConfig

    # Create an empty ServerlessInferenceConfig object to use default values
    serverless_config = ServerlessInferenceConfig()

Or you can specify ``MemorySizeInMB`` and ``MaxConcurrency`` in ``ServerlessInferenceConfig`` (example shown below):

.. code:: python

    # Specify MemorySizeInMB and MaxConcurrency in the serverless config object
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10,
    )

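Note that the service accepts only certain values for these settings: per the AWS documentation, ``MemorySizeInMB`` must be one of 1024, 2048, 3072, 4096, 5120, or 6144, and ``MaxConcurrency`` must be at least 1. As a minimal sketch, a plain-Python helper (hypothetical, not part of the SageMaker SDK) could validate values before constructing the config:

```python
# Illustrative helper (not part of the SageMaker SDK): sanity-check serverless
# config values before building a ServerlessInferenceConfig.
# Per the AWS docs, MemorySizeInMB must be one of these values.
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def check_serverless_settings(memory_size_in_mb: int, max_concurrency: int) -> None:
    """Raise ValueError if the values would be rejected by the service."""
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {VALID_MEMORY_SIZES_MB}, "
            f"got {memory_size_in_mb}"
        )
    if max_concurrency < 1:
        raise ValueError(f"max_concurrency must be >= 1, got {max_concurrency}")


check_serverless_settings(4096, 10)  # the values used in the example above
```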
Then use the ``ServerlessInferenceConfig`` in the estimator's ``deploy()`` method to deploy a serverless endpoint:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
    serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)

After deployment is complete, you can use the predictor's ``predict()`` method to invoke the serverless endpoint just as
you would a real-time endpoint:

.. code:: python

    # Serializes data and makes a prediction request to the SageMaker serverless endpoint
    response = serverless_predictor.predict(data)

Clean up the endpoint and model if needed after inference:

.. code:: python

    # Tears down the SageMaker endpoint and endpoint configuration
    serverless_predictor.delete_endpoint()

    # Deletes the SageMaker model
    serverless_predictor.delete_model()

For more details about ``ServerlessInferenceConfig``,
see the API docs for `Serverless Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/serverless.html>`__.

*************************
SageMaker Batch Transform
*************************