Commit 9542b9e ("florent's review")
1 parent: 29e1c3a

8 files changed: +69, -22 lines

docs/sagemaker/source/_toctree.yml

Lines changed: 2 additions & 4 deletions

@@ -1,15 +1,11 @@
 - sections:
   - local: index
     title: Hugging Face on AWS
-  - local: resources
-    title: Other Resources
   title: Get Started
   isExpanded: true
 - sections:
   - local: dlcs/introduction
     title: Introduction
-  - local: dlcs/features
-    title: Features & benefits
   - local: dlcs/available
     title: Available DLCs on AWS
   title: Deep Learning Containers (DLCs)
@@ -62,5 +58,7 @@
 - sections:
   - local: reference/inference-toolkit
     title: Inference Toolkit API
+  - local: reference/resources
+    title: Other Resources
   title: Reference
   isExpanded: false

docs/sagemaker/source/dlcs/available.md

Lines changed: 9 additions & 3 deletions

@@ -8,6 +8,8 @@ For each supported combination of use-case (training, inference), accelerator ty
 
 Pytorch Training DLC: For training, our DLCs are available for PyTorch via Transformers. They include support for training on GPUs and AWS AI chips with libraries such as TRL, Sentence Transformers, or Diffusers.
 
+You can also keep track of the latest Pytorch Training DLC releases [here](https://github.com/aws/deep-learning-containers/releases?q=huggingface-training+AND+NOT+neuronx&expanded=true).
+
 | Container URI | Accelerator |
 | -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
 | 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.5.1-transformers4.49.0-gpu-py311-cu124-ubuntu22.04 | GPU |
@@ -19,6 +21,8 @@ Pytorch Training DLC: For training, our DLCs are available for PyTorch via Trans
 
 For inference, we have a general-purpose PyTorch inference DLC, for serving models trained with any of those frameworks mentioned before on CPU, GPU, and AWS AI chips.
 
+You can also keep track of the latest Pytorch Inference DLC releases [here](https://github.com/aws/deep-learning-containers/releases?q=huggingface-inference+AND+NOT+tgi+AND+NOT+neuronx&expanded=true).
+
 | Container URI | Accelerator |
 | -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
 | 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.6.0-transformers4.49.0-cpu-py312-ubuntu22.04- | CPU |
@@ -29,6 +33,8 @@ For inference, we have a general-purpose PyTorch inference DLC, for serving mode
 
 There is also the LLM Text Generation Inference (TGI) DLC for high-performance text generation of LLMs on GPU and AWS AI chips.
 
+You can also keep track of the latest LLM TGI DLC releases [here](https://github.com/aws/deep-learning-containers/releases?q=tgi+AND+gpu&expanded=true).
+
 | Container URI | Accelerator |
 | -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
 | 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.6.0-tgi3.2.3-gpu-py311-cu124-ubuntu22.04 | GPU |
@@ -45,11 +51,11 @@ Finally, there is a Text Embeddings Inference (TEI) DLC for high-performance ser
 
 ## FAQ
 
-**How to choose the right container for my use case?**
+**How to choose the right inference container for my use case?**
 
-![dlc-decision-tree](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/dlc-decision-tree.png)
+![inference-dlc-decision-tree](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/inference-dlc-decision-tree.png)
 
-*Note:* See [here]((https://huggingface.co/docs/sagemaker/main/en/reference/inference-toolkit)) for the list of supported task in the inference toolkit.
+*Note:* See [here](https://huggingface.co/docs/sagemaker/main/en/reference/inference-toolkit) for the list of supported task in the inference toolkit.
 
 *Note:* Browse through the Hub to see if you model is tagged ["text-generation-inference"](https://huggingface.co/models?other=text-generation-inference) or ["text-embeddings-inference"](https://huggingface.co/models?other=text-embeddings-inference)
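The container URIs in the tables of this file share one ECR naming pattern: the DLC account id, a regional registry host, a repository, and a version tag. As an illustrative sketch (the helper function is an assumption for this note, not part of the docs), the pattern can be composed like this:

```python
# Illustrative helper (an assumption, not part of the official docs): compose a
# Hugging Face DLC image URI from the pieces visible in the tables above.
# AWS DLC images live in account 763104351884 and are replicated per Region.

DLC_ACCOUNT = "763104351884"

def dlc_image_uri(repository: str, tag: str, region: str = "us-east-1") -> str:
    """Build an ECR image URI like the ones listed in the tables above."""
    return f"{DLC_ACCOUNT}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

# Reproduces the PyTorch Training (GPU) URI from the first table:
uri = dlc_image_uri(
    "huggingface-pytorch-training",
    "2.5.1-transformers4.49.0-gpu-py311-cu124-ubuntu22.04",
)
print(uri)
```

In practice the SageMaker Python SDK can also resolve these URIs for you (e.g. `sagemaker.image_uris.retrieve` with `framework="huggingface"`), so hand-building them is mainly useful for custom deployments.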

docs/sagemaker/source/dlcs/features.md

Lines changed: 3 additions & 3 deletions

@@ -4,11 +4,11 @@ The Hugging Face DLCs provide ready-to-use, tested environments to train and dep
 
 ## One command is all you need
 
-With the new Hugging Face DLCs, train and deploy cutting-edge Transformers-based NLP models in a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries installed to run a single command e.g. via TRL CLI to fine-tune LLMs on any setting, either single-GPU, single-node multi-GPU, and more.
+With the new Hugging Face DLCs, train and deploy cutting-edge Transformers-based NLP models in a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries installed to run a single command e.g. via [TRL CLI](https://huggingface.co/docs/trl/en/clis) to fine-tune LLMs on any setting, either single-GPU, single-node multi-GPU, and more.
 
 ## Accelerate machine learning from science to production
 
-In addition to Hugging Face DLCs, we created a first-class Hugging Face library for inference, huggingface-inference-toolkit, that comes with the Hugging Face PyTorch DLCs for inference, with full support on serving any PyTorch model on AWS.
+In addition to Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`sagemaker-huggingface-inference-toolkit`](https://github.com/aws/sagemaker-huggingface-inference-toolkit/tree/main/src/sagemaker_huggingface_inference_toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support on serving any PyTorch model on AWS.
 
 Deploy your trained models for inference with just one more line of code or select any of the ever growing publicly available models from the model Hub.
 
@@ -26,4 +26,4 @@ Additionally, these DLCs come with full support for AWS meaning that deploying m
 
 Hugging Face DLCs feature built-in performance optimizations for PyTorch to train models faster. The DLCs also give you the flexibility to choose a training infrastructure that best aligns with the price/performance ratio for your workload.
 
-Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your Google Cloud environment, built-in monitoring, and a ton of enterprise features.
+Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your AWS environment, built-in monitoring, and a ton of enterprise features.
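The "single command" this hunk now links to can be sketched as follows (model, dataset, and flags are illustrative placeholders in the style of TRL's CLI docs, not values from this commit; the command is printed rather than executed so the sketch stays self-contained):

```shell
# Hedged sketch: a one-command fine-tune via the TRL CLI inside the training DLC.
# Model and dataset names are illustrative placeholders.
CMD="trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara \
  --output_dir ./qwen-sft"
echo "$CMD"
```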

docs/sagemaker/source/dlcs/introduction.md

Lines changed: 36 additions & 6 deletions

@@ -3,10 +3,40 @@
 Hugging Face built Deep Learning Containers (DLCs) for Amazon Web Services customers to run any of their machine learning workload in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any models, skipping the complicated process of building and optimizing your serving and training environments from scratch.
 
 The containers are publicly maintained, updated and released periodically by Hugging Face and the AWS team and available for all AWS customers within the AWS’s Elastic Container Registry. They can be used from any AWS service such as:
-* Amazon Sagemaker AI: Amazon SageMaker AI is a fully managed machine learning (ML) platform for data scientists and developers to quickly and confidently build, train, and deploy ML models into a production-ready hosted environment.
-* Amazon Bedrock: Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies and Amazon available for your use through a unified API to build generative AI applications.
-* Amazon Elastic Kubernetes Service (EKS): Amazon EKS is the premiere platform for running Kubernetes clusters in the AWS cloud.
-* Amazon Elastic Container Service (ECS): Amazon ECS is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications.
-* Amazon Elastic Compute Cloud (EC2): Amazon EC2 provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.
+* **Amazon Sagemaker AI**: Amazon SageMaker AI is a fully managed machine learning (ML) platform for data scientists and developers to quickly and confidently build, train, and deploy ML models into a production-ready hosted environment.
+* **Amazon Bedrock**: Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies and Amazon available for your use through a unified API to build generative AI applications.
+* **Amazon Elastic Kubernetes Service (EKS)**: Amazon EKS is the premiere platform for running Kubernetes clusters in the AWS cloud.
+* **Amazon Elastic Container Service (ECS)**: Amazon ECS is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications.
+* **Amazon Elastic Compute Cloud (EC2)**: Amazon EC2 provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.
 
-Hugging Face DLCs are open source and licensed under Apache 2.0. Feel free to reach out on our [community forum](https://discuss.huggingface.co/c/sagemaker/17) if you have any questions.
+Hugging Face DLCs are open source and licensed under Apache 2.0. Feel free to reach out on our [community forum](https://discuss.huggingface.co/c/sagemaker/17) if you have any questions.
+
+## Features & benefits
+
+The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models.
+
+### One command is all you need
+
+With the new Hugging Face DLCs, train and deploy cutting-edge Transformers-based NLP models in a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries installed to run a single command e.g. via [TRL CLI](https://huggingface.co/docs/trl/en/clis) to fine-tune LLMs on any setting, either single-GPU, single-node multi-GPU, and more.
+
+### Accelerate machine learning from science to production
+
+In addition to Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`sagemaker-huggingface-inference-toolkit`](https://github.com/aws/sagemaker-huggingface-inference-toolkit/tree/main/src/sagemaker_huggingface_inference_toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support on serving any PyTorch model on AWS.
+
+Deploy your trained models for inference with just one more line of code or select any of the ever growing publicly available models from the model Hub.
+
+### High-performance text generation and embedding
+
+Besides the PyTorch-oriented DLCs, Hugging Face also provides high-performance inference for both text generation and embedding models via the Hugging Face DLCs for both Text Generation Inference (TGI) and Text Embeddings Inference (TEI), respectively.
+
+The Hugging Face DLC for TGI enables you to deploy any of the +225,000 text generation inference supported models from the Hugging Face Hub, or any custom model as long as its architecture is supported within TGI.
+
+The Hugging Face DLC for TEI enables you to deploy any of the +12,000 embedding, re-ranking or sequence classification supported models from the Hugging Face Hub, or any custom model as long as its architecture is supported within TEI.
+
+Additionally, these DLCs come with full support for AWS meaning that deploying models from Amazon Simple Storage Service (S3) is also straight forward and requires no configuration.
+
+### Built-in performance
+
+Hugging Face DLCs feature built-in performance optimizations for PyTorch to train models faster. The DLCs also give you the flexibility to choose a training infrastructure that best aligns with the price/performance ratio for your workload.
+
+Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your AWS environment, built-in monitoring, and a ton of enterprise features.

docs/sagemaker/source/reference/inference-toolkit.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 
 ## Supported tasks
 
-The Inference Toolkit accepts inputs in the `inputs` key, and supports additional [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) parameters in the `parameters` key. You can provide any of the supported `kwargs` from `pipelines` as `parameters`.
+The [Sagemaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit/tree/main) accepts inputs in the `inputs` key, and supports additional [`pipelines`](https://huggingface.co/docs/transformers/main_classes/pipelines) parameters in the `parameters` key. You can provide any of the supported `kwargs` from `pipelines` as `parameters`.
 
 Tasks supported by the Inference Toolkit API include:
File renamed without changes.

docs/sagemaker/source/tutorials/bedrock/bedrock-quickstart.md

Lines changed: 8 additions & 4 deletions

@@ -1,14 +1,14 @@
 # Quickstart — Using Hugging Face Models with Amazon Bedrock Marketplace
 
 ## Why use Bedrock Marketplace for Hugging Face models?
-Amazon Bedrock now exposes 83 Hugging Face open-weight models—including Gemma, Llama 3, Mistral, and more—through a single catalog. You invoke them with the same Bedrock APIs you already use for Titan, Anthropic, Cohere, etc. Under the hood, Bedrock Marketplace model endpoints are managed by Amazon SageMaker AI. With Bedrock Marketplace, you can now combine the ease of use of SageMaker JumpStart with the fully managed infrastructure of Amazon Bedrock, including compatibility with high-level APIs such as Agents, Knowledge Bases, Guardrails and Model Evaluations.
+Amazon Bedrock now exposes Hugging Face open-weight models—including Gemma, Llama 3, Mistral, and more—through a single catalog. You invoke them with the same Bedrock APIs you already use for Titan, Anthropic, Cohere, etc. Under the hood, Bedrock Marketplace model endpoints are managed by Amazon SageMaker AI. With Bedrock Marketplace, you can now combine the ease of use of SageMaker JumpStart with the fully managed infrastructure of Amazon Bedrock, including compatibility with high-level APIs such as Agents, Knowledge Bases, Guardrails and Model Evaluations.
 
 ## 1 . Prerequisites
 
 |  | Requirement | Notes |
 |---|-------------|
-| AWS account in a Bedrock Region | Marketplace is regional; switch the console to one of the 14 supported Regions first. |
-| Permissions | For a quick trial, attach AmazonBedrockFullAccess and AmazonSageMakerFullAccess.|
+| AWS account in a Bedrock Region | Marketplace is regional; switch the console to one of the 14 supported Regions first, for example `us-east-1`. |
+| Permissions | For a quick trial, attach `AmazonBedrockFullAccess` and `AmazonSageMakerFullAccess`.|
 | Service quotas | The SageMaker endpoint uses GPU instances (for example ml.g5). Verify you have quota or request it. |
 | JumpStart-only | If you choose path B, create a SageMaker Studio domain and user profile first (Console ▸ SageMaker ▸ Domains). Open Studio before continuing. |
 
@@ -24,7 +24,11 @@ Path A is from the Bedrock *Model Catalog*:
 3. If you see Subscribe, review pricing & terms, click Subscribe, then continue
 4. Click Deploy → name the endpoint → keep the recommended instance → accept the EULA → Deploy
 5. Wait for Foundation Models → Marketplace deployments to show status In service (takes a few minutes)
-6. Click the deployment name and copy the SageMaker endpoint ARN — you’ll need it for API calls
+6. Click the deployment name and copy the SageMaker endpoint ARN — you’ll need it for API calls
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/bedrock-marketplace-deployment.gif"
+     alt="Bedrock deployment demo"
+     width="500">
 
 Path B is from SageMaker JumpStart for the model that shows “Use with Bedrock”:
 1. In SageMaker Studio, open JumpStart
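Once the endpoint ARN from step 6 of path A is copied, it is used as the `modelId` in the standard Bedrock runtime APIs. A hedged sketch (the ARN and region are placeholders; the actual call needs boto3 and AWS credentials, so it is shown in comments):

```python
# Sketch of invoking a Bedrock Marketplace deployment: the SageMaker endpoint
# ARN copied in step 6 stands in for the modelId. The ARN below is a placeholder.
endpoint_arn = "arn:aws:sagemaker:us-east-1:123456789012:endpoint/my-hf-endpoint"

# Message shape used by the Bedrock Converse API.
messages = [
    {"role": "user", "content": [{"text": "Summarize what Bedrock Marketplace is."}]}
]

# The actual call (requires boto3 and AWS credentials):
#   import boto3
#   bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = bedrock.converse(modelId=endpoint_arn, messages=messages)
#   print(response["output"]["message"]["content"][0]["text"])
print(messages[0]["content"][0]["text"])
```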

docs/sagemaker/source/tutorials/jumpstart/jumpstart-quickstart.md

Lines changed: 10 additions & 1 deletion

@@ -13,7 +13,7 @@ In this quickstart guide, we will deploy [Qwen/Qwen2.5-14B-Instruct](https://hug
 | AWS account with SageMaker enabled | An AWS account that will contain all your AWS resources. |
 | An IAM role to access SageMaker AI | Learn more about how IAM works with SageMaker AI in this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html). |
 | SageMaker Studio domain and user profile | We recommend using SageMaker Studio for straightforward deployment and inference. Follow this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html). |
-| Service quotas | Most LLMs need GPU instances (e.g. ml.g5). Verify you have quota for ml.g5.24xlarge or [request it](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-requesting-quota-increases.html). |
+| Service quotas | Most LLMs need GPU instances (e.g. ml.g5). Verify you have quota for `ml.g5.24xlarge` or [request it](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-requesting-quota-increases.html). |
 
 ## 2· Endpoint deployment
 
@@ -24,10 +24,19 @@ Let's explain how you would deploy a Hugging Face model to SageMaker browsing th
 4. Wait until Endpoints shows In service.
 5. Copy the Endpoint name (or ARN) for later use.
 
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/jumpstart-deployment.gif"
+     alt="JumpStart deployment demo"
+     width="500">
+
 Alternatively, you can also browse through the Hugging Face Model Hub:
 1. Open the model page → Click Deploy → SageMaker → Jumpstart tab if model is available.
 2. Copy the code snippet and use it from a SageMaker Notebook instance.
 
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/hf-jumpstart-deployment.gif"
+     alt="JumpStart deployment demo"
+     width="500">
+
 ```python
 # SageMaker JumpStart provides APIs as part of SageMaker SDK that allow you to deploy and fine-tune models in network isolation using scripts that SageMaker maintains.
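The Python snippet shown in this hunk is truncated by the diff context; it introduces the JumpStart SDK path. A hedged sketch of what that deployment typically looks like (the model id is an assumption to be looked up in Studio or on the model page; the deploy call needs AWS credentials, so it is shown as comments):

```python
# Hedged sketch of the JumpStart SDK path the truncated snippet introduces.
# The model id below is an assumption; look up the exact id in SageMaker Studio
# or on the model's Hub page. The deploy call needs AWS credentials.
model_id = "huggingface-llm-qwen2-5-14b-instruct"  # assumed JumpStart model id

#   from sagemaker.jumpstart.model import JumpStartModel
#   model = JumpStartModel(model_id=model_id)
#   predictor = model.deploy(accept_eula=True)  # uses the recommended instance type
#   print(predictor.endpoint_name)

print(model_id)
```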
