initial PR

pagezyhf · pagezyhf · commit 8903a8ae788e · 2025-04-22T16:51:29.000+02:00
diff --git a/docs/sagemaker/_toctree.yml b/docs/sagemaker/_toctree.yml
@@ -1,10 +1,28 @@
-- local: index
-  title: Hugging Face on Amazon SageMaker
-- local: getting-started
-  title: Get started
-- local: train
-  title: Run training on Amazon SageMaker
-- local: inference
-  title: Deploy models to Amazon SageMaker
-- local: reference
+- sections:
+    - local: getting-started/index
+      title: Hugging Face on AWS
+    - local: getting-started/deploy
+      title: Deploy Models on AWS
+    - local: getting-started/train
+      title: Train Models on AWS
+    - local: getting-started/resources
+      title: Other Resources
+  title: Getting Started
+- sections:
+    - local: dlcs/introduction
+      title: Introduction
+    - local: dlcs/features
+      title: Features & benefits
+    - local: dlcs/available
+      title: Available DLCs on AWS
+  title: Deep Learning Containers (DLCs)
+- sections:
+  title: Examples
+- sections:
+  title: Advanced Topics
+- sections:
+  title: How-to
+- sections:
+    - local: reference/inference-toolkit
+      title: Inference Toolkit API
   title: Reference
diff --git a/docs/sagemaker/dlcs/available.md b/docs/sagemaker/dlcs/available.md
@@ -0,0 +1,55 @@
+# Available DLCs on AWS
+
+Below you can find a listing of all the Deep Learning Containers (DLCs) available on AWS.
+
+Containers are created for each supported combination of use case (training, inference), accelerator type (CPU, GPU, Neuron), and framework (PyTorch, TGI, TEI).
+
+## FAQ
+
+**How to choose the right container for my use case?**
+
+**How to find the URI of my container?**
+The URI is built with an AWS account ID and an AWS region. Those two values need to be replaced depending on your use case.
+Let's say you want to use the training DLC for GPUs in  
+- `dlc-aws-account-id`: The AWS account ID of the account that owns the ECR repository. You can find it [here](https://github.com/aws/sagemaker-python-sdk/blob/e0b9d38e1e3b48647a02af23c4be54980e53dc61/src/sagemaker/image_uri_config/huggingface.json#L21).
+- `region`: The AWS region where you want to use it.
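As an illustration of how the two placeholders above fit into a full container URI, here is a minimal sketch that assembles one from its parts. The helper name `build_dlc_uri` is hypothetical, and the URI layout is inferred from the container URIs listed in the tables of this file. The account ID can differ per region, and the sketch does not handle AWS partitions that use a different domain suffix.

```python
def build_dlc_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Assemble an ECR image URI from its parts.

    Hypothetical helper: the layout mirrors the URIs listed in this page.
    """
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Values taken from the GPU row of the training table in this page.
uri = build_dlc_uri(
    account_id="763104351884",
    region="us-east-1",
    repository="huggingface-pytorch-training",
    tag="2.5.1-transformers4.49.0-gpu-py311-cu124-ubuntu22.04",
)
print(uri)
```

To target another region, change `region` and check the linked `huggingface.json` file for the matching account ID rather than assuming it stays the same.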
+
+## Training
+
+PyTorch Training DLC: For training, our DLCs are available for PyTorch via 🤗 Transformers. They include support for training on GPUs and AWS AI chips with libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers.
+
+| Container URI                                                                                                                    | Accelerator |
+| -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.5.1-transformers4.49.0-gpu-py311-cu124-ubuntu22.04 | GPU         |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training-neuronx:2.1.2-transformers4.48.1-neuronx-py310-sdk2.20.0-ubuntu20.04 | Neuron         |
+
+
+## Inference
+
+### PyTorch Inference DLC
+
+For inference, we have a general-purpose PyTorch inference DLC for serving models trained with any of the libraries mentioned above on CPU, GPU, and AWS AI chips.
+
+| Container URI                                                                                                                    | Accelerator |
+| -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.6.0-transformers4.49.0-cpu-py312-ubuntu22.04 | CPU         |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.6.0-transformers4.49.0-gpu-py312-cu124-ubuntu22.04 | GPU         |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04 | Neuron         |
+
+### Text Generation Inference
+
+There is also the Text Generation Inference (TGI) DLC for high-performance text generation of LLMs on GPU and AWS AI chips.
+
+| Container URI                                                                                                                    | Accelerator |
+| -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.6.0-tgi3.2.3-gpu-py311-cu124-ubuntu22.04 | GPU         |
+| 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.28-neuronx-py310-ubuntu22.04 | Neuron         |
+
+### Text Embeddings Inference
+
+Finally, there is a Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on CPU and GPU.
+
+| Container URI                                                                                                                    | Accelerator |
+| -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
+| 683313688378.dkr.ecr.us-east-1.amazonaws.com/tei-cpu:2.0.1-tei1.2.3-cpu-py310-ubuntu22.04 | CPU         |
+| 683313688378.dkr.ecr.us-east-1.amazonaws.com/tei:2.0.1-tei1.4.0-gpu-py310-cu122-ubuntu22.04 | GPU         |
diff --git a/docs/sagemaker/dlcs/features.md b/docs/sagemaker/dlcs/features.md
@@ -0,0 +1,29 @@
+# Features & benefits
+
+The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models.
+
+## One command is all you need
+
+With the Hugging Face DLCs, you can train and deploy cutting-edge Transformers-based NLP models in a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries installed, so a single command (for example, via the TRL CLI) is enough to fine-tune LLMs in any setting: single-GPU, single-node multi-GPU, and more.
+
+## Accelerate machine learning from science to production
+
+In addition to the Hugging Face DLCs, we created a first-class Hugging Face library for inference, `huggingface-inference-toolkit`. It ships with the Hugging Face PyTorch DLCs for inference and fully supports serving any PyTorch model on AWS.
+
+Deploy your trained models for inference with just one more line of code, or select any of the ever-growing set of publicly available models from the Hugging Face Hub.
+
+## High-performance text generation and embedding
+
+Besides the PyTorch-oriented DLCs, Hugging Face also provides high-performance inference for text generation and embedding models via the Hugging Face DLCs for Text Generation Inference (TGI) and Text Embeddings Inference (TEI), respectively.
+
+The Hugging Face DLC for TGI enables you to deploy any of the 225,000+ supported text generation models from the Hugging Face Hub, or any custom model as long as its architecture is supported within TGI.
+
+The Hugging Face DLC for TEI enables you to deploy any of the 12,000+ supported embedding, re-ranking, or sequence classification models from the Hugging Face Hub, or any custom model as long as its architecture is supported within TEI.
+
+Additionally, these DLCs come with full support for AWS, meaning that deploying models from Amazon Simple Storage Service (S3) is also straightforward and requires no configuration.
+
+## Built-in performance
+
+Hugging Face DLCs feature built-in performance optimizations for PyTorch to train models faster. The DLCs also give you the flexibility to choose a training infrastructure that best aligns with the price/performance ratio for your workload.
+
+Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your AWS environment, built-in monitoring, and many enterprise features.
diff --git a/docs/sagemaker/dlcs/introduction.md b/docs/sagemaker/dlcs/introduction.md
@@ -0,0 +1,10 @@
+# Introduction
+
+Hugging Face built Deep Learning Containers (DLCs) for Amazon Web Services customers to run any of their machine learning workloads in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to directly serve and train any model, skipping the complicated process of building and optimizing your serving and training environments from scratch.
+
+The containers are publicly maintained, updated, and released periodically by Hugging Face and the AWS team, and are available to all AWS customers in Amazon Elastic Container Registry (ECR). They can be used from any AWS service, such as:
+* Amazon SageMaker AI: Amazon SageMaker AI is a fully managed machine learning (ML) platform for data scientists and developers to quickly and confidently build, train, and deploy ML models into a production-ready hosted environment.
+* Amazon Bedrock: Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies and Amazon available for your use through a unified API to build generative AI applications.
+* Amazon Elastic Kubernetes Service (EKS): Amazon EKS is the premier platform for running Kubernetes clusters in the AWS cloud.
+* Amazon Elastic Container Service (ECS): Amazon ECS is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications.
+* Amazon Elastic Compute Cloud (EC2): Amazon EC2 provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.
diff --git a/docs/sagemaker/getting-started/deploy.md b/docs/sagemaker/getting-started/deploy.md
@@ -0,0 +1,37 @@
+# Deploy models on AWS
+
+Deploying Hugging Face models on AWS is streamlined through various services, each suited for different deployment scenarios. Here's how you can deploy your models using AWS and Hugging Face offerings.
+
+## With SageMaker SDK
+
+Amazon SageMaker is a fully managed AWS service for building, training, and deploying machine learning models at scale. The SageMaker SDK simplifies interacting with SageMaker programmatically and provides an integration designed specifically for Hugging Face models, which simplifies deploying managed endpoints. With this integration, you can quickly deploy pre-trained Hugging Face models or your own fine-tuned models directly into SageMaker-managed endpoints, significantly reducing setup complexity and time to production.
+
+To get started, check out this tutorial.
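To make the deployment flow described above more concrete, here is a minimal sketch using `HuggingFaceModel` from the SageMaker Python SDK. It is an illustration under stated assumptions, not the tutorial referenced by this page: the helper names `hub_env` and `deploy_from_hub`, the default instance type, and the model ID in the usage note are all illustrative, and running `deploy_from_hub` requires the `sagemaker` package, AWS credentials, and an IAM role with SageMaker permissions.

```python
def hub_env(model_id: str, task: str) -> dict:
    """Environment variables read by the Hugging Face inference toolkit
    to load a model from the Hub (hypothetical helper)."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_from_hub(role_arn: str, image_uri: str, model_id: str, task: str,
                    instance_type: str = "ml.g5.xlarge"):
    """Create a SageMaker endpoint serving a Hub model (illustrative sketch).

    The import is done lazily so this module can be loaded without the
    `sagemaker` package installed.
    """
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        role=role_arn,        # IAM role assumed by SageMaker
        image_uri=image_uri,  # e.g. a PyTorch inference DLC URI
        env=hub_env(model_id, task),
    )
    # Returns a predictor object bound to the new endpoint.
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

A call might look like `deploy_from_hub(role_arn, image_uri, "distilbert-base-uncased-finetuned-sst-2-english", "text-classification")`, where `role_arn` and `image_uri` are values you supply. Remember to delete the endpoint when you are done to avoid ongoing charges.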
+
+## With SageMaker JumpStart
+
+Amazon SageMaker JumpStart is a curated model catalog from which you can deploy a model with just a few clicks. We maintain a Hugging Face section in the catalog that lets you self-host the most popular open models in your VPC with performant default configurations, powered under the hood by Hugging Face Deep Learning Containers (DLCs). (#todo link to DLC intro)
+
+To get started, check out this tutorial.
+
+## With Amazon Bedrock
+
+Amazon Bedrock enables developers to easily build and scale generative AI applications through a single API. With Bedrock Marketplace, you can now combine the ease of use of SageMaker JumpStart with the fully managed infrastructure of Amazon Bedrock, including compatibility with high-level APIs such as Agents, Knowledge Bases, Guardrails, and Model Evaluations.
+
+To get started, check out this [blog post](https://huggingface.co/blog/bedrock-marketplace?).
+
+## With Hugging Face Inference Endpoints
+
+Hugging Face Inference Endpoints allow you to deploy models hosted directly by Hugging Face, fully managed and optimized for performance. They are ideal for quick deployment and scalable inference workloads.
+
+[Get started with Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/main/en/index).
+
+## With ECS, EKS, and EC2
+
+Hugging Face provides Inference Deep Learning Containers (DLCs) to AWS users: optimized environments preconfigured with Hugging Face libraries for inference and natively integrated in the SageMaker SDK and JumpStart. However, the HF DLCs can also be used across other AWS services like ECS, EKS, and EC2.
+
+Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), and Amazon Elastic Compute Cloud (EC2) allow you to leverage DLCs directly.
+
+- Get started with HF DLCs on EC2.
+- Get started with HF DLCs on ECS.
+- Get started with HF DLCs on EKS.
diff --git a/docs/sagemaker/getting-started/index.md b/docs/sagemaker/getting-started/index.md
diff --git a/docs/sagemaker/getting-started/resources.md b/docs/sagemaker/getting-started/resources.md
@@ -0,0 +1,42 @@
+# Resources
+
+Take a look at our published blog posts, videos, documentation, sample notebooks and scripts for additional help and more context about Hugging Face on AWS.
+
+## Blogs and videos
+
+- [AWS: Embracing natural language processing with Hugging Face](https://aws.amazon.com/de/blogs/opensource/embracing-natural-language-processing-with-hugging-face/)
+- [Deploy Hugging Face models easily with Amazon SageMaker](https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker)
+- [AWS and Hugging Face collaborate to simplify and accelerate adoption of natural language processing models](https://aws.amazon.com/blogs/machine-learning/aws-and-hugging-face-collaborate-to-simplify-and-accelerate-adoption-of-natural-language-processing-models/)
+- [Walkthrough: End-to-End Text Classification](https://youtu.be/ok3hetb42gU)
+- [Working with Hugging Face models on Amazon SageMaker](https://youtu.be/leyrCgLAGjMn)
+- [Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker](https://huggingface.co/blog/sagemaker-distributed-training-seq2seq)
+- [Deploy a Hugging Face Transformers Model from S3 to Amazon SageMaker](https://youtu.be/pfBGgSGnYLs)
+- [Deploy a Hugging Face Transformers Model from the Model Hub to Amazon SageMaker](https://youtu.be/l9QZuazbzWM)
+
+## Documentation
+
+- [Run training on Amazon SageMaker](/docs/sagemaker/train)
+- [Deploy models to Amazon SageMaker](/docs/sagemaker/inference)
+- [Reference](/docs/sagemaker/reference)
+- [Amazon SageMaker documentation for Hugging Face](https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html)
+- [Python SDK SageMaker documentation for Hugging Face](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html)
+- [Deep Learning Container](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers)
+- [SageMaker's Distributed Data Parallel Library](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html)
+- [SageMaker's Distributed Model Parallel Library](https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html)
+
+## Sample workshops
+
+## Sample notebooks
+
+- [All notebooks](https://github.com/huggingface/notebooks/tree/master/sagemaker)
+- [Getting Started with PyTorch](https://github.com/huggingface/notebooks/blob/main/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb)
+- [Getting Started with TensorFlow](https://github.com/huggingface/notebooks/blob/main/sagemaker/02_getting_started_tensorflow/sagemaker-notebook.ipynb)
+- [Distributed Training Data Parallelism](https://github.com/huggingface/notebooks/blob/main/sagemaker/03_distributed_training_data_parallelism/sagemaker-notebook.ipynb)
+- [Distributed Training Model Parallelism](https://github.com/huggingface/notebooks/blob/main/sagemaker/04_distributed_training_model_parallelism/sagemaker-notebook.ipynb)
+- [Spot Instances and continue training](https://github.com/huggingface/notebooks/blob/main/sagemaker/05_spot_instances/sagemaker-notebook.ipynb)
+- [SageMaker Metrics](https://github.com/huggingface/notebooks/blob/main/sagemaker/06_sagemaker_metrics/sagemaker-notebook.ipynb)
+- [Distributed Training Data Parallelism TensorFlow](https://github.com/huggingface/notebooks/blob/main/sagemaker/07_tensorflow_distributed_training_data_parallelism/sagemaker-notebook.ipynb)
+- [Distributed Training Summarization](https://github.com/huggingface/notebooks/blob/main/sagemaker/08_distributed_summarization_bart_t5/sagemaker-notebook.ipynb)
+- [Image Classification with Vision Transformer](https://github.com/huggingface/notebooks/blob/main/sagemaker/09_image_classification_vision_transformer/sagemaker-notebook.ipynb)
+- [Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference](https://github.com/huggingface/notebooks/blob/main/sagemaker/11_deploy_model_from_hf_hub/deploy_transformer_model_from_hf_hub.ipynb)
+- [Deploy a Hugging Face Transformer model from S3 to SageMaker for inference](https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb)
diff --git a/docs/sagemaker/getting-started/train.md b/docs/sagemaker/getting-started/train.md
@@ -0,0 +1,19 @@
+# Train models on AWS
+
+Training Hugging Face models on AWS is streamlined through various services. Here's how you can fine-tune your models using AWS and Hugging Face offerings.
+
+## With SageMaker SDK
+
+Amazon SageMaker is a fully managed AWS service for building, training, and deploying machine learning models at scale. The SageMaker SDK simplifies interacting with SageMaker programmatically and provides an integration designed specifically for Hugging Face models, which simplifies managing training jobs. With this integration, you can quickly create your own fine-tuned models, significantly reducing setup complexity and time to production.
+
+To get started, check out this example.
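As a rough illustration of the training flow described above, here is a minimal sketch using the `HuggingFace` estimator from the SageMaker Python SDK. It is not the example referenced by this page. The helper names, the `train.py` entry point, the `./scripts` directory, the hyperparameter names, and the default instance type are all assumptions made for illustration; the hyperparameters must match whatever your own training script parses. Running `launch_training` requires the `sagemaker` package, AWS credentials, and an IAM role with SageMaker permissions.

```python
def training_hyperparameters(model_name: str, epochs: int = 1,
                             batch_size: int = 32) -> dict:
    """Hyperparameters passed to the training script as CLI arguments.

    Hypothetical names: they must match the arguments your script expects.
    """
    return {
        "model_name_or_path": model_name,
        "epochs": epochs,
        "per_device_train_batch_size": batch_size,
    }


def launch_training(role_arn: str, image_uri: str, train_s3_uri: str,
                    hyperparameters: dict, py_version: str = "py311",
                    instance_type: str = "ml.g5.xlarge"):
    """Start a SageMaker training job in a training DLC (illustrative sketch).

    The import is done lazily so this module can be loaded without the
    `sagemaker` package installed.
    """
    from sagemaker.huggingface import HuggingFace

    estimator = HuggingFace(
        entry_point="train.py",   # assumed script name
        source_dir="./scripts",   # assumed local directory containing it
        role=role_arn,
        image_uri=image_uri,      # e.g. a PyTorch training DLC URI
        py_version=py_version,    # should match the Python version of the image
        instance_type=instance_type,
        instance_count=1,
        hyperparameters=hyperparameters,
    )
    # The "train" channel name is an assumption; it becomes an input channel
    # visible to the training script inside the container.
    estimator.fit({"train": train_s3_uri})
    return estimator
```

Passing an explicit `image_uri` pins the job to one specific container instead of letting the SDK resolve an image from framework versions.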
+
+## With ECS, EKS, and EC2
+
+Hugging Face provides Training Deep Learning Containers (DLCs) to AWS users: optimized environments preconfigured with Hugging Face libraries for training and natively integrated in the SageMaker SDK. However, the HF DLCs can also be used across other AWS services like ECS, EKS, and EC2.
+
+Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), and Amazon Elastic Compute Cloud (EC2) allow you to leverage DLCs directly.
+
+- Get started with HF DLCs on EC2.
+- Get started with HF DLCs on ECS.
+- Get started with HF DLCs on EKS.
diff --git a/docs/sagemaker/how-to/get-started-sagemaker-sdk.md b/docs/sagemaker/how-to/get-started-sagemaker-sdk.md
diff --git a/docs/sagemaker/reference.md b/docs/sagemaker/reference.md