
Commit 267738d

clean toctree and port previous content to new section
1 parent ae73559 commit 267738d

9 files changed: +163 -350 lines changed

β€Ždocs/sagemaker/dlcs/introduction.mdβ€Ž

Lines changed: 3 additions & 1 deletion
@@ -7,4 +7,6 @@ The containers are publicly maintained, updated and released periodically by Hug
* Amazon Bedrock: Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI companies and Amazon available for your use through a unified API to build generative AI applications.
* Amazon Elastic Kubernetes Service (EKS): Amazon EKS is the premiere platform for running Kubernetes clusters in the AWS cloud.
* Amazon Elastic Container Service (ECS): Amazon ECS is a fully managed container orchestration service that helps you easily deploy, manage, and scale containerized applications.
* Amazon Elastic Compute Cloud (EC2): Amazon EC2 provides on-demand, scalable computing capacity in the Amazon Web Services (AWS) Cloud.
Hugging Face DLCs are open source and licensed under Apache 2.0. Feel free to reach out on our [community forum](https://discuss.huggingface.co/c/sagemaker/17) if you have any questions.

β€Ždocs/sagemaker/getting-started.mdβ€Ž

Lines changed: 0 additions & 151 deletions
This file was deleted.

β€Ždocs/sagemaker/getting-started/index.mdβ€Ž

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
# Hugging Face on AWS

![cover](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/cover.png)
Hugging Face partners with Amazon Web Services (AWS) to democratize artificial intelligence (AI), enabling developers to seamlessly build, train, and deploy state-of-the-art machine learning models using AWS's robust cloud infrastructure.

This collaboration aims to offer developers access to an ever-growing catalog of pre-trained models and datasets from the Hugging Face Hub, using Hugging Face open-source libraries across a broad spectrum of AWS services and hardware platforms.
File renamed without changes.
Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# Train and deploy Hugging Face on Amazon SageMaker

This getting started guide will show you how to quickly use Hugging Face on Amazon SageMaker. Learn how to fine-tune and deploy a pretrained πŸ€— Transformers model on SageMaker for a binary text classification task.

πŸ’‘ If you are new to Hugging Face, we recommend first reading the πŸ€— Transformers [quick tour](https://huggingface.co/docs/transformers/quicktour).

<iframe width="560" height="315" src="https://www.youtube.com/embed/pYqjCzoyWyo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

πŸ““ Open the [sagemaker-notebook.ipynb file](https://github.com/huggingface/notebooks/blob/main/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb) to follow along!

## Installation and setup

Get started by installing the necessary Hugging Face libraries and SageMaker. You will also need to install [PyTorch](https://pytorch.org/get-started/locally/) and [TensorFlow](https://www.tensorflow.org/install/pip#tensorflow-2-packages-are-available) if you don't already have them installed.

```python
pip install "sagemaker>=2.140.0" "transformers==4.26.1" "datasets[s3]==2.10.1" --upgrade
```
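
As an optional sanity check (not part of the original notebook), you can confirm that the pinned versions were actually picked up after the install:

```python
import sagemaker
import transformers
import datasets

# optional: print the installed versions to confirm they match the pins above
print(sagemaker.__version__, transformers.__version__, datasets.__version__)
```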

If you want to run this example in [SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html), upgrade [ipywidgets](https://ipywidgets.readthedocs.io/en/latest/) for the πŸ€— Datasets library and restart the kernel:

```python
%%capture
import IPython
!conda install -c conda-forge ipywidgets -y
IPython.Application.instance().kernel.do_shutdown(True)
```

Next, you should set up your environment: a SageMaker session and an S3 bucket. The S3 bucket will store data, models, and logs. You will need access to an [IAM execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with the required permissions.

If you are planning on using SageMaker in a local environment, you need to provide the `role` yourself. Learn more about how to set this up [here](https://huggingface.co/docs/sagemaker/train#installation-and-setup).

⚠️ The execution role is only available when you run a notebook within SageMaker. If you try to run `get_execution_role` in a notebook not on SageMaker, you will get a region error.

```python
import sagemaker

sess = sagemaker.Session()
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
```
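
For the local-environment case mentioned above, a minimal sketch of providing the `role` yourself could look like the following; it assumes your AWS account already has an IAM role with the required SageMaker and S3 permissions named `sagemaker_execution_role` (substitute your own role name):

```python
import boto3
import sagemaker

# assumption: an IAM role with the required SageMaker/S3 permissions already exists
iam = boto3.client("iam")
role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session()
sess = sagemaker.Session(default_bucket=sess.default_bucket())
```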

## Preprocess

The πŸ€— Datasets library makes it easy to download and preprocess a dataset for training. Download and tokenize the [IMDb](https://huggingface.co/datasets/imdb) dataset:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# load dataset
train_dataset, test_dataset = load_dataset("imdb", split=["train", "test"])

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# create tokenization function
def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

# tokenize train and test datasets
train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

# set dataset format for PyTorch
train_dataset = train_dataset.rename_column("label", "labels")
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset = test_dataset.rename_column("label", "labels")
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
```
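
As a quick optional check (not in the original guide), you can inspect one formatted example to confirm the datasets now return PyTorch tensors:

```python
# each example should expose the model inputs as PyTorch tensors
sample = train_dataset[0]
print(sample.keys())               # input_ids, attention_mask, labels
print(sample["input_ids"].shape)   # padded to the tokenizer's max length
```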

## Upload dataset to S3 bucket

Next, upload the preprocessed dataset to your S3 session bucket with πŸ€— Datasets S3 [filesystem](https://huggingface.co/docs/datasets/filesystems.html) implementation:

```python
# S3 key prefix for the data
s3_prefix = "samples/datasets/imdb"

# save train_dataset to s3
training_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/train'
train_dataset.save_to_disk(training_input_path)

# save test_dataset to s3
test_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/test'
test_dataset.save_to_disk(test_input_path)
```
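
During training, SageMaker downloads these S3 paths and exposes them to the script as input channels. As a rough sketch of how a training script reads them back (it relies on the `SM_CHANNEL_TRAIN` and `SM_CHANNEL_TEST` environment variables SageMaker sets for channels named `train` and `test`):

```python
import os

from datasets import load_from_disk

# SageMaker mounts each input channel locally and points to it via SM_CHANNEL_<NAME>
train_dataset = load_from_disk(os.environ["SM_CHANNEL_TRAIN"])
test_dataset = load_from_disk(os.environ["SM_CHANNEL_TEST"])
```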

## Start a training job

Create a Hugging Face Estimator to handle end-to-end SageMaker training and deployment. The most important parameters to pay attention to are:

* `entry_point` refers to the fine-tuning script, which you can find in the [train.py file](https://github.com/huggingface/notebooks/blob/main/sagemaker/01_getting_started_pytorch/scripts/train.py).
* `instance_type` refers to the SageMaker instance that will be launched. Take a look [here](https://aws.amazon.com/sagemaker/pricing/) for a complete list of instance types.
* `hyperparameters` refers to the training hyperparameters the model will be fine-tuned with.

```python
from sagemaker.huggingface import HuggingFace

hyperparameters = {
    "epochs": 1,                                         # number of training epochs
    "train_batch_size": 32,                              # training batch size
    "model_name": "distilbert/distilbert-base-uncased"   # name of pretrained model
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",            # fine-tuning script to use in training job
    source_dir="./scripts",            # directory where fine-tuning script is stored
    instance_type="ml.p3.2xlarge",     # instance type
    instance_count=1,                  # number of instances
    role=role,                         # IAM role used in training job to access AWS resources (S3)
    transformers_version="4.26",       # Transformers version
    pytorch_version="1.13",            # PyTorch version
    py_version="py39",                 # Python version
    hyperparameters=hyperparameters    # hyperparameters to use in training job
)
```
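
The `hyperparameters` are handed to `train.py` as command-line arguments. The actual script is in the notebook repository linked above; as a rough sketch of the pattern it follows, the argument names mirror the keys defined in the estimator:

```python
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the estimator arrive as command-line arguments
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train_batch_size", type=int, default=32)
    parser.add_argument("--model_name", type=str)

    args, _ = parser.parse_known_args()
    print(f"Fine-tuning {args.model_name} for {args.epochs} epoch(s)")
```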

Begin training with one line of code:

```python
huggingface_estimator.fit({"train": training_input_path, "test": test_input_path})
```
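
When `fit()` returns, the estimator keeps a handle on the finished job. For example (an optional addition, not in the original guide), you can look up where the trained model artifact was stored:

```python
# S3 URI of the model.tar.gz produced by the training job
print(huggingface_estimator.model_data)
```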

## Deploy model

Once the training job is complete, deploy your fine-tuned model by calling `deploy()` with the number of instances and instance type:

```python
predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```

Call `predict()` on your data:

```python
sentiment_input = {"inputs": "It feels like a curtain closing...there was an elegance in the way they moved toward conclusion. No fan is going to watch and feel short-changed."}

predictor.predict(sentiment_input)
```
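
The endpoint can also be invoked outside the SageMaker Python SDK, for example from another application. A minimal `boto3` sketch (it assumes the endpoint created above, whose name is available as `predictor.endpoint_name`):

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# send the same JSON payload the predictor uses
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps(sentiment_input),
)
print(json.loads(response["Body"].read().decode("utf-8")))
```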

After running your request, delete the endpoint:

```python
predictor.delete_endpoint()
```

## What's next?

Congratulations, you've just fine-tuned and deployed a pretrained πŸ€— Transformers model on SageMaker! πŸŽ‰

For your next steps, keep reading our documentation for more details about training and deployment. There are many interesting features such as [distributed training](/docs/sagemaker/train#distributed-training) and [Spot instances](/docs/sagemaker/train#spot-instances).
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# How to

## SageMaker SDK

## JumpStart

## Bedrock
File renamed without changes.
