# Quickstart - Deploy Hugging Face Models with SageMaker JumpStart
## Why use SageMaker JumpStart for Hugging Face models?
Amazon SageMaker **JumpStart** lets you deploy the most popular open Hugging Face models with **one click**, inside your own AWS account. JumpStart offers a curated [selection](https://aws.amazon.com/sagemaker-ai/jumpstart/getting-started/?sagemaker-jumpstart-cards.sort-by=item.additionalFields.model-name&sagemaker-jumpstart-cards.sort-order=asc&awsf.sagemaker-jumpstart-filter-product-type=*all&awsf.sagemaker-jumpstart-filter-text=*all&awsf.sagemaker-jumpstart-filter-vision=*all&awsf.sagemaker-jumpstart-filter-tabular=*all&awsf.sagemaker-jumpstart-filter-audio-tasks=*all&awsf.sagemaker-jumpstart-filter-multimodal=*all&awsf.sagemaker-jumpstart-filter-RL=*all&awsm.page-sagemaker-jumpstart-cards=1&sagemaker-jumpstart-cards.q=qwen&sagemaker-jumpstart-cards.q_operator=AND) of model checkpoints for various tasks, including text generation, embeddings, vision, audio, and more. Most models are deployed using the official [Hugging Face Deep Learning Containers](https://huggingface.co/docs/sagemaker/main/en/dlcs/introduction) with a sensible default instance type, so you can move from idea to production in minutes.
In this quickstart guide, we will deploy [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct).
## 1. Prerequisites
| Requirement | Notes |
|-------------|-------|
| AWS account with SageMaker enabled | An AWS account that will contain all your AWS resources. |
| An IAM role to access SageMaker AI | Learn more about how IAM works with SageMaker AI in this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html). |
| SageMaker Studio domain and user profile | We recommend using SageMaker Studio for straightforward deployment and inference. Follow this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html). |
| Service quotas | Most LLMs need GPU instances (e.g. ml.g5). Verify you have quota for ml.g5.24xlarge or [request it](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-requesting-quota-increases.html). |
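You can check your applied quota programmatically with the AWS Service Quotas API. The sketch below is an assumption, not part of this guide's original flow: it assumes SageMaker endpoint quotas are named like `"ml.g5.24xlarge for endpoint usage"` and that you have boto3 configured with AWS credentials; the `find_endpoint_quota` helper is hypothetical.

```python
# Hypothetical sketch: look up the applied SageMaker endpoint quota for a given
# instance type via the Service Quotas API (requires boto3 + AWS credentials).

def find_endpoint_quota(quotas, instance_type):
    """Return the quota value whose name is '<instance_type> for endpoint usage',
    or None if no matching quota entry exists in `quotas`."""
    target = f"{instance_type} for endpoint usage"
    for quota in quotas:
        if quota.get("QuotaName") == target:
            return quota.get("Value")
    return None

if __name__ == "__main__":
    import boto3

    client = boto3.client("service-quotas")
    quotas = []
    # list_service_quotas is paginated; collect every SageMaker quota entry.
    for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])

    print("ml.g5.24xlarge endpoint quota:", find_endpoint_quota(quotas, "ml.g5.24xlarge"))
```

If the value printed is `0.0` (or `None`), request a quota increase before attempting the deployment below.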
## 2. Endpoint deployment

Here is how to deploy a Hugging Face model to SageMaker by browsing the JumpStart catalog:

1. **Open** SageMaker → **JumpStart**.
2. Filter **“Hugging Face”** or search for your model (e.g. **Qwen2.5-14B**).
3. Click **Deploy** → (optional) adjust instance size / count → **Deploy**.
4. Wait until *Endpoints* shows **In service**.
5. Copy the **Endpoint name** (or ARN) for later use.
Alternatively, you can browse the Hugging Face Model Hub:

1. Open the model page → click **Deploy** → **SageMaker**, then select the **JumpStart** tab if the model is available there.
2. Copy the code snippet and run it from a SageMaker Notebook instance.
```python
# SageMaker JumpStart provides APIs as part of the SageMaker SDK that allow you
# to deploy and fine-tune models in network isolation, using scripts that
# SageMaker maintains.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-qwen2-5-14b-instruct")
example_payloads = model.retrieve_all_examples()

predictor = model.deploy()

for payload in example_payloads:
    response = predictor.predict(payload.body)
    print("Input:\n", payload.body[payload.prompt_key])
    print("Output:\n", response[0]["generated_text"], "\n\n===============\n")
```
Endpoint creation can take several minutes, depending on the size of the model.
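If you deploy from a script, you can wait for the endpoint programmatically instead of watching the console. The polling helper below is a minimal sketch and not part of the SageMaker SDK; the `__main__` section assumes boto3 credentials and a placeholder endpoint name.

```python
# Minimal polling sketch (an assumption, not a SageMaker SDK API): call
# `get_status()` repeatedly until the endpoint reports "InService" or "Failed".
import time

def wait_for_in_service(get_status, poll_seconds=30, max_polls=120):
    """Poll `get_status()` until it returns "InService"; raise on failure/timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint creation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint did not reach InService in time")

if __name__ == "__main__":
    import boto3

    sm = boto3.client("sagemaker")
    endpoint_name = "MY ENDPOINT NAME"
    wait_for_in_service(
        lambda: sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
    )
    print("Endpoint is in service")
```

boto3 also ships a built-in waiter for this (`sm.get_waiter("endpoint_in_service").wait(EndpointName=...)`), which is usually the simpler choice; the explicit loop above just makes the logic visible.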
## 3. Test interactively
If you deployed through the console, grab the endpoint name and reuse it in your code:
```python
from sagemaker.predictor import retrieve_default

endpoint_name = "MY ENDPOINT NAME"
predictor = retrieve_default(endpoint_name)

payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a passionate data scientist."
        },
        {
            "role": "user",
            "content": "what is machine learning?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False
}

response = predictor.predict(payload)
print(response)
```
The endpoint supports the OpenAI API specification.
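Since the response follows the OpenAI chat completion shape, you can pull out the assistant's reply with a small helper. This is a hypothetical convenience function, and the `example` response below is a made-up illustration of the expected structure, not real endpoint output.

```python
# Hypothetical helper: extract the assistant reply from an OpenAI-style
# chat completion response dict.

def extract_reply(response):
    """Return the assistant message content from a chat completion response."""
    return response["choices"][0]["message"]["content"]

# Illustrative response shape (assumed from the OpenAI chat completion format):
example = {
    "choices": [
        {"message": {"role": "assistant", "content": "Machine learning is..."}}
    ]
}
print(extract_reply(example))
```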
## 4. Clean up

To avoid incurring unnecessary costs, delete the SageMaker endpoints when you're done, either in the **Deployments → Endpoints** console or with the following code snippet:

```python
predictor.delete_model()
predictor.delete_endpoint()
```
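If you deployed from the console, you may not have a `predictor` object in hand. In that case you can delete the resources by name with plain boto3 calls. The helper below is a sketch; the resource names in the `__main__` section are placeholders you must replace with your own, and the client is passed in as a parameter so the logic can be exercised without AWS access.

```python
# Hypothetical clean-up helper using boto3 directly: deletes the endpoint, then
# (optionally) its endpoint configuration and the model, by name.

def cleanup_endpoint(sm_client, endpoint_name, endpoint_config_name=None, model_name=None):
    """Delete the endpoint, then optionally its config and model."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    if endpoint_config_name:
        sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    if model_name:
        sm_client.delete_model(ModelName=model_name)

if __name__ == "__main__":
    import boto3

    cleanup_endpoint(
        boto3.client("sagemaker"),
        endpoint_name="MY ENDPOINT NAME",
        endpoint_config_name="MY ENDPOINT CONFIG NAME",
        model_name="MY MODEL NAME",
    )
```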