# Quickstart - Deploy Hugging Face Models with SageMaker JumpStart

## Why use SageMaker JumpStart for Hugging Face models?

Amazon SageMaker **JumpStart** lets you deploy the most popular open Hugging Face models with **one click**, inside your own AWS account. JumpStart offers a curated [selection](https://aws.amazon.com/sagemaker-ai/jumpstart/getting-started/?sagemaker-jumpstart-cards.sort-by=item.additionalFields.model-name&sagemaker-jumpstart-cards.sort-order=asc&awsf.sagemaker-jumpstart-filter-product-type=*all&awsf.sagemaker-jumpstart-filter-text=*all&awsf.sagemaker-jumpstart-filter-vision=*all&awsf.sagemaker-jumpstart-filter-tabular=*all&awsf.sagemaker-jumpstart-filter-audio-tasks=*all&awsf.sagemaker-jumpstart-filter-multimodal=*all&awsf.sagemaker-jumpstart-filter-RL=*all&awsm.page-sagemaker-jumpstart-cards=1&sagemaker-jumpstart-cards.q=qwen&sagemaker-jumpstart-cards.q_operator=AND) of model checkpoints for various tasks, including text generation, embeddings, vision, audio, and more. Most models are deployed using the official [Hugging Face Deep Learning Containers](https://huggingface.co/docs/sagemaker/main/en/dlcs/introduction) with a sensible default instance type, so you can move from idea to production in minutes.

In this quickstart guide, we will deploy [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct).

## 1. Prerequisites

| Requirement | Notes |
|-------------|-------|
| AWS account with SageMaker enabled | An AWS account that will contain all your AWS resources. |
| An IAM role to access SageMaker AI | Learn more about how IAM works with SageMaker AI in this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html). |
| SageMaker Studio domain and user profile | We recommend using SageMaker Studio for straightforward deployment and inference. Follow this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html). |
| Service quotas | Most LLMs need GPU instances (e.g. ml.g5). Verify you have quota for ml.g5.24xlarge or [request it](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-requesting-quota-increases.html). |
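
You can also check the relevant quota programmatically before deploying. The sketch below uses the AWS Service Quotas API via boto3; the quota name pattern (`"<instance type> for endpoint usage"`) is an assumption based on the usual SageMaker quota naming and may differ, so verify the exact name in the Service Quotas console for your account.

```python
def matching_quotas(quotas, instance_type):
    """Filter a list of quota dicts down to endpoint-usage quotas for one instance type."""
    # Assumed naming convention; confirm it in the Service Quotas console.
    needle = f"{instance_type} for endpoint usage"
    return [q for q in quotas if needle in q.get("QuotaName", "")]

def check_endpoint_quota(instance_type="ml.g5.24xlarge", region="us-east-1"):
    # boto3 is imported lazily so matching_quotas() stays dependency-free.
    import boto3
    client = boto3.client("service-quotas", region_name=region)
    quotas = []
    # Collect all SageMaker quotas, then keep only the endpoint-usage ones.
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    for q in matching_quotas(quotas, instance_type):
        print(q["QuotaName"], "->", q["Value"])
```

A quota value of 0 means you need to request an increase before the deployment below will succeed.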

## 2. Endpoint deployment

To deploy a Hugging Face model by browsing the JumpStart catalog:
1. **Open** SageMaker → **JumpStart**.
2. Filter by **“Hugging Face”** or search for your model (e.g. **Qwen2.5-14B**).
3. Click **Deploy** → (optional) adjust instance size / count → **Deploy**.
4. Wait until *Endpoints* shows **In service**.
5. Copy the **Endpoint name** (or ARN) for later use.

Alternatively, you can browse the Hugging Face Model Hub:
1. Open the model page → click **Deploy** → **SageMaker** → the **JumpStart** tab, if the model is available.
2. Copy the code snippet and use it from a SageMaker Notebook instance.

```python
# SageMaker JumpStart provides APIs as part of the SageMaker SDK that let you
# deploy and fine-tune models in network isolation, using scripts that
# SageMaker maintains.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-qwen2-5-14b-instruct")
example_payloads = model.retrieve_all_examples()

predictor = model.deploy()

for payload in example_payloads:
    response = predictor.predict(payload.body)
    print("Input:\n", payload.body[payload.prompt_key])
    print("Output:\n", response[0]["generated_text"], "\n\n===============\n")
```

The endpoint creation can take several minutes, depending on the size of the model.
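
`model.deploy()` blocks until the endpoint is ready, but if you started the deployment from the console you can poll the status yourself. A minimal sketch using the standard `DescribeEndpoint` API via boto3; the endpoint name is a placeholder:

```python
import time

# Statuses after which polling should stop; "Failed" is terminal too.
TERMINAL_STATUSES = {"InService", "Failed"}

def is_terminal(status):
    """True once the endpoint has finished creating, successfully or not."""
    return status in TERMINAL_STATUSES

def wait_for_endpoint(endpoint_name, poll_seconds=30):
    # boto3 is imported lazily so is_terminal() can be used without AWS set up.
    import boto3
    sm = boto3.client("sagemaker")
    while True:
        status = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        print("Status:", status)
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```

boto3 also ships a built-in waiter, `boto3.client("sagemaker").get_waiter("endpoint_in_service")`, that does the same polling for you.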

## 3. Test interactively

If you deployed through the console, grab the endpoint name and reuse it in your code:
```python
from sagemaker.predictor import retrieve_default

endpoint_name = "MY ENDPOINT NAME"
predictor = retrieve_default(endpoint_name)

payload = {
    "messages": [
        {"role": "system", "content": "You are a passionate data scientist."},
        {"role": "user", "content": "What is machine learning?"},
    ],
    "max_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False,
}

response = predictor.predict(payload)
print(response)
```

The endpoint supports the OpenAI API specification.
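
Because of that, you can also call the endpoint with plain boto3 and an OpenAI-style message payload, without the SageMaker SDK. A minimal sketch; the endpoint name is whatever you copied earlier, and the helper names and default system prompt below are illustrative, not part of any library:

```python
import json

def build_chat_payload(user_message, system_message="You are a helpful assistant.",
                       max_tokens=512, temperature=0.7):
    """Assemble an OpenAI-style chat payload for the endpoint."""
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def invoke(endpoint_name, payload):
    # boto3 is imported lazily so build_chat_payload() works on its own.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

This is handy when calling the endpoint from a Lambda function or another service where the full SageMaker SDK would be a heavy dependency.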

## 4. Clean-up

To avoid incurring unnecessary costs, delete the SageMaker endpoints when you’re done, either in the **Deployments → Endpoints** console or with the following code snippet:
```python
predictor.delete_model()
predictor.delete_endpoint()
```
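
If you no longer have the `predictor` object (for example, after restarting your notebook), the same cleanup can be done with boto3. A sketch, assuming you look up the resource names in the console; the endpoint is deleted first so the config and model are no longer in use:

```python
def deletion_order(endpoint_name, endpoint_config_name=None, model_name=None):
    """Resources to remove, endpoint first so nothing still references the rest."""
    steps = [("endpoint", endpoint_name)]
    if endpoint_config_name:
        steps.append(("endpoint-config", endpoint_config_name))
    if model_name:
        steps.append(("model", model_name))
    return steps

def cleanup(endpoint_name, endpoint_config_name=None, model_name=None):
    # boto3 is imported lazily so deletion_order() can be used without AWS set up.
    import boto3
    sm = boto3.client("sagemaker")
    deleters = {
        "endpoint": lambda n: sm.delete_endpoint(EndpointName=n),
        "endpoint-config": lambda n: sm.delete_endpoint_config(EndpointConfigName=n),
        "model": lambda n: sm.delete_model(ModelName=n),
    }
    for kind, name in deletion_order(endpoint_name, endpoint_config_name, model_name):
        deleters[kind](name)
```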