Based on the Python script below, we're unable to deploy the model: diffusers ≥ 0.28 no longer accepts the `device_map` strategy `"auto"`, so the GPU isn't detected when the endpoint starts.
```python
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

iam = boto3.client("iam")
role = iam.get_role(RoleName="my-ml-sagemaker-role")["Role"]["Arn"]

# Hub model configuration – see https://huggingface.co/models
hub = {
    "HF_MODEL_ID": "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "HF_TASK": "text-to-image",
}

# Create a Hugging Face model
huggingface_model = HuggingFaceModel(
    transformers_version="4.49.0",
    pytorch_version="2.6.0",
    py_version="py312",
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type="ml.g4dn.4xlarge",  # EC2 instance type
)

image_bytes = predictor.predict({"inputs": "Astronaut riding a horse"})

# Display the generated image with PIL
import io
from PIL import Image

image = Image.open(io.BytesIO(image_bytes))
```

Invoking the endpoint directly with the AWS CLI:

```shell
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name huggingface-pytorch-inference-2025-05-14-17-08-18-830 \
  --body "fileb://input_file.txt" output_file.txt
```

This returns:
```
An error occurred (ModelError) when calling the InvokeEndpoint operation:
Received client error (400) from primary with message:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "auto not supported. Supported strategies are: balanced"
}
```
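The error suggests diffusers ≥ 0.28 only accepts `"balanced"` as a `device_map` strategy. A minimal sketch of the selection logic the toolkit could apply instead of hard-coding `"auto"` (the function name and exact behavior are my assumptions, not the toolkit's actual code):

```python
def select_device_map(num_gpus: int):
    """Pick a device_map strategy compatible with diffusers >= 0.28.

    Hypothetical helper: return "balanced" for multi-GPU hosts, and no
    device_map at all (None) otherwise, so a single-GPU or CPU host never
    passes the now-unsupported "auto" strategy.
    """
    if num_gpus >= 2:
        return "balanced"  # the only strategy diffusers >= 0.28 still accepts
    return None  # place the whole pipeline on one device instead

# Example: an ml.g4dn.4xlarge has a single GPU
print(select_device_map(1))
print(select_device_map(4))
```

With `None`, the caller would load the pipeline without a `device_map` and move it to `cuda:0` explicitly, which is how a single-GPU instance like `ml.g4dn.4xlarge` should be handled.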
It looks like the SageMaker Hugging Face inference toolkit needs an update? It should use `"balanced"` only if there are more than 2 GPUs.
This issue might be related, so I'm linking it: huggingface/diffusers#11555
Any guidance on how to resolve this SageMaker deployment issue would be greatly appreciated.
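In the meantime, one workaround I'm considering (untested on my side, so treat it as an assumption) is to repackage the model as a `model.tar.gz` with a pinned diffusers version, so the container installs a release that still accepts `"auto"`. The pin would go in a `code/requirements.txt` inside the archive:

```
diffusers<0.28
```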
Thank you!