from sagemaker.huggingface.model import HuggingFaceModel
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_data,  # path to your model and script
    role=role,  # IAM role with permissions to create an endpoint
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04'
)
# Let SageMaker know that we've already compiled the model via neuron-cc
huggingface_model._is_compiled_model = True
# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type="ml.inf1.24xlarge"  # AWS Inferentia instance
)
When I use ml.inf1.xlarge, my endpoint works as expected. The moment I switch to ml.inf1.24xlarge, ml.inf1.6xlarge, or ml.inf1.2xlarge, I get the following error.
