
Inference works on ml.inf1.xlarge but fails on ml.inf1.24xlarge with "The PyTorch Neuron Runtime could not be initialized" #471

@aj2622

Description


Deployment code:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_data,  # path to your model and script
    role=role,              # IAM role with permissions to create an endpoint
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04'
)

# Let SageMaker know that we've already compiled the model via neuron-cc
huggingface_model._is_compiled_model = True

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type="ml.inf1.24xlarge" # AWS Inferentia Instance
)

When I use ml.inf1.xlarge, my endpoint works as expected. The moment I switch to ml.inf1.24xlarge, ml.inf1.6xlarge, or ml.inf1.2xlarge, I get hit with the following error:
[screenshot of the error: "The PyTorch Neuron Runtime could not be initialized"]

What am I missing here?
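For reference, the only variation I can think of trying is passing Neuron runtime environment variables through to the inference container via the env parameter of HuggingFaceModel (env is a standard parameter of the SageMaker Model classes). The NEURONCORE_GROUP_SIZES variable and its value below are only a guess on my part; I have not confirmed they apply to this image:

from sagemaker.huggingface.model import HuggingFaceModel

# same model as above, but forwarding Neuron runtime environment variables
# to the inference container via env
huggingface_model = HuggingFaceModel(
    model_data=model_data,
    role=role,
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04',
    env={
        # assumption: limit each model server worker to a single NeuronCore;
        # not verified to be the right setting for this container
        "NEURONCORE_GROUP_SIZES": "1",
    },
)

huggingface_model._is_compiled_model = True

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf1.24xlarge",
)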
