Sagemaker HuggingfaceModel fails on phi3 model deployment #123

@manikawnth

Description

I'm not able to deploy the Phi-3 model from the Hugging Face model hub to SageMaker.
I tried multiple DLC containers, with and without `trust_remote_code: true`, but still cannot get it to run.

I receive the following error:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 258, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 222, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 420, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 84, in __init__
    model = FlashLlamaForCausalLM(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 368, in __init__
    self.model = FlashLlamaModel(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 292, in __init__
    [
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 293, in <listcomp>
    FlashLlamaLayer(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 232, in __init__
    self.self_attn = FlashLlamaAttention(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 108, in __init__
    self.query_key_value = load_attention(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 43, in load_attention
    bias = config.attention_bias
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 263, in __getattribute__
    return super().__getattribute__(key)

AttributeError: 'Phi3Config' object has no attribute 'attention_bias' rank=0


2024-05-21T16:19:40.764815Z ERROR text_generation_launcher: Shard 0 failed to start
2024-05-21T16:19:40.764834Z  INFO text_generation_launcher: Shutting down shards

Error: ShardCannotStart
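
The traceback shows TGI's Llama code path reading `config.attention_bias` unconditionally, while `Phi3Config` does not define that field. A minimal sketch of the failure mode (the `SimpleConfig` class below is a hypothetical stand-in for the transformers config object, not the real class):

```python
# SimpleConfig is a hypothetical stand-in for transformers' Phi3Config,
# which does not define `attention_bias`.
class SimpleConfig:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

config = SimpleConfig(hidden_size=3072, num_attention_heads=32)

# What TGI's load_attention effectively does -- this raises AttributeError:
try:
    bias = config.attention_bias
except AttributeError as e:
    print(e)  # 'SimpleConfig' object has no attribute 'attention_bias'

# Defensive access that tolerates configs lacking the field:
bias = getattr(config, "attention_bias", False)
```

This suggests the fix belongs in the serving container (or in the model's `config.json`), not in the deployment code.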

from sagemaker import get_execution_role, Session
import boto3
sagemaker_session = Session()
region = boto3.Session().region_name

# Get the execution role (use the notebook instance's role, or substitute a role ARN)
execution_role = get_execution_role()

image_uri = '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.3-gpu-py310-cu121-ubuntu22.04-v2.0'

from sagemaker.huggingface import HuggingFaceModel

hub = {
  'HF_TASK': 'text-generation',
  'HF_MODEL_ID':'microsoft/Phi-3-mini-128k-instruct',
  'TRUST_REMOTE_CODE': 'true',
  'HF_MODEL_TRUST_REMOTE_CODE': 'true'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    image_uri=image_uri,
    role=execution_role,
    sagemaker_session=sagemaker_session
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
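
One workaround for this class of config/server mismatch (a sketch only, not verified against this exact container) is to host a patched copy of the model whose `config.json` explicitly sets the field the server expects, then point the endpoint at that copy (e.g. via `model_data` instead of `HF_MODEL_ID`). The `cfg` dict below is a hypothetical stand-in for the model's real `config.json`:

```python
import json
import os
import tempfile

# Hypothetical stand-in for the model's real config.json contents.
cfg = {"model_type": "phi3", "hidden_size": 3072}

# Add the field TGI's flash_llama code reads unconditionally; False matches
# the Llama default (no attention bias).
cfg.setdefault("attention_bias", False)

# Write the patched config; the patched model directory would then be
# uploaded to S3 and referenced from the endpoint.
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```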
