Support load_lora_weights in inference API deploy #131

@haktan-suren

Description

Currently there is no way to apply load_lora_weights when deploying a model:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'black-forest-labs/FLUX.1-dev',
    'HF_TASK': 'text-to-image',
    'HF_TOKEN': 'TOKEN'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                       # configuration for loading model from Hub
    role=role,                     # IAM role with permissions to create an endpoint
    transformers_version="4.26",   # Transformers version used
    pytorch_version="1.13",        # PyTorch version used
    py_version='py39',             # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)

Maybe a new env var such as "HF_LORA_MODEL" could be added to hub for this.

A similar implementation is present here: aws-samples/sagemaker-stablediffusion-quick-kit@bd37fe9...2d1c43b
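
To illustrate, a rough sketch of how such an env var could be consumed. The HF_LORA_MODEL name and the model_fn override below are hypothetical; today something like this only works by bundling a custom code/inference.py with the model:

import os

import torch
from diffusers import DiffusionPipeline


def model_fn(model_dir):
    # Load the base text-to-image pipeline packaged with the endpoint.
    pipeline = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)

    # Hypothetical env var: a Hub repo id (or local path) holding the LoRA weights,
    # passed through the hub dict like the other HF_* variables above.
    lora_model = os.environ.get("HF_LORA_MODEL")
    if lora_model:
        pipeline.load_lora_weights(lora_model)

    return pipeline.to("cuda")

A built-in HF_LORA_MODEL would remove the need to ship that custom script alongside the model.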
