Support load_lora_weights in inference API deploy #131

@haktan-suren

Description

Currently there is no way to apply load_lora_weights when deploying a model:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'black-forest-labs/FLUX.1-dev',
    'HF_TASK': 'text-to-image',
    'HF_TOKEN': 'TOKEN'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                       # configuration for loading model from Hub
    role=role,                     # IAM role with permissions to create an endpoint
    transformers_version="4.26",   # Transformers version used
    pytorch_version="1.13",        # PyTorch version used
    py_version='py39',             # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)

Maybe a new env var such as "HF_LORA_MODEL" could be added to hub for this.

A similar implementation is present here: aws-samples/sagemaker-stablediffusion-quick-kit@bd37fe9...2d1c43b
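
To illustrate, a rough sketch of how such an env var could be consumed. The HF_LORA_MODEL name and the model_fn override below are hypothetical; today something like this only works by bundling a custom code/inference.py with the model:

import os

import torch
from diffusers import DiffusionPipeline


def model_fn(model_dir):
    # Load the base text-to-image pipeline packaged with the endpoint.
    pipeline = DiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)

    # Hypothetical env var: a Hub repo id (or local path) holding the LoRA weights,
    # passed through the hub dict like the other HF_* variables above.
    lora_model = os.environ.get("HF_LORA_MODEL")
    if lora_model:
        pipeline.load_lora_weights(lora_model)

    return pipeline.to("cuda")

A built-in HF_LORA_MODEL would remove the need to ship that custom script alongside the model.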
