Labels: question (Further information is requested)
Description
I know Triton Inference Server supports serving LoRA models, but it seems this is only supported for LLMs via the vLLM and TensorRT-LLM backends.
Our use case is:
We have a foundation model for EEG data. We then fine-tune it on different tasks with the PEFT library from Hugging Face.
The output is:
- the base model
- many LoRA adapters, one per task
How can we run inference with Triton server without duplicating the weights of the base model?
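The property being asked for can be sketched in plain NumPy, independent of any serving backend: the base weights are stored exactly once, and each task contributes only a small low-rank pair (A, B) that is applied on top at inference time. The adapter names and dimensions below are illustrative, not part of any Triton or PEFT API.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared base weight matrix, standing in for the foundation model.
d_in, d_out, rank = 16, 8, 4
W_base = rng.normal(size=(d_in, d_out))

# Each task stores only its low-rank pair (A, B); the base weights
# are never copied. Task names here are made up for illustration.
adapters = {
    "task_sleep_staging": (rng.normal(size=(d_in, rank)), np.zeros((rank, d_out))),
    "task_seizure_detect": (rng.normal(size=(d_in, rank)), rng.normal(size=(rank, d_out))),
}

def lora_forward(x, task):
    """Compute y = x @ (W_base + A @ B) without materialising a merged weight."""
    A, B = adapters[task]
    # (x @ A) @ B keeps the extra cost proportional to the adapter rank.
    return x @ W_base + (x @ A) @ B

x = rng.normal(size=(2, d_in))
y = lora_forward(x, "task_seizure_detect")
print(y.shape)  # (2, 8)
```

In PEFT terms this corresponds to loading the base model once and attaching adapters to it (e.g. `PeftModel.from_pretrained` plus `load_adapter`/`set_adapter`), rather than merging and exporting a full copy of the weights per task; whether a Triton backend can switch adapters per request this way is exactly the open question above.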