Are there any plans to implement concurrent LoRA inference with multiple adapters (such as S-LoRA)? #1237
Unanswered · SamGalanakis asked this question in Q&A
Replies: 2 comments 1 reply
-
Do you mean as suggested in #903? If yes, there are plans; hopefully we can tackle it soon. But note that S-LoRA has a bunch of specialized optimizations that we cannot do in PEFT, as we want to support a very broad range of models and adapter types.
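For context, the core idea S-LoRA-style serving exploits can be sketched in a few lines: the base projection is computed once for the whole batch, while each request's low-rank delta is gathered from its own adapter. This is a minimal NumPy sketch of the concept only; all names and shapes are illustrative and none of this is PEFT's or S-LoRA's actual API.

```python
import numpy as np

# Illustrative sketch of batched multi-adapter LoRA inference:
# each request in a batch may use a different adapter, so the shared
# base projection x @ W is computed once, and the low-rank delta
# x @ A_i @ B_i is gathered per request by adapter id.
rng = np.random.default_rng(0)
d_in, d_out, rank, batch = 8, 6, 2, 4

W = rng.normal(size=(d_in, d_out))  # shared base weight
# Two hypothetical adapters, each a low-rank (A, B) pair.
adapters = [
    (rng.normal(size=(d_in, rank)), rng.normal(size=(rank, d_out)))
    for _ in range(2)
]
# Which adapter each request in the batch uses.
adapter_ids = [0, 1, 0, 1]

x = rng.normal(size=(batch, d_in))

# Base projection computed once for the whole batch.
base = x @ W
# Per-request low-rank delta, gathered by adapter id.
delta = np.stack(
    [x[i] @ adapters[a][0] @ adapters[a][1] for i, a in enumerate(adapter_ids)]
)
y = base + delta

# Sanity check: matches running each request against its merged weight.
for i, a in enumerate(adapter_ids):
    A, B = adapters[a]
    assert np.allclose(y[i], x[i] @ (W + A @ B))
```

Real implementations like S-LoRA add fused gather kernels, paged adapter memory, and scheduling on top of this, which is exactly the kind of specialization a general-purpose library has a hard time offering.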
-
Ah, my bad, I missed that one. So that will allow parallel inference; any rough idea how performant it will be, throughput- and memory-wise?
-
Would be very useful and there doesn't seem to be a flexible implementation of this yet.