Multi-Adaptor Support for Edge Devices #7755
zhipenghan started this conversation in Ideas
Idea proposal:

Support multiple fine-tuned adaptors simultaneously on memory-constrained edge devices, enabling users to leverage the diverse capabilities of Small Language Models (SLMs) while optimizing resource utilization.
Problem Statement:
On edge devices, general-purpose SLMs often struggle with specific tasks, and hosting a separate SLM for each downstream task is costly. Engineers typically fine-tune models to enhance their capabilities, but this approach has limitations.
Proposal:
Allow users to host a base model together with a series of adaptors that augment the base model's capabilities. This approach would unlock the potential for customizing SLM capabilities per task while sharing a single set of base weights.
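A minimal sketch of the idea, assuming LoRA-style low-rank adaptors (the names, shapes, and rank below are illustrative assumptions, not part of this proposal): one base weight matrix is loaded once and shared, and each task contributes only two small matrices that are applied on top of the shared base at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 512, 8  # hidden size and adaptor rank (assumed values)

# Shared base weights: loaded into memory exactly once.
base_W = rng.standard_normal((d, d)).astype(np.float32)

# Each task keeps only two small matrices (2 * d * rank floats)
# instead of a full d * d copy of the model weights.
adaptors = {
    task: (rng.standard_normal((d, rank)).astype(np.float32),
           rng.standard_normal((rank, d)).astype(np.float32))
    for task in ("summarize", "translate")
}

def forward(x: np.ndarray, task: str) -> np.ndarray:
    """y = x @ (W + A @ B): base projection plus the task's low-rank delta."""
    A, B = adaptors[task]
    # Apply the delta factored, so W + A @ B is never materialized.
    return x @ base_W + (x @ A) @ B

x = rng.standard_normal((1, d)).astype(np.float32)
y = forward(x, "summarize")

full, per_adaptor = d * d, 2 * d * rank
print(f"base params: {full}, per-adaptor params: {per_adaptor} "
      f"({100 * per_adaptor / full:.1f}% of base)")
```

With these assumed sizes each extra task costs roughly 3% of the base model's parameters, which is the memory argument for hosting many adaptors over one base model instead of many full SLMs.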
In my research I found that ONNX Runtime has a similar design, but its performance is not good enough:
[Performance] A way to share weights between sessions · Issue #15301 · microsoft/onnxruntime (github.com)
[Performance] Share weights between sessions to accelerate inference · Issue #20172 · microsoft/onnxruntime (github.com)