NPU support for LLM acceleration without gpu #2257
Closed
sebastienbo
started this conversation in Ideas
Replies: 2 comments
-
I concur with your perspective; acquiring a 64GB DDR5 RAM module is far more feasible than obtaining a 64GB GPU at present. Incorporating NPU support promises significant advantages for model inference compared to relying solely on GPUs.
-
If we did add support for NPUs, the code would go in llama.cpp, which we use for LLM inference. See these upstream issues:
-
Is it possible to support NPUs? They are far more specialised for LLM inference and don't require a GPU.
It would offload the CPU and make it possible to run models on laptops with NPUs.
Here is some example code: https://intel.github.io/intel-npu-acceleration-library/llm.html
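Roughly, the flow on that page looks like the sketch below. This is an untested illustration: `NPUModelForCausalLM`, the TinyLlama model ID, and the `dtype=torch.int8` argument follow the linked docs, but exact names and arguments may differ between library versions, and it requires an Intel Core Ultra NPU plus the `intel_npu_acceleration_library` and `transformers` packages.

```python
# Hedged sketch of LLM inference on an Intel NPU via the linked library.
# API names are assumptions based on its docs, not a tested integration.
import torch
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small model, for illustration

# Load and compile the model for the NPU; int8 weights keep it within the
# NPU's memory budget and match its preference for low-precision matmuls.
model = NPUModelForCausalLM.from_pretrained(
    model_id, use_cache=True, dtype=torch.int8
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

prompt = "What is an NPU?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generation runs on the NPU, leaving the CPU (and any GPU) largely idle.
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=128, streamer=streamer)
```

Note this library targets the PyTorch/Transformers path; as mentioned above, NPU support for this project itself would instead belong in llama.cpp.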