What is the chance of running the Llama language model locally? #14879
Unanswered · elephantpanda asked this question in General
Replies: 2 comments, 2 replies
-
People have gotten the model to run locally on a GPU with 16GB of VRAM (CUDA) using torch. Maybe someone from ONNX Runtime should try it to make sure it works with your system too.
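For reference, a minimal sketch of what that torch-based setup typically looks like, assuming the weights have already been converted to the Hugging Face format (the model path below is a placeholder, not an official distribution):

```python
# Minimal sketch: load LLaMA-7B in float16 on a single CUDA GPU via transformers.
# Assumes the weights were already converted to Hugging Face format;
# the path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder: your converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # ~14 GB of weights, fits in 16 GB of VRAM
    device_map="auto",          # places the layers on the available GPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```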
-
See https://github.com/ggerganov/llama.cpp; maybe they already have a WASM module for it. They have also managed to run quantized LLaMA on Android in Termux.
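If you want to drive llama.cpp from Python, the llama-cpp-python bindings are one option (an assumption on my part; the reply above only mentions the C++ project itself). This sketch assumes the weights have already been converted and quantized with the scripts in the llama.cpp repo; the file path is a placeholder:

```python
# Minimal sketch: run a quantized LLaMA model through the llama.cpp Python
# bindings (pip install llama-cpp-python). The model path is a placeholder
# for a checkpoint quantized with the llama.cpp conversion scripts.
from llama_cpp import Llama

llm = Llama(model_path="path/to/llama-7b-q4_0.bin", n_ctx=512)

result = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(result["choices"][0]["text"])
```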
-
Hi, with the new Llama language model from Meta, what is the possibility that one could run it successfully locally on, say, a GeForce 3080 32GB GPU, using ONNX Runtime? (Even though Llama is a rival to the Microsoft-backed ChatGPT(!!).)
It is reported to have a 7B-parameter model, which I guess in float16 would come to about 14GB; a rough check and an ONNX Runtime sketch are included below.
Should I even be asking this question 😯? Well, even though it is from Meta, people still want to run it on Microsoft Windows...
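For what it's worth, the 14GB figure follows from 7e9 parameters × 2 bytes each in float16. If an ONNX export of the model were available, running it on the GPU with ONNX Runtime would look roughly like the sketch below; the model file name is a placeholder, and I'm not aware of an official LLaMA ONNX export, so treat this as an assumption about the workflow rather than a supported path:

```python
# Rough sketch, assuming a LLaMA ONNX export exists (placeholder file name).
# Shown only to illustrate the ONNX Runtime API, not an officially supported path.
import onnxruntime as ort

# Back-of-the-envelope VRAM estimate for the weights alone:
params = 7e9
bytes_fp16 = params * 2                            # 2 bytes per parameter in float16
print(f"~{bytes_fp16 / 1e9:.0f} GB of weights")    # ~14 GB

session = ort.InferenceSession(
    "llama-7b.onnx",                               # placeholder export
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print([i.name for i in session.get_inputs()])      # inspect the expected inputs
```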