[Question] what machine should I use to run this with local LLMs? #548
Unanswered
DaisukeMiyazaki asked this question in Q&A
Replies: 1 comment
-
I can run Llama 3 70B smoothly on my MacBook Pro M3 Max with 96GB, though it gets a bit hot after a while. However, I predict that before the end of 2024 we will have 13B or smaller models that are close to GPT-4 grade, at least for some domains. So local, on-device LLMs are definitely going mainstream, IMO.
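For anyone curious, here is a minimal sketch of this kind of local setup, assuming llama-cpp-python and a quantized GGUF build of Llama 3 70B (the path and parameters below are illustrative, not my exact configuration):

```python
# Minimal sketch: run a local quantized Llama 3 with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a GGUF file downloaded locally;
# the model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple Silicon)
    n_ctx=8192,       # context window; lower this if memory is tight
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this note in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```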
0 replies
-
*Short answer could be: get a maxed-out M3 Max with 128GB of RAM ...
This might be a silly question, but what would be the best or most reasonable machine to run this plugin with local LLMs? Especially after seeing the local beta QA feature, I'd like to form a rough idea of how much performance to expect from each machine, since these machines aren't cheap.
I've been testing on my M1 MacBook Air with 16GB of RAM using LM Studio, but streamed responses still take about 10s or more with
n_gpu_layers 24
when answering questions. Indexing about 2000 files with local embeddings took a few minutes, which seemed reasonable to me. Given the pace of better 7B-and-larger models lately, possible quantizations, and vendors like Apple launching newer and newer machines, it's not clear which machine would remain a good fit for long-term use.
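For reference, a rough sketch of the kind of timing test I mean, assuming LM Studio's OpenAI-compatible local server on its default port (the model name and prompt are placeholders):

```python
# Rough sketch: time streamed responses against LM Studio's local server.
# Assumes the server is running at its default address (http://localhost:1234/v1)
# with a model already loaded; `pip install openai` for the client.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.time()
stream = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model is loaded
    messages=[{"role": "user", "content": "Explain what n_gpu_layers controls."}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.time()  # note when the first token arrives
    print(delta, end="", flush=True)

if first_token_at is not None:
    print(f"\n\nTime to first token: {first_token_at - start:.1f}s")
```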