[Question] what machine should I use to run this with local LLMs? #548
Unanswered
DaisukeMiyazaki asked this question in Q&A
Replies: 1 comment
-
I can run Llama 3 70B smoothly on my MacBook Pro M3 Max with 96GB, though it gets a bit hot after a while. However, I predict that before the end of 2024 we will have 13B or smaller models that are close to GPT-4 grade, at least for some domains. So local, on-device LLMs are definitely going mainstream, IMO.
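For anyone curious, here is a minimal sketch of this kind of local setup, assuming llama-cpp-python and a quantized GGUF build of Llama 3 70B (the path and parameters below are illustrative, not my exact configuration):

```python
# Minimal sketch: run a local quantized Llama 3 with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a GGUF file downloaded locally;
# the model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-70B-Instruct.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple Silicon)
    n_ctx=8192,       # context window; lower this if memory is tight
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this note in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```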
0 replies
-
*Short answer could be: get a maxed-out M3 Max with 128GB of RAM ...
This might be a silly question, but what would be the best or most reasonable machine to run this plugin with local LLMs? Especially after seeing the local beta QA feature, I'd like to form a rough idea of how much performance to expect from each machine, since these machines aren't cheap.
I've been testing on my M1 MacBook Air with 16GB of RAM using LM Studio, but streamed responses still take about 10s or more with
n_gpu_layers 24
when answering questions. Indexing about 2000 files with local embeddings took a few minutes, which seemed reasonable to me. Given the pace of better 7B-and-larger models lately, possible quantizations, and vendors like Apple launching newer and newer machines, it's not clear which machine would remain a good fit for long-term use.
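For reference, a rough sketch of the kind of timing test I mean, assuming LM Studio's OpenAI-compatible local server on its default port (the model name and prompt are placeholders):

```python
# Rough sketch: time streamed responses against LM Studio's local server.
# Assumes the server is running at its default address (http://localhost:1234/v1)
# with a model already loaded; `pip install openai` for the client.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.time()
stream = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model is loaded
    messages=[{"role": "user", "content": "Explain what n_gpu_layers controls."}],
    stream=True,
)

first_token_at = None
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.time()  # note when the first token arrives
    print(delta, end="", flush=True)

if first_token_at is not None:
    print(f"\n\nTime to first token: {first_token_at - start:.1f}s")
```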