Conversation

@kerthcet kerthcet commented Aug 20, 2024

Hi, llmaz is a platform for serving large language models on Kubernetes, and llama.cpp is a vital part of it, covering CPU inference as well as GPU. Here's an example:

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0-5b-gguf
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      filename: qwen2-0_5b-instruct-q5_k_m.gguf
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0-5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0-5b-gguf
  backendConfig:
    name: llamacpp
    args:
    - -fa # use flash attention

This is all you need to do; after that you can serve models with llama.cpp on Kubernetes.
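Assuming llmaz and its llamacpp backend are already installed in the cluster, applying the manifests above and talking to the resulting server could look roughly like this sketch (the service name, label selector, and port are assumptions for illustration, not taken from the PR; llama.cpp's server does expose an OpenAI-compatible `/v1/chat/completions` endpoint):

```shell
# Apply the OpenModel and Playground manifests (saved as model.yaml / playground.yaml)
kubectl apply -f model.yaml
kubectl apply -f playground.yaml

# Watch the Playground pods come up (label selector is an assumption)
kubectl get pods -l app.kubernetes.io/name=qwen2-0-5b -w

# Port-forward the (assumed) service and query llama.cpp's OpenAI-compatible API
kubectl port-forward svc/qwen2-0-5b 8080:8080 &
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```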

Thanks!

@kerthcet
kindly ping @ggerganov

@kerthcet
Hi @ggerganov, let me know if this is a suitable integration for llama.cpp. We actually use llama.cpp a lot in our platform, including for all kinds of tests, to save costs. I believe it's a great showcase of llama.cpp in the cloud. Thanks!

@ggerganov ggerganov merged commit 53e4db1 into ggml-org:master Feb 26, 2025
1 check passed
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
mostlyuseful pushed a commit to mostlyuseful/llama.cpp that referenced this pull request May 12, 2025