Description
Hi there, I am also working on a performance comparison across different LLM inference frameworks. For batched prompts, llama.cpp performs considerably worse than I expected. I found that your method uses the -b option of ./llama-bench to set the batch size. However, it is not clear to me whether this parameter means the same thing as the batch_size of the other frameworks.
The llama.cpp docs state: "n_batch (-b) doesn't affect how much of the context you can use, it is just a limit to how many tokens you can put in a single batch." If I understand correctly, setting -b to 32 means that at most 32 tokens are submitted in a single llama_decode() call, rather than 32*1024 tokens being packed into one input batch.
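To make my reading concrete, here is a minimal C++ sketch of the chunking behavior I am describing: a prompt of n_prompt tokens is split into chunks of at most n_batch tokens, and each chunk is one decode call. decode_chunk() is a hypothetical placeholder for building a llama_batch and calling llama_decode(); the exact llama.cpp API details vary by version, so please correct me if this model is wrong.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for one llama_decode() call on a chunk of tokens.
// In real llama.cpp code this would fill a llama_batch and pass it to
// llama_decode(ctx, batch).
static void decode_chunk(const int *tokens, int n_tokens) {
    std::printf("decode called with %d tokens\n", n_tokens);
    (void)tokens;
}

int main() {
    const int n_prompt = 1024;  // total prompt length in tokens
    const int n_batch  = 32;    // value passed via -b

    std::vector<int> prompt(n_prompt, 0);  // dummy token IDs

    // The prompt is processed in ceil(n_prompt / n_batch) chunks of at most
    // n_batch tokens each. -b caps the tokens per decode call; it does not
    // multiply how many tokens are fed to the model overall.
    for (int i = 0; i < n_prompt; i += n_batch) {
        const int n = std::min(n_batch, n_prompt - i);
        decode_chunk(prompt.data() + i, n);
    }
    return 0;
}
```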
Here are some other related links from llama.cpp: batch_prompt, batch-size, and ubatch-size.