
Conversation

@ggerganov
Member

ref #9949 (comment)

Fix the batch size initialization to use the actual context size of the model. This resolves issues when params.n_ctx == 0 (i.e. when the model's training context size is used).

@ggerganov ggerganov requested a review from slaren October 20, 2024 16:41
@slaren
Member

slaren commented Oct 20, 2024

I think this should fix the immediate issue, but there may be others. The batch should never exceed n_batch, which is always equal to or smaller than n_ctx, so it should probably be initialized to n_batch instead. That, however, may indicate that the speculative example can create batches that are too big.
I also noticed that without -n it only generates one token, which probably means it does not support the default n_predict of -1.

@ggerganov
Member Author

Indeed, the example currently asserts with very large prompts that exceed the specified batch size: src/llama.cpp:17136: GGML_ASSERT(n_tokens_all <= cparams.n_batch) failed. llama-cli submits the prompt in a loop of batches, so it does not have this problem.

We can leave it as is, though, since llama-speculative focuses mainly on the speculative functionality, and I don't think it is worth adding the extra logic for chunking the prompt into batches.

@ggerganov ggerganov merged commit bc21975 into master Oct 21, 2024
54 checks passed
@ggerganov ggerganov deleted the gg/speculative-fixes branch October 21, 2024 06:37
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* speculative : fix batch sizes at initialization

ggml-ci

* speculative : handle params.n_predict == -1

* speculative : limit batch size to llama_n_batch
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024