
Conversation

@ggerganov
Member

ref #9949 (comment)

Fix the batch size initialization to use the actual context size of the model. This resolves issues when params.n_ctx == 0 (i.e. when the model's training context size is used).

@ggerganov ggerganov requested a review from slaren October 20, 2024 16:41
@slaren
Member

slaren commented Oct 20, 2024

I think this should fix the immediate issue, but there may be others. The batch should never exceed n_batch, which is always equal to or smaller than n_ctx, so it should probably be initialized to n_batch instead. That, however, may indicate that the speculative example can create batches that are too big.
I also noticed that without -n it only generates one token, which probably means it does not support the default n_predict of -1.

@ggerganov
Member Author

Indeed, the example currently asserts with very large prompts that exceed the specified batch size: src/llama.cpp:17136: GGML_ASSERT(n_tokens_all <= cparams.n_batch) failed. llama-cli submits the prompt in a loop of batches, so it does not have this problem.

We can leave it as is, though, since llama-speculative focuses mainly on the speculative functionality, and I don't think it is worth adding the extra logic for chunking the prompt into batches.

@ggerganov ggerganov merged commit bc21975 into master Oct 21, 2024
54 checks passed
@ggerganov ggerganov deleted the gg/speculative-fixes branch October 21, 2024 06:37
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* speculative : fix batch sizes at initialization

ggml-ci

* speculative : handle params.n_predict == -1

* speculative : limit batch size to llama_n_batch
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024