
Conversation

@kylo5aby
Contributor

maybe fix: #9933

Comment on lines 246 to 247
} else if (global_params.n_predict == -2) {
n_remaining = global_params.n_ctx - n_decoded;
Member


This is not precise. It should use the slot's context size rather than the global one, and n_past instead of n_decoded. Writing a server test to verify the implementation would be useful.

@aviallon
Contributor

aviallon commented Apr 6, 2025

I'm interested in this PR!



Development

Successfully merging this pull request may close these issues.

Bug: Unexpected output length (Only one token response!) when set configs "-n -2 -c 256" for llama-server

3 participants