Conversation

@VJHack
Contributor

@VJHack VJHack commented Sep 19, 2024

This enables the --no-context-shift argument to be passed to the server.
The server will generate n_predict tokens such that n_predict <= n_ctx - n_tokens_prompt.
If n_tokens_prompt > n_ctx, an error is returned and the slot is discarded.

Implements feature request #9390
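The token budget described above can be sketched as follows (a minimal illustration, not the PR's actual code; the function name `max_n_predict` is mine):

```cpp
#include <algorithm>

// Hypothetical helper (not from the PR): with context shift disabled,
// generation is capped so that n_predict <= n_ctx - n_tokens_prompt.
// A prompt that fills or exceeds the context leaves no room to generate.
int max_n_predict(int n_ctx, int n_tokens_prompt) {
    return std::max(0, n_ctx - n_tokens_prompt);
}
```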

@VJHack VJHack changed the title from "allow disable context shift for sever" to "server: disable context shift" Sep 19, 2024
@VJHack VJHack marked this pull request as draft September 19, 2024 02:06
// context shift is disabled and prompt is too large - discard it
if (!params.ctx_shift && slot.n_prompt_tokens > slot.n_ctx) {
slot.release();
send_error(slot, "Input is too large to process. Either disable context shift or increase context length. ", ERROR_TYPE_SERVER);

Shouldn't the error say enable context shift, since it's already disabled?

Contributor Author

Yes, I made the change. Thanks for the correction.

@ExtReMLapin
Contributor

n_ctx divided by slots available, right?

@VJHack VJHack marked this pull request as ready for review September 20, 2024 19:58
@VJHack VJHack requested a review from eskeletor97 September 20, 2024 20:11
@VJHack
Contributor Author

VJHack commented Sep 20, 2024

n_ctx divided by slots available, right?

slot.n_ctx for each slot is allocated as n_ctx divided by the total number of slots.
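To make the per-slot split concrete, here is a sketch with a hypothetical helper (the name `slot_n_ctx` is mine; integer division follows from "n_ctx divided by total number of slots" above, and remainder handling is an assumption of this sketch):

```cpp
// Hypothetical illustration of the per-slot allocation described above:
// each parallel slot receives n_ctx divided by the number of slots.
int slot_n_ctx(int n_ctx, int n_slots) {
    // Integer division; any remainder is simply dropped in this sketch.
    return n_ctx / n_slots;
}
```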

@ExtReMLapin
Contributor

ExtReMLapin commented Sep 21, 2024

According to your commit message it returns 200; 413 would be better.

IMHO, it's always better when a tool is clear without reading any documentation; 200 with null is just confusing.

@VJHack
Contributor Author

VJHack commented Sep 21, 2024

@ExtReMLapin Sorry if the commit message wasn't clear. In the commit you're referring to, it initially returned a 200 null response, but I fixed it so it returns 500 with the error message "Input is too large to process. Either enable context shift or increase the context length."

But now that you mention it, I think 400 is more appropriate because it matches what OpenAI uses in their API response. Just updated it to 400.

@ExtReMLapin
Contributor

Thanks for the fix, I was about to open a PR to add 413, but if that's what OpenAI is doing, fair enough!

continue;
}
// context shift is disabled and prompt is too large - discard it
if (!params.ctx_shift && (slot.n_prompt_tokens > slot.n_ctx)) {

I don't know if this leads to correct behavior. I found that if we do:

slot.n_prompt_tokens > slot.n_ctx

Then, it's possible to fall through the check down to prompt truncation, which might be confusing for the user. Maybe we should change it to:

slot.n_prompt_tokens >= slot.n_ctx

Maybe someone more knowledgeable could chime in.
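The boundary case under discussion can be sketched like this (illustrative only, not the PR's code; the helper names are mine):

```cpp
// With the strict check, a prompt that exactly fills the context
// (n_prompt == n_ctx) is NOT discarded, even though zero tokens of
// room remain for generation, so control can fall through to the
// prompt-truncation path. The inclusive check discards it instead.
bool discard_strict(int n_prompt, int n_ctx)    { return n_prompt >  n_ctx; }
bool discard_inclusive(int n_prompt, int n_ctx) { return n_prompt >= n_ctx; }
```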

@ngxson
Copy link
Collaborator

ngxson commented Sep 23, 2024

As I'm merging #9607 , I'll close this PR. Feel free to discuss on the other PR if you want to change something.

@ngxson ngxson closed this Sep 23, 2024