server: disable context shift #9544
Conversation
examples/server/server.cpp (Outdated)

    // context shift is disabled and prompt is too large - discard it
    if (!params.ctx_shift && slot.n_prompt_tokens > slot.n_ctx) {
        slot.release();
        send_error(slot, "Input is too large to process. Either disable context shift or increase context length. ", ERROR_TYPE_SERVER);
Shouldn't the error say enable context shift, since it's already disabled?
Yes, I made the change. Thanks for the correction.
n_ctx divided by slots available, right?

slot.n_ctx for each slot is allocated as n_ctx divided by the total number of slots.
According to your commit message it returns 200; 413 would be best IMHO. It's always better if a tool is clear without reading any documentation. 200 with null is just confusing.
@ExtReMLapin Sorry if the commit message wasn't clear. In the commit that you're referring to, it was initially returning a 200 null response, but I fixed it so it returns 500 with the error message "Input is too large to process. Either enable context shift or increase the context length." But now that you mention it, I think 400 is more appropriate because it matches what OpenAI uses in their API response. Just updated it to 400.
Thanks for the fix, I was about to open a PR to add 413, but if it's what OpenAI is doing, fair enough!
        continue;
    }
    // context shift is disabled and prompt is too large - discard it
    if (!params.ctx_shift && (slot.n_prompt_tokens > slot.n_ctx)) {
I don't know if this leads to correct behavior. I found that with:

    slot.n_prompt_tokens > slot.n_ctx

it's possible to fall through the check down to prompt truncation, which might be confusing for the user. Maybe we should change it to:

    slot.n_prompt_tokens >= slot.n_ctx

Maybe someone more knowledgeable could chime in.
As I'm merging #9607, I'll close this PR. Feel free to discuss on the other PR if you want to change something.
This enables the --no-context-shift argument to be passed to the server. The server will generate n_predict tokens such that n_predict <= n_ctx - n_tokens_prompt. If n_tokens_prompt > n_ctx, an error will be thrown and the slot is discarded. Implements feature request #9390.