[llm] Add generate_from_pos API to LLM runner #11570
Merged
As titled, this API allows us to support multi-turn conversation by passing in a start_pos argument to generate_from_pos.

This pull request introduces a new feature to support text generation from a specific starting position (generate_from_pos) and includes updates to ensure proper error handling and functionality when max_new_tokens is negative. The changes primarily focus on extending the TextLLMRunner class and its associated methods to accommodate this new feature while maintaining backward compatibility.

New Feature: Text Generation from a Specific Starting Position

- Added generate_from_pos Method: Introduced a new method, generate_from_pos, in TextLLMRunner to allow text generation starting from a specified position in the KV cache. This includes updates to the method signature, logic, and error handling. (extension/llm/runner/text_llm_runner.cpp; extension/llm/runner/text_llm_runner.h)
- Updated Documentation: Enhanced method documentation in TextLLMRunner to describe the new functionality, including parameters like start_pos and the expected behavior. (extension/llm/runner/text_llm_runner.h)

Error Handling Improvements

- Validation for max_new_tokens: Added checks to ensure max_new_tokens is positive. If it is not, an InvalidArgument error is returned. This prevents invalid configurations during text generation. (extension/llm/runner/text_llm_runner.cpp, R129-R156)
- Unit Test for Negative max_new_tokens: Created a new test case (GenerateFromPosErrorsWithNegativeMaxNewTokens) to verify that the generate_from_pos method correctly handles scenarios where max_new_tokens is negative. (extension/llm/runner/test/test_text_llm_runner.cpp, R325-R379)
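
The multi-turn flow and the max_new_tokens check described above can be illustrated with a small self-contained sketch. It does not use the ExecuTorch headers: ToyRunner, ToyConfig, ToyError, the end_pos out-parameter, and run_two_turns are hypothetical stand-ins invented for this example, and the real TextLLMRunner::generate_from_pos signature and context-length handling may differ. The sketch only shows the idea of feeding the position where one turn ended back in as start_pos for the next turn, and of rejecting a non-positive max_new_tokens.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the ExecuTorch types.
enum class ToyError { Ok, InvalidArgument };

struct ToyConfig {
  int32_t max_new_tokens = 16;
};

class ToyRunner {
 public:
  explicit ToyRunner(int64_t max_context_len)
      : max_context_len_(max_context_len) {}

  // Writes the prompt into a toy "KV cache" beginning at start_pos, then
  // appends up to max_new_tokens placeholder tokens. On success, *end_pos
  // holds the position at which the next turn should start.
  ToyError generate_from_pos(
      const std::vector<int32_t>& prompt_tokens,
      int64_t start_pos,
      const ToyConfig& config,
      int64_t* end_pos) {
    assert(end_pos != nullptr);
    // Mirrors the validation described in the PR: reject non-positive values.
    if (config.max_new_tokens <= 0) {
      return ToyError::InvalidArgument;
    }
    const int64_t prompt_len = static_cast<int64_t>(prompt_tokens.size());
    // Assumption of this sketch: start_pos must lie within the filled cache
    // and leave room for the prompt plus at least one new token.
    if (start_pos < 0 || start_pos > static_cast<int64_t>(cache_.size()) ||
        start_pos + prompt_len >= max_context_len_) {
      return ToyError::InvalidArgument;
    }
    // Anything cached past start_pos is overwritten by the new turn.
    cache_.resize(static_cast<std::size_t>(start_pos));
    cache_.insert(cache_.end(), prompt_tokens.begin(), prompt_tokens.end());
    const int64_t room = max_context_len_ - static_cast<int64_t>(cache_.size());
    const int64_t n = std::min<int64_t>(config.max_new_tokens, room);
    for (int64_t i = 0; i < n; ++i) {
      cache_.push_back(0);  // placeholder for a sampled token
    }
    *end_pos = static_cast<int64_t>(cache_.size());
    return ToyError::Ok;
  }

 private:
  int64_t max_context_len_;
  std::vector<int32_t> cache_;
};

// Two conversation turns: the second starts where the first one ended.
// Returns the final cache position, or -1 if either turn fails.
int64_t run_two_turns() {
  ToyRunner runner(128);
  ToyConfig config;
  config.max_new_tokens = 4;
  int64_t pos = 0;
  if (runner.generate_from_pos({1, 2, 3}, pos, config, &pos) != ToyError::Ok) {
    return -1;
  }
  if (runner.generate_from_pos({4, 5}, pos, config, &pos) != ToyError::Ok) {
    return -1;
  }
  return pos;
}
```

In this toy setup the first turn fills positions 0 through 6 (3 prompt tokens plus 4 generated), and the second turn starts at position 7, so run_two_turns() returns 13. A config with max_new_tokens set to -1 makes the toy generate_from_pos return ToyError::InvalidArgument, which corresponds to the case the new unit test is described as covering.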