[llm] Add generate_from_pos API to LLM runner #11570

larryliu0820 · 2025-06-11T20:25:04Z

As titled, this API allows us to support multi-turn conversation by passing in a start_pos argument to generate_from_pos.

This pull request adds support for generating text from a specific starting position (generate_from_pos). It also adds error handling for the case where max_new_tokens is negative. The changes mainly extend the TextLLMRunner class and its associated methods while maintaining backward compatibility.

New Feature: Text Generation from a Specific Starting Position

Added generate_from_pos Method: Introduced a new method, generate_from_pos, in TextLLMRunner that starts text generation from a specified position in the KV cache. This includes updates to the method signature, logic, and error handling. (extension/llm/runner/text_llm_runner.cpp, extension/llm/runner/text_llm_runner.h)
Updated Documentation: Expanded the method documentation in TextLLMRunner to describe the new functionality, including the start_pos parameter and the expected behavior. (extension/llm/runner/text_llm_runner.h)
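To illustrate how a `start_pos` argument enables multi-turn conversation, here is a minimal C++ sketch under stated assumptions. `ToyRunner`, its `generate_from_pos` signature, and the `Error` enum are hypothetical and do not reproduce the actual `TextLLMRunner` API. The "KV cache" is just a vector of token ids, and decoding appends placeholder tokens. Only the position bookkeeping is meant to carry over: each turn starts where the previous one ended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical error type for this sketch (not ExecuTorch's Error enum).
enum class Error { Ok, InvalidArgument };

// Toy stand-in for a text LLM runner. The "KV cache" is a vector of token
// ids and "decoding" appends placeholder tokens; only the start_pos
// bookkeeping is meant to be illustrative.
class ToyRunner {
 public:
  explicit ToyRunner(int64_t max_context_len)
      : max_context_len_(max_context_len) {}

  // Writes the prompt into the cache at start_pos, appends max_new_tokens
  // placeholder tokens, and reports in *next_pos where the next turn starts.
  Error generate_from_pos(const std::vector<int64_t>& prompt_tokens,
                          int64_t start_pos, int32_t max_new_tokens,
                          int64_t* next_pos) {
    const int64_t cached = static_cast<int64_t>(cache_.size());
    // Reject non-positive budgets and positions outside the filled cache.
    if (max_new_tokens <= 0 || start_pos < 0 || start_pos > cached) {
      return Error::InvalidArgument;
    }
    const int64_t end = start_pos +
                        static_cast<int64_t>(prompt_tokens.size()) +
                        max_new_tokens;
    // Return an error (rather than aborting) if the turn would not fit.
    if (end > max_context_len_) {
      return Error::InvalidArgument;
    }
    // In this toy, positions at or after start_pos are overwritten.
    cache_.resize(static_cast<std::size_t>(start_pos));
    cache_.insert(cache_.end(), prompt_tokens.begin(), prompt_tokens.end());
    cache_.insert(cache_.end(), static_cast<std::size_t>(max_new_tokens),
                  int64_t{-1});  // -1 stands in for a sampled token
    *next_pos = static_cast<int64_t>(cache_.size());
    assert(*next_pos == end);
    return Error::Ok;
  }

 private:
  int64_t max_context_len_;
  std::vector<int64_t> cache_;
};
```

For a second turn, the caller passes the position returned by the first turn back in as `start_pos`. In this toy, the earlier turn's entries stay in place and only the new prompt and generated tokens are written after them.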

Error Handling Improvements

Validation for max_new_tokens: Added a check that max_new_tokens is positive. If it is not, an InvalidArgument error is returned, which keeps invalid configurations from reaching text generation. (extension/llm/runner/text_llm_runner.cpp, R129-R156)
Unit Test for Negative max_new_tokens: Added a new test case, GenerateFromPosErrorsWithNegativeMaxNewTokens, which verifies that generate_from_pos returns an error when max_new_tokens is negative. (extension/llm/runner/test/test_text_llm_runner.cpp, R325-R379)
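The validation and its test can be sketched without a test framework. This is a hypothetical analogue: `check_max_new_tokens`, the `Error` enum, and the test function name are illustrative assumptions. The real check sits inside `TextLLMRunner`, and the real GoogleTest case is in `test_text_llm_runner.cpp`; neither is reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical error type for this sketch (not ExecuTorch's Error enum).
enum class Error { Ok, InvalidArgument };

// Hypothetical helper: a non-positive token budget is reported to the
// caller as InvalidArgument instead of being passed on to generation.
Error check_max_new_tokens(int32_t max_new_tokens) {
  return max_new_tokens > 0 ? Error::Ok : Error::InvalidArgument;
}

// Framework-free analogue of a test like
// GenerateFromPosErrorsWithNegativeMaxNewTokens.
void test_negative_max_new_tokens_is_rejected() {
  assert(check_max_new_tokens(-1) == Error::InvalidArgument);
  // "Must be positive" also rules out zero.
  assert(check_max_new_tokens(0) == Error::InvalidArgument);
  assert(check_max_new_tokens(1) == Error::Ok);
}
```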


pytorch-bot · 2025-06-11T20:25:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11570

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 4644bb9 with merge base 72a095f:

NEW FAILURE - The following job has failed:

pull / test-phi-3-mini-runner-linux / linux-job
RuntimeError: Command docker exec -t 1e2cac2688cf8ef37fbba8d384ca7f6a668948e88ff949acd7835395dec42602 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-06-11T20:25:39Z

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2025-06-11T20:48:16Z

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


[llm] Add generate_from_pos API to LLM runner

a1cb4c2

As titled, this API allows us to support multi-turn conversation by passing in a `start_pos` argument to `generate_from_pos`.

larryliu0820 requested review from iseeyuan, jackzhxng and swolchok as code owners June 11, 2025 20:25

facebook-github-bot added the CLA Signed label Jun 11, 2025

larryliu0820 added the release notes: llm label Jun 11, 2025

Add API to irunner.h

4644bb9

mergennachin approved these changes Jun 13, 2025


larryliu0820 merged commit be8ffd1 into main Jun 17, 2025
96 of 98 checks passed

larryliu0820 deleted the gen_from_pos branch June 17, 2025 04:54

larryliu0820 added a commit that referenced this pull request Jun 17, 2025

Revert "[llm] Add generate_from_pos API to LLM runner (#11570)"

045a419

This reverts commit be8ffd1.

larryliu0820 added a commit that referenced this pull request Jun 17, 2025

Fix text_llm_runner unit test

168f126

Summary: This is a follow-up of #11570 (D76457271). We should not abort when num_prompt_tokens >= max_context_len; instead, we should return an error. Differential Revision: D76791781
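The abort-versus-error distinction in this follow-up can be sketched as a check that hands the condition back to the caller. `check_prompt_fits` and the `Error` enum are hypothetical, and using `InvalidArgument` as the error code is an assumption (the commit message only says "return error"). The `>=` comparison follows the commit message: a prompt that exactly fills the context is also rejected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical error type for this sketch (not ExecuTorch's Error enum).
enum class Error { Ok, InvalidArgument };

// Hypothetical check: a prompt that fills or exceeds the context window is
// reported as an error value, so the caller decides how to recover instead
// of the whole process aborting. InvalidArgument is an assumed error code.
Error check_prompt_fits(int64_t num_prompt_tokens, int64_t max_context_len) {
  if (num_prompt_tokens >= max_context_len) {
    return Error::InvalidArgument;
  }
  return Error::Ok;
}

void test_prompt_length_check() {
  assert(check_prompt_fits(9, 10) == Error::Ok);
  // >= means a prompt exactly as long as the context is rejected too.
  assert(check_prompt_fits(10, 10) == Error::InvalidArgument);
  assert(check_prompt_fits(11, 10) == Error::InvalidArgument);
}
```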

larryliu0820 mentioned this pull request Jun 17, 2025

Fix text_llm_runner unit test #11750

Merged


4 participants

[llm] Add generate_from_pos API to LLM runner #11570

[llm] Add generate_from_pos API to LLM runner #11570

Uh oh!

Conversation

larryliu0820 commented Jun 11, 2025

New Feature: Text Generation from a Specific Starting Position

Error Handling Improvements

Uh oh!

pytorch-bot bot commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11570

❌ 1 New Failure

Uh oh!

facebook-github-bot commented Jun 11, 2025

Uh oh!

facebook-github-bot commented Jun 11, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

pytorch-bot bot commented Jun 11, 2025 •

edited

Loading