Introduce GenerationConfig #10228

larryliu0820 · 2025-04-16T07:40:27Z

Summary:
Started to implement #9341
Started to fix #8495

This PR introduces GenerationConfig, which holds the settings that can change between invocations of generate().

For example, temperature is moved out of the runner constructor because it is not tied to the runner instance and should be adjustable on every call to generate().

Similarly, echo and warming are moved into the config.
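
To make the constructor-versus-call distinction concrete, here is a minimal sketch of the idea. It is not the code from this PR: `GenerationConfigSketch`, `ToyRunner`, the default values, and the treatment of `warming` (output discarded) are all assumptions made for illustration. Only the field names `echo`, `warming`, and `temperature` come from the summary above.

```cpp
#include <string>

// Hypothetical per-call options. Field names follow the PR summary;
// the defaults are assumptions, not the values used in the PR.
struct GenerationConfigSketch {
  bool echo = true;          // include the prompt in the returned text
  bool warming = false;      // treat this call as a warm-up run
  float temperature = 0.8f;  // 0 is used here to mean greedy decoding
};

// Hypothetical toy runner with no model behind it. It only shows where
// the per-call options are consumed.
class ToyRunner {
 public:
  // Nothing generation-specific is fixed at construction time.
  ToyRunner() = default;

  std::string generate(
      const std::string& prompt,
      const GenerationConfigSketch& config) const {
    std::string out;
    if (config.echo) {
      out += prompt;
    }
    // A real runner would sample with config.temperature; this stub
    // only records which decoding mode was requested.
    out += (config.temperature == 0.0f) ? " <greedy>" : " <sampled>";
    if (config.warming) {
      out.clear();  // assumption: a warm-up run discards its output
    }
    return out;
  }
};
```

With this shape, the same runner instance can serve a greedy call and a sampled call back to back, which is the flexibility the summary describes.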

Both seq_len and max_new_tokens can also be passed through the config. The effective value of max_new_tokens is determined from these two config values, the .pte file metadata, and the number of prompt tokens.
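
One plausible way to combine these inputs is sketched below. This is an illustration under stated assumptions, not the resolution logic from this PR: the struct name `GenerationLimitsSketch`, the free function `resolve_max_new_tokens`, the use of `-1` for "unset", the reading of `seq_len` as a cap on prompt plus generated tokens, and the `max_context_len` parameter (standing in for the limit read from the .pte metadata) are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical subset of the config that affects output length.
// -1 means "not set by the caller" (an assumed convention).
struct GenerationLimitsSketch {
  int32_t seq_len = -1;         // cap on prompt + generated tokens
  int32_t max_new_tokens = -1;  // cap on generated tokens only
};

// max_context_len stands in for the limit stored in the .pte metadata.
inline int32_t resolve_max_new_tokens(
    const GenerationLimitsSketch& config,
    int32_t max_context_len,
    int32_t num_prompt_tokens) {
  // Total-length budget: the model limit, tightened by seq_len if set.
  int32_t total = max_context_len;
  if (config.seq_len >= 0) {
    total = std::min(total, config.seq_len);
  }
  // Room left for new tokens once the prompt is accounted for.
  int32_t budget = total - num_prompt_tokens;
  if (config.max_new_tokens >= 0) {
    budget = std::min(budget, config.max_new_tokens);
  }
  // Never report a negative count when the prompt exceeds the budget.
  return std::max<int32_t>(budget, 0);
}
```

Under these assumptions, a 10-token prompt against a 128-token context yields 118 new tokens when neither field is set, 54 when only `seq_len = 64` is set, and 20 when only `max_new_tokens = 20` is set.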

Differential Revision: D73091676

pytorch-bot · 2025-04-16T07:40:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10228

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ef7d4ca with merge base f911567:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-04-16T07:40:39Z

This pull request was exported from Phabricator. Differential Revision: D73091676

iseeyuan · 2025-04-16T14:12:16Z

examples/models/llama/main.cpp


  if (warmup) {
-    runner.warmup(prompt, seq_len);
+    runner.warmup(prompt, /*max_new_tokens=*/seq_len);


Would it be added in the internal runner as well?

which internal runner?

extension/llm/runner/irunner.h

examples/mediatek/executor_runner/mtk_llama_runner.h

Summary:
Started to implement #9341
Started to fix #8495

This PR introduces `GenerationConfig`, which holds the settings that can change between invocations of `generate()`.

For example, `temperature` is moved out of the runner constructor because it is not tied to the runner instance and should be adjustable on every call to `generate()`.

Similarly, `echo` and `warming` are moved into the config.

Both `seq_len` and `max_new_tokens` can also be passed through the config. The effective value of `max_new_tokens` is determined from these two config values, the .pte file metadata, and the number of prompt tokens.

Reviewed By: iseeyuan

Differential Revision: D73091676

Differential Revision: D73091676 Pull Request resolved: pytorch#10228

larryliu0820 requested review from cccclai, iseeyuan, jackzhxng, kirklandsign, lucylq, shoumikhin and swolchok as code owners April 16, 2025 07:40

facebook-github-bot added the CLA Signed label Apr 16, 2025

facebook-github-bot added the fb-exported label Apr 16, 2025

iseeyuan reviewed Apr 16, 2025

extension/llm/runner/irunner.h (outdated, resolved)

iseeyuan approved these changes Apr 16, 2025

cccclai reviewed Apr 16, 2025

examples/mediatek/executor_runner/mtk_llama_runner.h (resolved)

facebook-github-bot force-pushed the export-D73091676 branch from 707bda8 to 74cfb7f April 16, 2025 19:27

larryliu0820 added the release notes: api label Apr 16, 2025

larryliu0820 force-pushed the export-D73091676 branch from 74cfb7f to 5ecf7b7 April 16, 2025 19:42

facebook-github-bot force-pushed the export-D73091676 branch from 5ecf7b7 to 834fac2 April 16, 2025 19:42

larryliu0820 force-pushed the export-D73091676 branch from 834fac2 to 72cbdf1 April 16, 2025 19:45

larryliu0820 force-pushed the export-D73091676 branch from 72cbdf1 to 605ff4d April 17, 2025 04:43

facebook-github-bot force-pushed the export-D73091676 branch from 2fc8d51 to 163ccea April 17, 2025 07:53

facebook-github-bot requested a review from tarun292 as a code owner April 17, 2025 07:53

facebook-github-bot force-pushed the export-D73091676 branch from 163ccea to e89ba89 April 17, 2025 18:37

larryliu0820 force-pushed the export-D73091676 branch from e89ba89 to 0357334 April 17, 2025 19:59

larryliu0820 force-pushed the export-D73091676 branch from 0357334 to febbfa6 April 17, 2025 20:02

larryliu0820 force-pushed the export-D73091676 branch from febbfa6 to 4009fda April 17, 2025 20:52

larryliu0820 force-pushed the export-D73091676 branch from 4009fda to 9fe9659 April 17, 2025 20:56

facebook-github-bot force-pushed the export-D73091676 branch from 9fe9659 to 4e038ea April 17, 2025 21:04

facebook-github-bot force-pushed the export-D73091676 branch from 4e038ea to 6004124 April 17, 2025 21:22

facebook-github-bot force-pushed the export-D73091676 branch from 6004124 to ef7d4ca April 17, 2025 22:16

facebook-github-bot merged commit 08c07fa into main Apr 18, 2025
95 of 97 checks passed

facebook-github-bot deleted the export-D73091676 branch April 18, 2025 01:55

keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025

Introduce GenerationConfig

3165aeb

Differential Revision: D73091676 Pull Request resolved: pytorch#10228
