[Executorch][llama] Change runner to decouple prompt length from sequence length #9350
Conversation
Change runner to decouple prompt length from sequence length. Following the previous diff, we can now utilize the entire KV cache to generate more tokens than the max prompt length allowed. Differential Revision: [D69073908](https://our.internmc.facebook.com/intern/diff/D69073908/)
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9350
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures — as of commit d8a3dce with merge base 6daff83, the following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D69073908
Stack from ghstack (oldest at bottom):

Following the previous diff, we can now utilize the entire KV cache to generate more tokens than the max prompt length allowed.

Differential Revision: D69073908
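The decoupling described above can be sketched as follows. This is a hypothetical illustration, not the actual ExecuTorch runner code: the names `generate`, `max_seq_len`, `prompt_tokens`, and `step` are assumptions. The point it demonstrates is that, after this change, the generation budget is bounded by the remaining KV-cache capacity (`max_seq_len - len(prompt_tokens)`) rather than by a separate cap tied to the prompt length.

```python
# Hypothetical sketch: decouple prompt length from sequence length.
# max_seq_len models the KV-cache capacity baked into the exported model.

def generate(prompt_tokens, max_seq_len, step):
    """Generate tokens until the KV cache (max_seq_len positions) is full.

    `step` stands in for one decode step: given the tokens so far,
    it returns the next token id.
    """
    if len(prompt_tokens) >= max_seq_len:
        raise ValueError("prompt already fills the KV cache")
    tokens = list(prompt_tokens)
    # The budget is the *entire* remaining cache, not a function of the
    # prompt length itself.
    budget = max_seq_len - len(prompt_tokens)
    for _ in range(budget):
        tokens.append(step(tokens))
    return tokens

# Example: an 8-slot cache and a 3-token prompt leave 5 slots for generation.
out = generate([1, 2, 3], max_seq_len=8, step=lambda ts: ts[-1] + 1)
```

Here a short prompt no longer limits how many tokens can be generated; any cache slots not consumed by the prompt are available for decoding.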