Commit d6c9e45

Update on "[ExecuTorch][Llama] Change runner to enable chunked prefill"

This diff adds code to chunk prompts longer than max_seq_len, enabling prefill of larger contexts.

Differential Revision: [D71833061](https://our.internmc.facebook.com/intern/diff/D71833061/)

[ghstack-poisoned]
1 parent 21f8a07 commit d6c9e45

File tree

1 file changed: +3 lines, −3 lines


examples/models/llama/runner/runner.cpp

Lines changed: 3 additions & 3 deletions
```diff
@@ -252,10 +252,10 @@ Error Runner::generate(
       std::vector<uint64_t> prompt_tokens_to_process(num_tokens_to_prefill_with);
       std::copy(
           prompt_tokens.begin() + num_tokens_to_process,
-          prompt_tokens.begin() + num_tokens_to_process + num_tokens_to_prefill_with,
+          prompt_tokens.begin() + num_tokens_to_process +
+              num_tokens_to_prefill_with,
           prompt_tokens_to_process.begin());
-      auto prefill_res =
-          text_prefiller_->prefill(prompt_tokens_to_process, pos);
+      auto prefill_res = text_prefiller_->prefill(prompt_tokens_to_process, pos);
       ET_CHECK_OK_OR_RETURN_ERROR(prefill_res.error());
       cur_token = prefill_res.get();
       num_tokens_to_process += num_tokens_to_prefill_with;
```
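For readers seeing the hunk out of context, here is a minimal, self-contained sketch of the chunked-prefill loop this change lives in. The `max_seq_len` per-chunk bound and the generic `prefill` callable (standing in for `text_prefiller_->prefill`) are assumptions for illustration; the runner's real control flow and signatures may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch only: splits a prompt longer than max_seq_len into chunks and
// prefills them one at a time. `prefill` is a stand-in for the runner's
// text_prefiller_->prefill and is assumed to consume a chunk starting at
// position `pos` and return the model's next-token prediction.
template <typename PrefillFn>
uint64_t chunked_prefill(
    const std::vector<uint64_t>& prompt_tokens,
    int64_t max_seq_len,
    PrefillFn prefill) {
  int64_t num_tokens_to_process = 0;
  int64_t pos = 0; // running position in the KV cache
  uint64_t cur_token = 0;
  const auto total = static_cast<int64_t>(prompt_tokens.size());
  while (num_tokens_to_process < total) {
    // Prefill at most max_seq_len tokens per iteration.
    const int64_t num_tokens_to_prefill_with =
        std::min(total - num_tokens_to_process, max_seq_len);
    std::vector<uint64_t> prompt_tokens_to_process(num_tokens_to_prefill_with);
    std::copy(
        prompt_tokens.begin() + num_tokens_to_process,
        prompt_tokens.begin() + num_tokens_to_process +
            num_tokens_to_prefill_with,
        prompt_tokens_to_process.begin());
    cur_token = prefill(prompt_tokens_to_process, pos);
    num_tokens_to_process += num_tokens_to_prefill_with;
    pos += num_tokens_to_prefill_with;
  }
  return cur_token; // seed token for the decode loop that follows
}
```

With, say, max_seq_len = 4 and a 10-token prompt, the loop issues chunks of 4, 4, and 2 tokens; only the final call's return value is kept as the first token of the decode phase.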
