
Commit ae93078

Update on "[ExecuTorch][Llama] Change runner to enable chunked prefill"
This diff adds code to chunk prompts longer than max_seq_len, enabling prefill of larger contexts.

Differential Revision: [D71833061](https://our.internmc.facebook.com/intern/diff/D71833061/)

[ghstack-poisoned]
1 parent: b5dfef9

1 file changed: +2 −1 lines
examples/models/llava/runner/llava_runner.cpp

@@ -55,7 +55,8 @@ Error LlavaRunner::load() {
   text_prefiller_ = std::make_unique<llm::TextPrefiller>(
       text_decoder_runner_.get(),
       /*use_kv_cache=*/true,
-      /*enable_parallel_prefill=*/true);
+      /*enable_parallel_prefill=*/true,
+      /*max_seq_len=*/128);
 
   // Load the image prefiller
   image_prefiller_ = std::make_unique<LlavaImagePrefiller>(module_.get());
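For context, the chunked prefill this stack enables amounts to splitting a prompt that exceeds max_seq_len into max_seq_len-sized pieces and prefilling them sequentially, carrying the KV-cache position forward between chunks. The following is a minimal C++ sketch of that loop; `prefill_chunk` and the surrounding names are hypothetical stand-ins for illustration, not the actual TextPrefiller API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one forward pass over a single chunk that
// fits within max_seq_len; returns the token from the chunk's last logits.
uint64_t prefill_chunk(const std::vector<uint64_t>& chunk, int64_t start_pos);

// Prefill a prompt of arbitrary length by feeding the model at most
// max_seq_len tokens at a time, advancing the KV-cache position after
// each chunk. Returns the token produced by the final chunk.
uint64_t chunked_prefill(
    const std::vector<uint64_t>& prompt_tokens,
    int64_t& start_pos,
    int64_t max_seq_len) {
  uint64_t cur_token = 0;
  size_t consumed = 0;
  while (consumed < prompt_tokens.size()) {
    // Take the next chunk, capped at max_seq_len tokens.
    size_t chunk_len = std::min<size_t>(
        static_cast<size_t>(max_seq_len), prompt_tokens.size() - consumed);
    std::vector<uint64_t> chunk(
        prompt_tokens.begin() + consumed,
        prompt_tokens.begin() + consumed + chunk_len);
    cur_token = prefill_chunk(chunk, start_pos);
    // Advance the KV-cache position so the next chunk attends to
    // everything prefilled so far.
    start_pos += static_cast<int64_t>(chunk_len);
    consumed += chunk_len;
  }
  return cur_token;
}
```

Only the token returned by the final chunk feeds the first decode step; the earlier chunks exist solely to populate the KV cache.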
