Hi,
I would like to better understand batch-invariant chunked prefill when using the FlashInfer backend. The SGLang blog mentions that "the chunking logic aligns the truncation point with an integer multiple of split_kv_size." Could you explain why the truncation point must align with a multiple of split_kv_size, and why the choice of truncation point is important in this context? Additionally, would it be sufficient to disable KV splitting and use BatchedKVCachePaged to avoid this constraint?
More broadly, would fixing query_length to a particular value, for example 100, be sufficient for batch invariance? If not, why?
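A toy sketch of the split boundaries may make the question concrete. It assumes that, under chunked prefill, the cached prefix and the newly appended tokens are each cut into fixed-size KV splits independently, which may not match what FlashInfer or SGLang actually do. The helper names (`kv_splits`, `chunked_splits`, `align_down`) are hypothetical and are not part of either library.

```python
def kv_splits(start, end, split_kv_size):
    """Fixed-size split boundaries covering the KV range [start, end)."""
    return [(s, min(s + split_kv_size, end))
            for s in range(start, end, split_kv_size)]


def chunked_splits(total_len, chunk_end, split_kv_size):
    """Toy model (an assumption, not library behavior): the cached prefix
    [0, chunk_end) and the new tokens [chunk_end, total_len) are split
    separately, and the partial results are merged afterwards."""
    return (kv_splits(0, chunk_end, split_kv_size)
            + kv_splits(chunk_end, total_len, split_kv_size))


def align_down(n, split_kv_size):
    """Round a truncation point down to a multiple of split_kv_size."""
    return (n // split_kv_size) * split_kv_size


total_len, split_kv_size = 10, 4
unchunked = kv_splits(0, total_len, split_kv_size)

# Truncating at 6 (not a multiple of 4) shifts the split boundaries.
misaligned = chunked_splits(total_len, 6, split_kv_size)

# Truncating at align_down(6, 4) == 4 reproduces the unchunked boundaries.
aligned = chunked_splits(total_len, align_down(6, split_kv_size), split_kv_size)

print(unchunked)
print(misaligned)
print(aligned)
```

In this model, only the aligned truncation point yields the same partition of the KV range as the unchunked run. Because floating-point reductions are not associative, a different partition can change the order in which partial attention results are combined, so the outputs may differ bitwise. Whether this is the actual reason for the constraint is what I would like to confirm.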