Batch-invariant chunked prefill with FlashInfer Backend

Hi,

I would like to better understand batch-invariant chunked prefill when using the FlashInfer backend. The SGLang blog mentions that "the chunking logic aligns the truncation point with an integer multiple of `split_kv_size`." Could you explain why the truncation point must align with a multiple of `split_kv_size`, and why the choice of truncation point is important in this context? Additionally, would it be sufficient to disable KV splitting and use `BatchedKVCachePaged` to avoid this constraint?
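To make the question concrete, here is a minimal sketch of what "aligning the truncation point" could look like. This is an illustration only, not SGLang's or FlashInfer's actual code: the helper names `aligned_chunk_end` and `chunk_boundaries`, the chunk budget of 300 tokens, and `split_kv_size = 128` are all hypothetical values chosen for the example.

```python
def aligned_chunk_end(start: int, seq_len: int, chunk_budget: int, split_kv_size: int) -> int:
    """Return the exclusive end position of the next prefill chunk.

    Hypothetical helper: the truncation point is rounded down to a multiple
    of split_kv_size, unless the chunk already reaches the end of the sequence.
    """
    end = min(start + chunk_budget, seq_len)
    if end < seq_len:
        aligned = (end // split_kv_size) * split_kv_size
        # Only use the aligned point if the chunk still makes progress.
        if aligned > start:
            end = aligned
    return end


def chunk_boundaries(seq_len: int, chunk_budget: int, split_kv_size: int) -> list[tuple[int, int]]:
    """Split [0, seq_len) into prefill chunks with aligned truncation points."""
    bounds = []
    start = 0
    while start < seq_len:
        end = aligned_chunk_end(start, seq_len, chunk_budget, split_kv_size)
        bounds.append((start, end))
        start = end
    return bounds


# Assumed example values: 1000-token prompt, 300-token budget, split_kv_size of 128.
print(chunk_boundaries(1000, 300, 128))
# [(0, 256), (256, 512), (512, 768), (768, 1000)]
```

In this sketch every truncation point except the final one falls on a multiple of `split_kv_size`, so each chunk boundary coincides with a boundary between fixed-size KV splits.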

More broadly, would fixing `query_length` to a particular value, for example 100, be sufficient? If not, why?
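As a small illustration of what I mean by this second question, the sketch below lists the truncation points produced by a fixed `query_length` of 100 and checks them against a KV split size. The values `split_kv_size = 128` and `seq_len = 1000` are assumptions picked for the example, not values taken from SGLang or FlashInfer.

```python
split_kv_size = 128   # assumed value, for illustration only
query_length = 100    # the fixed chunk length from the question
seq_len = 1000        # assumed prompt length

# Truncation points when every chunk holds exactly query_length tokens.
truncation_points = list(range(query_length, seq_len, query_length))

# Points that do not fall on a multiple of split_kv_size.
misaligned = [p for p in truncation_points if p % split_kv_size != 0]

print(truncation_points)  # [100, 200, 300, 400, 500, 600, 700, 800, 900]
print(misaligned)         # [100, 200, 300, 400, 500, 600, 700, 800, 900]
```

With these numbers, none of the truncation points lands on a multiple of `split_kv_size`, so a fixed `query_length` alone does not give the alignment described in the blog unless it is itself a multiple of `split_kv_size`.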

Batch-invariant chunked prefill with FlashInfer Backend #2270
