
Batch-invariant chunked prefill with FlashInfer Backend #2270

@rajagond

Description


Hi,

I would like to better understand batch-invariant chunked prefill when using the FlashInfer backend. The SGLang blog mentions that "the chunking logic aligns the truncation point with an integer multiple of split_kv_size." Could you explain why the truncation point must align with a multiple of split_kv_size, and why the choice of truncation point is important in this context? Additionally, would it be sufficient to disable KV splitting and use BatchedKVCachePaged to avoid this constraint?
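To make my mental model concrete, here is a rough sketch of how I imagine the alignment constraint working (the function name, parameters, and logic below are my own guesses for illustration, not SGLang's or FlashInfer's actual code):

```python
# Illustrative sketch only: my understanding of why a prefill chunk's
# truncation point would be rounded to a multiple of split_kv_size.
#
# With a fixed split_kv_size, the KV cache is always partitioned into the
# same token ranges [0, S), [S, 2S), ... regardless of batch composition.
# If a chunk boundary fell in the middle of a split, the partial-attention
# reduction for that straddled split could be computed in a different order
# than in an unchunked run, breaking bitwise invariance.

def truncate_chunk(chunk_budget: int, split_kv_size: int) -> int:
    """Round the chunk length down to a multiple of split_kv_size
    (hypothetical helper; the real chunking logic may differ)."""
    aligned = (chunk_budget // split_kv_size) * split_kv_size
    # Fall back to the raw budget if it is smaller than a single split.
    return aligned if aligned > 0 else chunk_budget

# Example: with split_kv_size = 256 and a chunk budget of 1000 tokens,
# the chunk would be truncated at 768 so every split boundary is preserved.
print(truncate_chunk(1000, 256))  # -> 768
```

Is this roughly the right intuition, or is there more to it?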

More broadly, would fixing query_length to a particular value (for example, 100) be sufficient to guarantee batch invariance? If not, why not?
