Description
I appreciate the functionality of variable block sparse attention, but I have run into an issue with its usage.
My setting: `seq_len = 49152`, `num_kv_head = 48`, `num_blocks_per_row = num_blocks_per_col = 768`.
Because `num_kv_head * num_blocks_per_row = 48 * 768 = 36864 > 32768`, the following error is raised:

```
self._kv_lens_buffer[: len(kv_lens_arr_host)].copy_(
RuntimeError: The size of tensor a (32768) must match the size of tensor b (36864) at non-singleton dimension 0
```
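For context, here is a minimal sketch that reproduces the mismatch on CPU; the shape of `kv_lens_arr_host` is an assumption inferred from the error message (one entry per KV head and block row), not the exact flashinfer internals:

```python
import torch

# Assumption: plan() builds one kv length per (kv_head, block_row) pair,
# so the host-side array has num_kv_head * num_blocks_per_row entries.
num_kv_head, num_blocks_per_row = 48, 768
kv_lens_arr_host = torch.ones(num_kv_head * num_blocks_per_row, dtype=torch.int32)  # 36864

# _kv_lens_buffer is pre-allocated with a fixed 32768 entries.
kv_lens_buffer = torch.empty((32768,), dtype=torch.int32)

# The slice clamps to the buffer's 32768 elements, so copying 36864
# elements into it raises the RuntimeError quoted above.
kv_lens_buffer[: len(kv_lens_arr_host)].copy_(kv_lens_arr_host)
```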
The error originates from the fact that `_kv_lens_buffer` is currently allocated statically with a fixed size of 32768:
flashinfer/flashinfer/sparse.py, lines 754 to 756 at 2a61472:

```python
self._kv_lens_buffer = torch.empty(
    (32768,), dtype=torch.int32, device=self.device
)
```
It seems that the current static allocation of `_kv_lens_buffer` does not scale with growing KV head counts and context lengths. Could we allocate `_kv_lens_buffer` dynamically in the `plan` method, or simply increase its pre-allocated size?
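For illustration, a grow-on-demand variant of the first option could look like the sketch below. The helper name `_ensure_kv_lens_capacity` is hypothetical; only `_kv_lens_buffer`, `self.device`, and the `plan` method come from the existing code:

```python
def _ensure_kv_lens_capacity(self, required: int) -> None:
    # Hypothetical helper: called from plan() before the copy_ above.
    # Reallocates _kv_lens_buffer only when the number of requested
    # (kv_head, block_row) entries exceeds the current capacity, so the
    # common path stays allocation-free while the 32768 cap is removed.
    if self._kv_lens_buffer.numel() < required:
        self._kv_lens_buffer = torch.empty(
            (required,), dtype=torch.int32, device=self.device
        )
```

`plan` would then call `self._ensure_kv_lens_capacity(len(kv_lens_arr_host))` before the `copy_`.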