Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
🐛 Describe the bug
While performing static analysis on CUDA kernels, I identified potential out-of-bounds accesses in the `gather_and_maybe_dequant_cache` kernel in `cache_kernels.cu`.
1. `batch_block_table[pid]`
Lines 968 to 969 in a00d625:

```cpp
for (int pid = split_start; pid < full_blocks_end; ++pid) {
  auto block_id = batch_block_table[pid];
```

`batch_block_table[pid]` may lead to an out-of-bounds access. `block_table` has shape `[1, u0]`, and the flat index used to read it is:

```
index = batch_offset + offset + pid
      = blockIdx.x * block_table_stride + seq_starts[bid] / block_size + pid
      = blockIdx.x * u0 + seq_starts[blockIdx.x] / 64 + pid
```
Example Scenario
- batch_block_table.shape: [1, 2]
- blockIdx.x: 0
- blockIdx.y: 0
- seq_starts[0]: 128
- pid: 0

With these values the index evaluates to 0 * 2 + 128 / 64 + 0 = 2, while the table holds only 2 entries (valid flat indices 0 and 1), so `batch_block_table[pid]` reads past the end of the table; see the sketch below.
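As an illustration only (not the kernel code), the following minimal host-side C++ sketch mirrors the pointer arithmetic from the quoted lines, where `batch_block_table` is `block_table` advanced by `batch_offset + offset`, with the scenario values plugged in; the variable names follow the kernel.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Scenario 1 values from the report: block_table shape [1, 2], block_size 64.
  const int64_t block_table_stride = 2;   // u0: block-table columns per batch row
  const int64_t table_entries = 1 * block_table_stride;
  const int32_t block_size = 64;
  const int32_t bid = 0;                  // blockIdx.x
  const int32_t seq_start_bid = 128;      // seq_starts[0]
  const int32_t pid = 0;

  // Mirrors: batch_block_table = block_table + batch_offset + offset;
  //          block_id = batch_block_table[pid];
  const int64_t batch_offset = bid * block_table_stride;
  const int64_t offset = seq_start_bid / block_size;  // 128 / 64 = 2
  const int64_t flat_index = batch_offset + offset + pid;

  printf("flat index = %lld of %lld entries -> %s\n", (long long)flat_index,
         (long long)table_entries,
         flat_index >= table_entries ? "out of bounds" : "in bounds");
  return 0;
}
```

Compiled with any recent g++ or clang++, this prints a flat index of 2 against a 2-entry table, matching the invalid access described above.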
2. `batch_block_table[full_blocks_end]`
Lines 978 to 979 in a00d625:

```cpp
if (partial_block_size) {
  auto block_id = batch_block_table[full_blocks_end];
```

Similarly, `batch_block_table[full_blocks_end]` may also lead to an out-of-bounds access.
Example Scenario
- batch_block_table.shape: [1, 2]
- blockIdx.x: 0
- blockIdx.y: 0
- seq_starts[0]: 128
- cu_seq_lens[0]: 0
- cu_seq_lens[1]: 1
- pid: 0

Here the sequence spans a single token (cu_seq_lens[1] - cu_seq_lens[0] = 1), so it does not fill a whole block: full_blocks_end is 0 and partial_block_size is non-zero, which makes the quoted branch execute. `batch_block_table[full_blocks_end]` then resolves to flat index 0 * 2 + 128 / 64 + 0 = 2, which again falls outside the 2-entry table, so this access also reads invalid memory; a sketch of this case follows.
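The values of full_blocks_end and partial_block_size are not part of the quoted excerpt, so the sketch below assumes they are derived from cu_seq_lens as seq_len / block_size and seq_len % block_size for a single split; that derivation is my reading of the surrounding kernel code and should be treated as an assumption, with the rest of the arithmetic taken from the report.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Scenario 2 values from the report: block_table shape [1, 2], block_size 64.
  const int64_t block_table_stride = 2;
  const int64_t table_entries = 1 * block_table_stride;
  const int32_t block_size = 64;
  const int32_t bid = 0;                    // blockIdx.x
  const int32_t seq_start_bid = 128;        // seq_starts[0]
  const int32_t cu_seq_lens[2] = {0, 1};    // cu_seq_lens[0], cu_seq_lens[1]

  // Assumed derivation (single split): one token does not fill a block,
  // so only the partial-block branch runs.
  const int32_t seq_len = cu_seq_lens[bid + 1] - cu_seq_lens[bid];  // 1
  const int32_t full_blocks_end = seq_len / block_size;             // 0
  const int32_t partial_block_size = seq_len % block_size;          // 1

  // Mirrors: block_id = batch_block_table[full_blocks_end];
  const int64_t batch_offset = bid * block_table_stride;
  const int64_t offset = seq_start_bid / block_size;  // 128 / 64 = 2
  const int64_t flat_index = batch_offset + offset + full_blocks_end;

  if (partial_block_size) {
    printf("flat index = %lld of %lld entries -> %s\n", (long long)flat_index,
           (long long)table_entries,
           flat_index >= table_entries ? "out of bounds" : "in bounds");
  }
  return 0;
}
```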
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.