Conversation
Hello @blueswhen, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. Based on the title "fix: reset tensor with zeros" and the code changes, the primary goal of this PR is to ensure that the `kv_indices` tensor is initialized with zero values instead of potentially containing arbitrary data from memory (which `torch.empty` allows). This change is applied in the `flashinfer_struct.py` files for both the DeepSeek2 and Llama models, specifically targeting the initialization of the `kv_indices` tensor.
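As background for the review below, this is the distinction at issue: `torch.empty` only allocates memory, so the tensor's contents are whatever bytes happen to be there, while `torch.zeros` guarantees an all-zero tensor. A minimal, self-contained illustration in plain PyTorch (independent of the lightllm code under review):

```python
import torch

# torch.empty allocates without initializing: the contents are arbitrary
# leftover memory, so code that reads slots it never wrote sees garbage.
uninitialized = torch.empty(8, dtype=torch.int32)

# torch.zeros allocates AND fills with zeros, so unwritten slots are
# guaranteed to read as 0 -- a safe default for an index tensor.
initialized = torch.zeros(8, dtype=torch.int32)

assert initialized.sum().item() == 0  # always holds for torch.zeros
# No such guarantee exists for `uninitialized`; its values are undefined.
```

This is why the switch to `torch.zeros` is uncontroversial on its own; the sizing change bundled with it is the part the review flags below.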
Highlights
- Tensor Initialization Fix: The pull request changes the initialization method for the `kv_indices` tensor from `torch.empty` to `torch.zeros`. This ensures the tensor is filled with zeros upon creation, likely fixing a bug where uninitialized values were causing issues.
- Tensor Dimension Adjustment: Along with changing the initialization function, the size calculation for the `kv_indices` tensor is also adjusted in the affected initialization paths.
Changelog
- `lightllm/models/deepseek2/flashinfer_struct.py`
  - Line 31: Changed the initialization of `self.kv_indices` from `torch.empty` to `torch.zeros` and updated the tensor's size calculation.
- `lightllm/models/llama/flashinfer_struct.py`
  - Line 31: Changed the initialization of `self.kv_indices` from `torch.empty` to `torch.zeros` and updated the tensor's size calculation.
  - Line 71: Changed the initialization of a local `kv_indices` variable from `torch.empty` to `torch.zeros` and updated the tensor's size calculation.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request aims to initialize tensors with zeros instead of using `torch.empty`, which is a good practice to avoid using uninitialized memory. However, the changes also inadvertently reduce the size of the `kv_indices` tensor in three locations. This size reduction appears to be a bug and could lead to out-of-bounds memory access, as the new size might be insufficient for the data packed by `repack_kv_index`.
I recommend initializing with `torch.zeros` but retaining the original tensor sizing logic (i.e., `self.batch_size * self.flashinfer_extra_state.max_seq_length`).
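Concretely, the concern is arithmetic: the buffer must hold packed token indices for the whole batch, not just one sequence. A sketch of the worst case, using made-up numbers and variable names that mirror the PR (the claim that `repack_kv_index` can pack up to `batch_size * max_len_in_batch` indices is taken from this review, not verified against the source here):

```python
# Hypothetical values, for illustration only.
batch_size = 4
max_seq_length = 2048

# Allocation after this PR: room for a single sequence.
reduced_size = max_seq_length

# Original allocation: room for every sequence in the batch.
original_size = batch_size * max_seq_length

# Worst case: every request in the batch is near max_seq_length tokens,
# so repack_kv_index writes batch_size * max_seq_length indices.
worst_case_tokens = batch_size * max_seq_length

assert worst_case_tokens <= original_size  # original size always suffices
assert worst_case_tokens > reduced_size    # reduced size overflows once batch_size > 1
```

For `batch_size == 1` the two sizes coincide, which may be why the reduction went unnoticed in light testing.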
Summary of Findings
- Critical: Tensor Size Reduction: The pull request changes `torch.empty` to `torch.zeros` for tensor initialization, which is good. However, it also significantly reduces the allocated size for `kv_indices` in three places. This reduced size (from `batch_size * max_seq_length` to `max_seq_length`) is likely insufficient for storing packed token indices for a batch, potentially leading to out-of-bounds memory access and crashes. The original tensor size should be maintained.
- Correctness: Initialization with Zeros: The change from `torch.empty` to `torch.zeros` is a positive step towards ensuring tensors are initialized, preventing potential issues from uninitialized memory. This aspect of the PR is good, assuming the tensor sizing issue is resolved.
Merge Readiness
This pull request addresses a potential issue by initializing tensors with zeros. However, it appears to introduce a critical bug by incorrectly reducing the size of these tensors. The identified sizing issues must be addressed before this PR can be considered for merging. I am unable to approve pull requests, but I recommend that these critical issues are fixed and the code is re-reviewed. After the suggested changes, the core idea of using `torch.zeros` would be a good improvement.
```python
self.kv_indices = torch.zeros(self.flashinfer_extra_state.max_seq_length, dtype=torch.int32).to(
    input_ids.device
)
```
The change from `torch.empty` to `torch.zeros` is good for safety. However, the size of `self.kv_indices` has also changed from `self.batch_size * self.flashinfer_extra_state.max_seq_length` to `self.flashinfer_extra_state.max_seq_length`.
Is this size reduction intentional? The `repack_kv_index` function populates `self.kv_indices` with packed token indices from the batch. The total number of tokens can be up to `self.batch_size * self.max_len_in_batch`. If `self.max_len_in_batch` is close to `self.flashinfer_extra_state.max_seq_length`, the new size `self.flashinfer_extra_state.max_seq_length` would be too small if `self.batch_size > 1`, potentially leading to an out-of-bounds write.
The previous sizing `self.batch_size * self.flashinfer_extra_state.max_seq_length` seems more appropriate and is consistent with the buffer sizing used when `self.batch_size <= model.graph_max_batch_size` (lines 27-28).
Could we use `torch.zeros` but keep the original size calculation?
```python
self.kv_indices = torch.zeros(
    self.batch_size * self.flashinfer_extra_state.max_seq_length, dtype=torch.int32
).to(input_ids.device)
```

```python
self.kv_indices = torch.zeros(self.flashinfer_extra_state.max_seq_length, dtype=torch.int32).to(
    input_ids.device
)
```
Similar to the change in `deepseek2/flashinfer_struct.py`, this modification changes the `self.kv_indices` initialization from `torch.empty` to `torch.zeros` (which is good) but also reduces its size from `self.batch_size * self.flashinfer_extra_state.max_seq_length` to `self.flashinfer_extra_state.max_seq_length`.
This size reduction could lead to out-of-bounds writes in `repack_kv_index` if the total number of tokens in the batch exceeds `self.flashinfer_extra_state.max_seq_length`, which is likely if `self.batch_size > 1`.
The original size seems more appropriate, consistent with the `kv_indices_buffer` usage in the `if self.batch_size <= model.graph_max_batch_size:` block (lines 27-28).
Could you clarify if this size change is intended? If not, it might be better to use `torch.zeros` with the original size.
self.kv_indices = torch.zeros(\n self.batch_size * self.flashinfer_extra_state.max_seq_length, dtype=torch.int32\n ).to(input_ids.device)| kv_indices = torch.zeros(self.flashinfer_extra_state.max_seq_length, dtype=torch.int32).to( | ||
| input_ids.device | ||
| ) |
In this prefill section, the local variable `kv_indices` is initialized. The change from `torch.empty` to `torch.zeros` is positive. However, its size has also been reduced from `self.batch_size * self.flashinfer_extra_state.max_seq_length` to `self.flashinfer_extra_state.max_seq_length`.
This `kv_indices` tensor is populated by `repack_kv_index` and used by `BatchPrefillWithPagedKVCacheWrapper`. The required size depends on the total number of tokens in the prefill batch, which can be up to `self.batch_size * self.max_len_in_batch` (where `max_len_in_batch` refers to `max_q_seq_len` in prefill).
If `self.batch_size > 1`, the new size `self.flashinfer_extra_state.max_seq_length` is likely insufficient, risking an out-of-bounds write. The original sizing seems more robust.
Was this size reduction intended? Consider using `torch.zeros` with the original size calculation.
```python
kv_indices = torch.zeros(
    self.batch_size * self.flashinfer_extra_state.max_seq_length, dtype=torch.int32
).to(input_ids.device)
```
No description provided.