Conversation
Hello @blueswhen, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. Based on the title "fix: reset tensor with zeros" and the code changes, the primary goal of this PR is to ensure that the `kv_indices` tensor is initialized with zero values instead of potentially containing arbitrary data from memory (which `torch.empty` allows). This change is applied in the `flashinfer_struct.py` files for both the DeepSeek2 and Llama models, specifically targeting the initialization of the `kv_indices` tensor.
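As background for the review below, this is the distinction at issue: `torch.empty` only allocates memory, so the tensor's contents are whatever bytes happen to be there, while `torch.zeros` guarantees an all-zero tensor. A minimal, self-contained illustration in plain PyTorch (independent of the lightllm code under review):

```python
import torch

# torch.empty allocates without initializing: the contents are arbitrary
# leftover memory, so code that reads slots it never wrote sees garbage.
uninitialized = torch.empty(8, dtype=torch.int32)

# torch.zeros allocates AND fills with zeros, so unwritten slots are
# guaranteed to read as 0 -- a safe default for an index tensor.
initialized = torch.zeros(8, dtype=torch.int32)

assert initialized.sum().item() == 0  # always holds for torch.zeros
# No such guarantee exists for `uninitialized`; its values are undefined.
```

This is why the switch to `torch.zeros` is uncontroversial on its own; the sizing change bundled with it is the part the review flags below.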
Highlights
- Tensor Initialization Fix: The pull request changes the initialization method for the `kv_indices` tensor from `torch.empty` to `torch.zeros`. This ensures the tensor is filled with zeros upon creation, likely fixing a bug where uninitialized values were causing issues.
- Tensor Dimension Adjustment: Along with changing the initialization function, the size calculation for the `kv_indices` tensor is also adjusted in the affected initialization paths.
Changelog
- `lightllm/models/deepseek2/flashinfer_struct.py`
  - Line 31: Changed the initialization of `self.kv_indices` from `torch.empty` to `torch.zeros` and updated the tensor's size calculation.
- `lightllm/models/llama/flashinfer_struct.py`
  - Line 31: Changed the initialization of `self.kv_indices` from `torch.empty` to `torch.zeros` and updated the tensor's size calculation.
  - Line 71: Changed the initialization of a local `kv_indices` variable from `torch.empty` to `torch.zeros` and updated the tensor's size calculation.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request aims to initialize tensors with zeros instead of using `torch.empty`, which is a good practice to avoid using uninitialized memory. However, the changes also inadvertently reduce the size of the `kv_indices` tensor in three locations. This size reduction appears to be a bug and could lead to out-of-bounds memory access, as the new size might be insufficient for the data packed by `repack_kv_index`.
I recommend initializing with `torch.zeros` but retaining the original tensor sizing logic (i.e., `self.batch_size * self.flashinfer_extra_state.max_seq_length`).
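Concretely, the concern is arithmetic: the buffer must hold packed token indices for the whole batch, not just one sequence. A sketch of the worst case, using made-up numbers and variable names that mirror the PR (the claim that `repack_kv_index` can pack up to `batch_size * max_len_in_batch` indices is taken from this review, not verified against the source here):

```python
# Hypothetical values, for illustration only.
batch_size = 4
max_seq_length = 2048

# Allocation after this PR: room for a single sequence.
reduced_size = max_seq_length

# Original allocation: room for every sequence in the batch.
original_size = batch_size * max_seq_length

# Worst case: every request in the batch is near max_seq_length tokens,
# so repack_kv_index writes batch_size * max_seq_length indices.
worst_case_tokens = batch_size * max_seq_length

assert worst_case_tokens <= original_size  # original size always suffices
assert worst_case_tokens > reduced_size    # reduced size overflows once batch_size > 1
```

For `batch_size == 1` the two sizes coincide, which may be why the reduction went unnoticed in light testing.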
Summary of Findings
- Critical: Tensor Size Reduction: The pull request changes `torch.empty` to `torch.zeros` for tensor initialization, which is good. However, it also significantly reduces the allocated size for `kv_indices` in three places. This reduced size (from `batch_size * max_seq_length` to `max_seq_length`) is likely insufficient for storing packed token indices for a batch, potentially leading to out-of-bounds memory access and crashes. The original tensor size should be maintained.
- Correctness: Initialization with Zeros: The change from `torch.empty` to `torch.zeros` is a positive step towards ensuring tensors are initialized, preventing potential issues from uninitialized memory. This aspect of the PR is good, assuming the tensor sizing issue is resolved.
Merge Readiness
This pull request addresses a potential issue by initializing tensors with zeros. However, it appears to introduce a critical bug by incorrectly reducing the size of these tensors. The identified sizing issues must be addressed before this PR can be considered for merging. I am unable to approve pull requests, but I recommend that these critical issues are fixed and the code is re-reviewed. After the suggested changes, the core idea of using `torch.zeros` would be a good improvement.
```python
self.kv_indices = torch.zeros(self.flashinfer_extra_state.max_seq_length, dtype=torch.int32).to(
    input_ids.device
)
```
The change from `torch.empty` to `torch.zeros` is good for safety. However, the size of `self.kv_indices` has also changed from `self.batch_size * self.flashinfer_extra_state.max_seq_length` to `self.flashinfer_extra_state.max_seq_length`.
Is this size reduction intentional? The `repack_kv_index` function populates `self.kv_indices` with packed token indices from the batch. The total number of tokens can be up to `self.batch_size * self.max_len_in_batch`. If `self.max_len_in_batch` is close to `self.flashinfer_extra_state.max_seq_length`, the new size `self.flashinfer_extra_state.max_seq_length` would be too small if `self.batch_size > 1`, potentially leading to an out-of-bounds write.
The previous sizing `self.batch_size * self.flashinfer_extra_state.max_seq_length` seems more appropriate and is consistent with the buffer sizing used when `self.batch_size <= model.graph_max_batch_size` (lines 27-28).
Could we use `torch.zeros` but keep the original size calculation?
```python
self.kv_indices = torch.zeros(
    self.batch_size * self.flashinfer_extra_state.max_seq_length, dtype=torch.int32
).to(input_ids.device)
```

```python
self.kv_indices = torch.zeros(self.flashinfer_extra_state.max_seq_length, dtype=torch.int32).to(
    input_ids.device
)
```
Similar to the change in `deepseek2/flashinfer_struct.py`, this modification changes the `self.kv_indices` initialization from `torch.empty` to `torch.zeros` (which is good) but also reduces its size from `self.batch_size * self.flashinfer_extra_state.max_seq_length` to `self.flashinfer_extra_state.max_seq_length`.
This size reduction could lead to out-of-bounds writes in `repack_kv_index` if the total number of tokens in the batch exceeds `self.flashinfer_extra_state.max_seq_length`, which is likely if `self.batch_size > 1`.
The original size seems more appropriate, consistent with the `kv_indices_buffer` usage in the `if self.batch_size <= model.graph_max_batch_size:` block (lines 27-28).
Could you clarify if this size change is intended? If not, it might be better to use `torch.zeros` with the original size.
self.kv_indices = torch.zeros(\n self.batch_size * self.flashinfer_extra_state.max_seq_length, dtype=torch.int32\n ).to(input_ids.device)| kv_indices = torch.zeros(self.flashinfer_extra_state.max_seq_length, dtype=torch.int32).to( | ||
| input_ids.device | ||
| ) |
In this prefill section, the local variable `kv_indices` is initialized. The change from `torch.empty` to `torch.zeros` is positive. However, its size has also been reduced from `self.batch_size * self.flashinfer_extra_state.max_seq_length` to `self.flashinfer_extra_state.max_seq_length`.
This `kv_indices` tensor is populated by `repack_kv_index` and used by `BatchPrefillWithPagedKVCacheWrapper`. The required size depends on the total number of tokens in the prefill batch, which can be up to `self.batch_size * self.max_len_in_batch` (where `max_len_in_batch` refers to `max_q_seq_len` in prefill).
If `self.batch_size > 1`, the new size `self.flashinfer_extra_state.max_seq_length` is likely insufficient, risking an out-of-bounds write. The original sizing seems more robust.
Was this size reduction intended? Consider using `torch.zeros` with the original size calculation.
```python
kv_indices = torch.zeros(
    self.batch_size * self.flashinfer_extra_state.max_seq_length, dtype=torch.int32
).to(input_ids.device)
```
No description provided.