
Conversation

@chengtao-lv (Collaborator)

No description provided.
@gemini-code-assist gemini-code-assist bot left a comment


Hello @chengtao-lv, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. This PR introduces a new token reduction method called 'DyCoke' specifically designed for vision models, integrated into the Llava OneVision architecture. The core idea appears to be reducing the number of visual tokens by merging similar ones across frames based on cosine similarity. This involves adding the DyCoke implementation, registering it within the token reduction framework, and updating the Llava OneVision model to correctly identify its vision blocks for applying this method. A new configuration file is also added to define the parameters for the DyCoke method.

Highlights

  • New Token Reduction Method: Adds the implementation for a new token reduction method named 'DyCoke', aimed at reducing visual tokens in models like Llava OneVision.
  • Vision Model Integration: Modifies the Llava OneVision model to correctly identify its vision encoder layers, enabling the application of token reduction methods to the visual modality.
  • Configuration: Includes a new YAML configuration file (dycoke.yml) to specify parameters for the DyCoke token reduction method.
  • Token Merging Logic: Implements a function (dycole_ttm) that calculates cosine similarity between tokens in adjacent frames and merges tokens based on a defined merging ratio.
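The pairwise merging idea described in the highlights can be illustrated with a small self-contained sketch. This is a simplification, not the actual `dycole_ttm` implementation: the helper name `merge_adjacent_frames` is hypothetical, and the real code iterates over all frame pairs in a flattened token sequence.

```python
import torch

def merge_adjacent_frames(frame1_tokens, frame2_tokens, merging_ratio):
    # Hypothetical helper illustrating the DyCoke merging idea:
    # keep all of frame1, and from frame2 keep only the tokens that are
    # LEAST similar to their frame1 counterparts (largest=False).
    sim = torch.nn.functional.cosine_similarity(frame1_tokens, frame2_tokens, dim=1)
    num_keep = int(merging_ratio * frame2_tokens.shape[0])
    keep = sim.topk(num_keep, largest=False).indices
    return torch.cat([frame1_tokens, frame2_tokens[keep]], dim=0)

torch.manual_seed(0)
f1 = torch.rand(10, 4)  # 10 tokens of dim 4 from frame 1
f2 = torch.rand(10, 4)  # 10 tokens of dim 4 from frame 2
merged = merge_adjacent_frames(f1, f2, 0.7)  # 10 + int(0.7 * 10) = 17 tokens
```

Tokens of frame 2 that closely match frame 1 are treated as redundant and dropped, which is how merging across adjacent video frames reduces the visual token count.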

Changelog

Click here to see the changelog
  • configs/sparsification/methods/DyCoke/dycoke.yml
    • Added a new configuration file for the DyCoke token reduction method.
    • Defines parameters such as dycoke_layer_idx, num_tokens_per_frame, merging_ratio, and dycoke_radio.
  • llmc/compression/token_reduction/__init__.py
    • Added an import statement to include the new DyCoke module, making it available in the token_reduction package.
  • llmc/compression/token_reduction/dycoke.py
    • Added the core implementation of the DyCoke token reduction method.
    • Includes the dycole_ttm function for token merging based on similarity.
    • Registers DyCoke with the TOKEN_REDUCTION_REGISTRY.
    • Adds a hook (add_dycole_ttm_to_get_2dPool) to integrate the merging logic into the model's processing pipeline, specifically for LlavaMetaForCausalLM.
  • llmc/models/llava_onevision.py
    • Added the find_blocks method to the LLaVA_OV class.
    • This method now correctly identifies the vision encoder layers (self.vision_model.vision_tower.vision_model.encoder.layers) when the modality is 'vision', in addition to handling the 'language' modality.
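The hook mechanism described in the changelog, wrapping a model method so a token-reduction function runs on its output, can be sketched generically. The names below (`add_post_hook`, `Dummy`) are illustrative, not the actual llmc implementation of `add_dycole_ttm_to_get_2dPool`.

```python
# Minimal sketch of attaching a post-processing hook to a model method,
# analogous in spirit to wrapping get_2dPool on LlavaMetaForCausalLM.
def add_post_hook(obj, method_name, post_fn, paras):
    orig = getattr(obj, method_name)  # bound method on this instance
    def wrapped(*args, **kwargs):
        # run the original method, then the token-reduction post-hook
        return post_fn(orig(*args, **kwargs), paras)
    setattr(obj, method_name, wrapped)

class Dummy:
    def get_2dPool(self, x):
        return x * 2

model = Dummy()
add_post_hook(model, "get_2dPool", lambda out, paras: out + paras["offset"], {"offset": 1})
result = model.get_2dPool(3)  # original returns 6, hook adds 1
```

Patching the bound attribute on the instance (rather than the class) keeps the hook scoped to the one model being compressed.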
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the DyCoke token reduction method. The changes include a new configuration file, the core implementation of the DyCoke algorithm, and necessary integrations into the existing model and token reduction framework.

Overall, the PR lays a good foundation for the DyCoke method. However, there are several points that need attention, including a potential bug in handling an odd number of frames, typos in configuration and code, and areas where clarity could be improved. Addressing these will enhance the robustness and maintainability of the new feature.

Summary of Findings

  • Potential Bug: Data Loss with Odd Number of Frames: In dycoke.py, the dycole_ttm function's first loop does not correctly handle cases where num_frames is odd and greater than 1, potentially leading to the last frame being lost. This is a high-severity issue.
  • Configuration Clarity and Typos: The dycoke.yml configuration file contains placeholder paths that need user attention. There's also a parameter dycoke_radio which appears to be a typo for dycoke_ratio or merging_ratio and is currently unused by the Python code.
  • Naming Inconsistencies: The core function dycole_ttm and its registration helper add_dycole_ttm_to_get_2dPool in dycoke.py seem to have a typo ('dycole' instead of 'dycoke'). Consistent naming aligning with 'DyCoke' is recommended.
  • Code Clarity: The dycole_ttm function in dycoke.py is complex. A docstring explaining its logic, inputs, and outputs would greatly improve its understandability. Additionally, a comment regarding the second processing loop could be more descriptive.
  • Unused Imports: The file llmc/compression/token_reduction/dycoke.py imports functools, List, Optional, Tuple, Union from typing, Cache, DynamicCache from transformers.cache_utils, and prefill_wrapper from .utils, but these imports are not used in the new code. (Severity: low, not commented inline due to review settings)
  • Redundant Tensor Conversion: In llmc/compression/token_reduction/dycoke.py (lines 40-42 and 76-78), torch.tensor(similarity) is applied to values that are already tensors, which is redundant; calling torch.stack(similarities) directly should suffice. (Severity: low, not commented inline due to review settings)
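To illustrate the redundant-conversion finding: `torch.stack` already accepts a list of tensors, so re-wrapping each element with `torch.tensor(...)` adds nothing (and triggers a copy/warning on newer PyTorch versions).

```python
import torch

# similarities, as produced by cosine_similarity, are already tensors
similarities = [torch.rand(5) for _ in range(3)]

# redundant: torch.stack([torch.tensor(s) for s in similarities])
stacked = torch.stack(similarities)  # stacks directly into shape (3, 5)
```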

Merge Readiness

This pull request introduces promising functionality with the DyCoke method. However, due to the identified high-severity issue regarding data loss with an odd number of frames, and several medium-severity issues related to configuration, naming, and clarity, I recommend that these changes be addressed before merging. Once these points are resolved, the PR will be in a much stronger state. As a reviewer, I am not authorized to approve pull requests, so please ensure further review and approval from authorized maintainers after addressing the feedback.

Comment on lines +23 to +64
```python
for i in range(0, num_frames - 1, 2):
    # Get tokens for adjacent frames
    frame1_tokens = image_feature[
        i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame
    ]
    frame2_tokens = image_feature[
        (i + 1) * num_tokens_per_frame: (i + 2) * num_tokens_per_frame
    ]

    # Calculate cosine similarity between normalized tokens
    frame1_norm = torch.nn.functional.normalize(frame1_tokens, p=2, dim=1)
    frame2_norm = torch.nn.functional.normalize(frame2_tokens, p=2, dim=1)
    similarity = torch.nn.functional.cosine_similarity(
        frame1_norm, frame2_norm, dim=1
    )
    similarities.append(similarity)

similarities = torch.stack(
    [torch.tensor(similarity) for similarity in similarities]
)

# Process even frames
modified_image_feature = []
for i in range(0, num_frames - 1, 2):
    frame1_tokens = image_feature[
        i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame
    ]
    frame2_tokens = image_feature[
        (i + 1) * num_tokens_per_frame: (i + 2) * num_tokens_per_frame
    ]

    avg_similarity = similarities[i // 2]
    num_tokens_to_keep = int(merging_ratio * num_tokens_per_frame)
    tokens_to_keep = avg_similarity.topk(num_tokens_to_keep, largest=False).indices

    modified_image_feature.append(frame1_tokens)
    modified_image_feature.append(frame2_tokens[tokens_to_keep])
```

Severity: high

The first loop for i in range(0, num_frames - 1, 2) processes pairs of frames. If num_frames is odd and greater than 1 (e.g., 3, 5), the last frame will be omitted from modified_image_feature. For example, if num_frames = 3, the loop runs for i=0, processing frame0 and frame1. frame2 is never added to modified_image_feature, leading to data loss.

Could you review the logic for handling an odd number of frames? One common approach is to append the last frame unprocessed if num_frames is odd.

A docstring explaining the overall algorithm, inputs, and outputs of this function would also be very beneficial for future understanding and maintenance.
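One way to implement the suggested fix is sketched below. The function name is hypothetical and the per-pair pruning step is omitted for brevity; the point is only the odd-frame handling: after pairing frames, an unpartnered final frame is appended untouched instead of being dropped.

```python
import torch

def split_frames_keep_last(image_feature, num_frames, num_tokens_per_frame):
    # Illustrative sketch: walk frames in pairs, as the original loop does,
    # but append the final frame unprocessed when num_frames is odd.
    parts = []
    for i in range(0, num_frames - 1, 2):
        parts.append(image_feature[i * num_tokens_per_frame:(i + 1) * num_tokens_per_frame])
        parts.append(image_feature[(i + 1) * num_tokens_per_frame:(i + 2) * num_tokens_per_frame])
    if num_frames % 2 == 1:  # last frame has no partner; keep it as-is
        parts.append(image_feature[(num_frames - 1) * num_tokens_per_frame:])
    return parts

feature = torch.rand(3 * 196, 64)  # 3 frames, 196 tokens each
parts = split_frames_keep_last(feature, 3, 196)  # 3 chunks, no tokens lost
```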

```yaml
seed: &seed 42
model:
    type: Llava OneVision
    path: model path
```


Severity: medium

The model path is set to a placeholder model path. Is this intended to be a generic template? If so, it might be helpful to add a comment indicating that this needs to be replaced with an actual path. If a default or example path could be provided, that would be even better for usability.

```yaml
    type: vqa
    name: [mme]
    download: False
    path: MME dataset path
```


Severity: medium

Similar to the model path, the dataset path MME dataset path is a placeholder. Could you clarify if this should be updated by the user, or perhaps provide an example or default?

```yaml
    dycoke_layer_idx: 3
    num_tokens_per_frame: 196
    merging_ratio: 0.7
    dycoke_radio: 0.7
```


Severity: medium

There's a parameter dycoke_radio: 0.7. Is 'radio' a typo for 'ratio'?

Additionally, the Python code (llmc/compression/token_reduction/dycoke.py) uses pruning_paras['merging_ratio'] (defined on line 21) but does not seem to use dycoke_radio.

Could you clarify:

  1. If dycoke_radio should be dycoke_ratio or perhaps merging_ratio?
  2. If it's a distinct parameter, how is it intended to be used?

Suggested change:

```yaml
dycoke_ratio: 0.7 # Or perhaps remove if redundant with merging_ratio
```

```yaml
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```


Severity: medium

The save_path is set to a generic placeholder /path/to/save/. It would be helpful to either note that this needs user configuration or provide a more concrete example/default.

```python
from .utils import prefill_wrapper


def dycole_ttm(image_feature, pruning_paras):
```


Severity: medium

There seems to be a typo in the function name. Should dycole_ttm be dycoke_ttm to align with the module name 'DyCoke' and the class name? This typo appears in its definition and usages (lines 100, 126). Consistent naming would improve readability and maintainability.

Suggested change:

```diff
-def dycole_ttm(image_feature, pruning_paras):
+def dycoke_ttm(image_feature, pruning_paras):
```

```python
    modified_image_feature.append(frame1_tokens)
    modified_image_feature.append(frame2_tokens[tokens_to_keep])

# Process odd frames
```


Severity: medium

The comment # Process odd frames for the loop starting on line 63 might be misleading. This loop iterates with i taking values 0, 4, 8, ... and processes frame_i and frame_{i+2}. These are originally even-indexed frames.

Perhaps a more descriptive comment like # Further prune even-indexed frames based on similarity to preceding even frames or similar would better reflect the logic?

```python
    return combined_tokens


def add_dycole_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras):
```


Severity: medium

Typo in function name: add_dycole_ttm_to_get_2dPool. Should this be add_dycoke_ttm_to_get_2dPool for consistency with the intended 'DyCoke' naming?

Suggested change:

```diff
-def add_dycole_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras):
+def add_dycoke_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras):
```

Comment on lines +125 to +131

```python
add_dycole_ttm_to_get_2dPool(
    self.model.model, dycole_ttm, self.model.model.pruning_paras
```


Severity: medium

Typo in function name usage: dycole_ttm. Should this be dycoke_ttm?

Suggested change:

```diff
-add_dycole_ttm_to_get_2dPool(
-    self.model.model, dycole_ttm, self.model.model.pruning_paras
+add_dycoke_ttm_to_get_2dPool(
+    self.model.model, dycoke_ttm, self.model.model.pruning_paras
```

@helloyongyang helloyongyang merged commit 62d47ef into main Jun 6, 2025
2 checks passed
@helloyongyang helloyongyang deleted the vlm branch June 6, 2025 07:30