26 changes: 26 additions & 0 deletions configs/sparsification/methods/DyCoke/dycoke.yml
@@ -0,0 +1,26 @@
base:
    seed: &seed 42
model:
    type: Llava OneVision
    path: model path

medium

The model path is set to the placeholder value 'model path'. Is this intended to be a generic template? If so, it would help to add a comment noting that the value must be replaced with an actual path. If a default or example path could be provided, that would be even better for usability.

    torch_dtype: auto
eval:
    eval_pos: [pretrain, transformed]
    type: vqa
    name: [mme]
    download: False
    path: MME dataset path

medium

Similar to the model path, the dataset path 'MME dataset path' is a placeholder. Could you clarify whether this should be updated by the user, or perhaps provide an example or default?

    bs: 1
    inference_per_block: False
sparse:
    method: TokenReduction
    special:
        method: DyCoke
        dycoke_layer_idx: 3
        num_tokens_per_frame: 196
        merging_ratio: 0.7
        dycoke_radio: 0.7

medium

There's a parameter dycoke_radio: 0.7. Is 'radio' a typo for 'ratio'?

Additionally, the Python code (llmc/compression/token_reduction/dycoke.py) uses pruning_paras['merging_ratio'] (defined on line 21) but does not seem to use dycoke_radio.

Could you clarify:

  1. If dycoke_radio should be dycoke_ratio or perhaps merging_ratio?
  2. If it's a distinct parameter, how is it intended to be used?
        dycoke_ratio: 0.7 # Or perhaps remove if redundant with merging_ratio
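
As context for item 2: the whole special block is attached verbatim as pruning_paras (see add_sparse_config below), so an unrecognized key is silently ignored rather than rejected. A hypothetical load-time guard, not part of this PR and with the key list assumed for illustration, could surface such keys:

from loguru import logger

# Hypothetical helper, not in the PR: warn about 'special' keys that the
# DyCoke module never reads (the key list is an assumption for illustration).
KNOWN_DYCOKE_KEYS = {'method', 'dycoke_layer_idx', 'num_tokens_per_frame', 'merging_ratio'}

def warn_unknown_special_keys(special_config: dict) -> None:
    unknown = set(special_config) - KNOWN_DYCOKE_KEYS
    if unknown:
        logger.warning(f'Unused DyCoke config keys (possible typos?): {sorted(unknown)}')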

save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
1 change: 1 addition & 0 deletions llmc/compression/token_reduction/__init__.py
@@ -1,4 +1,5 @@
from .base_blockwise_token_reduction import TokenReduction
from .dycoke import DyCoke
from .fastervlm import FasterVLM
from .fastv import FastV
from .pyramiddrop import PyramidDrop
132 changes: 132 additions & 0 deletions llmc/compression/token_reduction/dycoke.py
@@ -0,0 +1,132 @@
import functools
from typing import List, Optional, Tuple, Union

import torch
import torch.nn.functional as F
from loguru import logger

try:
    from llava.model.llava_arch import LlavaMetaForCausalLM
except ModuleNotFoundError:
    logger.info('LlavaMetaForCausalLM not found; if needed, please install llava first.')
from transformers.cache_utils import Cache, DynamicCache

from llmc.utils.registry_factory import TOKEN_REDUCTION_REGISTRY

from .token_reduction_module import TokenReductionModule
from .utils import prefill_wrapper


def dycole_ttm(image_feature, pruning_paras):

medium

There seems to be a typo in the function name. Should dycole_ttm be dycoke_ttm to align with the module name 'DyCoke' and the class name? This typo appears in its definition and usages (lines 100, 126). Consistent naming would improve readability and maintainability.

Suggested change
-def dycole_ttm(image_feature, pruning_paras):
+def dycoke_ttm(image_feature, pruning_paras):

    bs, num_tokens_per_frame, _ = image_feature.shape
    image_feature = image_feature.flatten(0, 1)
    # Split frames into tokens
    num_frames = image_feature.shape[0] // num_tokens_per_frame
    merging_ratio = 1 - pruning_paras['merging_ratio']
    # Calculate similarities between adjacent even frames
    similarities = []
    for i in range(0, num_frames - 1, 2):
        # Get tokens for adjacent frames
        frame1_tokens = image_feature[
            i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame
        ]
        frame2_tokens = image_feature[
            (i + 1) * num_tokens_per_frame: (i + 2) * num_tokens_per_frame
        ]

        # Calculate cosine similarity between normalized tokens
        frame1_norm = torch.nn.functional.normalize(frame1_tokens, p=2, dim=1)
        frame2_norm = torch.nn.functional.normalize(frame2_tokens, p=2, dim=1)
        similarity = torch.nn.functional.cosine_similarity(
            frame1_norm, frame2_norm, dim=1
        )
        similarities.append(similarity)

    similarities = torch.stack(
        [torch.tensor(similarity) for similarity in similarities]
    )

    # Process even frames
    modified_image_feature = []
    for i in range(0, num_frames - 1, 2):
        frame1_tokens = image_feature[
            i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame
        ]
        frame2_tokens = image_feature[
            (i + 1) * num_tokens_per_frame: (i + 2) * num_tokens_per_frame
        ]

        avg_similarity = similarities[i // 2]
        num_tokens_to_keep = int(merging_ratio * num_tokens_per_frame)
        tokens_to_keep = avg_similarity.topk(num_tokens_to_keep, largest=False).indices

        modified_image_feature.append(frame1_tokens)
        modified_image_feature.append(frame2_tokens[tokens_to_keep])
Comment on lines +28 to +64

high

The first loop for i in range(0, num_frames - 1, 2) processes pairs of frames. If num_frames is odd and greater than 1 (e.g., 3, 5), the last frame will be omitted from modified_image_feature. For example, if num_frames = 3, the loop runs for i=0, processing frame0 and frame1. frame2 is never added to modified_image_feature, leading to data loss.

Could you review the logic for handling an odd number of frames? One common approach is to append the last frame unprocessed if num_frames is odd.

A docstring explaining the overall algorithm, inputs, and outputs of this function would also be very beneficial for future understanding and maintenance.
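
A minimal sketch of that carry-over, reusing the loop's variable names and assuming the contiguous frame layout used above (not part of the PR):

# Sketch only: after the pairwise loop, keep a trailing unpaired frame
# so it is not silently dropped when num_frames is odd.
if num_frames % 2 == 1:
    last_frame_tokens = image_feature[
        (num_frames - 1) * num_tokens_per_frame: num_frames * num_tokens_per_frame
    ]
    modified_image_feature.append(last_frame_tokens)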


    # Process odd frames

medium

The comment # Process odd frames for the loop starting on line 63 might be misleading. This loop iterates with i taking values 0, 4, 8, ... and processes frame_i and frame_{i+2}. These are originally even-indexed frames.

Perhaps a more descriptive comment like # Further prune even-indexed frames based on similarity to preceding even frames, or something similar, would better reflect the logic?
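
To make the indexing concrete, a tiny standalone check (num_frames = 12 is an assumed example value):

num_frames = 12
for i in range(0, num_frames - 4, 4):
    print(f'compares frame {i} with frame {i + 2}')
# Output:
# compares frame 0 with frame 2
# compares frame 4 with frame 6

Both compared indices are even, which is why the 'odd frames' wording is confusing.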

    odd_similarities = []
    for i in range(0, num_frames - 4, 4):
        frame1_tokens = image_feature[
            i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame
        ]
        frame2_tokens = image_feature[
            (i + 2) * num_tokens_per_frame: (i + 3) * num_tokens_per_frame
        ]

        similarity = torch.nn.functional.cosine_similarity(
            frame1_tokens, frame2_tokens, dim=1
        )
        odd_similarities.append(similarity)

    odd_similarities = torch.stack(
        [torch.tensor(similarity) for similarity in odd_similarities]
    )

    for i in range(0, num_frames - 4, 4):
        frame1_tokens = image_feature[
            i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame
        ]
        frame2_tokens = image_feature[
            (i + 2) * num_tokens_per_frame: (i + 3) * num_tokens_per_frame
        ]

        avg_similarity = odd_similarities[i // 4]
        num_tokens_to_keep = int(merging_ratio * num_tokens_per_frame)
        tokens_to_keep = avg_similarity.topk(num_tokens_to_keep, largest=False).indices

        modified_image_feature[i] = frame1_tokens
        modified_image_feature[i + 2] = frame2_tokens[tokens_to_keep]

    # Combine all tokens
    combined_tokens = torch.cat(modified_image_feature, dim=0).unsqueeze(0)
    return combined_tokens


def add_dycole_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras):

medium

Typo in function name: add_dycole_ttm_to_get_2dPool. Should this be add_dycoke_ttm_to_get_2dPool for consistency with the intended 'DyCoke' naming?

Suggested change
-def add_dycole_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras):
+def add_dycoke_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras):

    original_fn = model.get_2dPool

    def wrapped_fn(*args, **kwargs):
        result = original_fn(*args, **kwargs)
        return post_hook_fn(result, pruning_paras)

    model.get_2dPool = wrapped_fn
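
The function above is an ordinary post-hook monkey-patch: it swaps model.get_2dPool for a closure that forwards to the original and post-processes the result. A self-contained toy illustration (toy names, not PR code):

class Toy:
    def get_2dPool(self, x):
        return x * 2

def add_post_hook(obj, post_hook_fn, extra):
    original_fn = obj.get_2dPool  # bound method captured by the closure

    def wrapped_fn(*args, **kwargs):
        return post_hook_fn(original_fn(*args, **kwargs), extra)

    obj.get_2dPool = wrapped_fn  # instance attribute shadows the class method

toy = Toy()
add_post_hook(toy, lambda result, extra: result + extra, 1)
print(toy.get_2dPool(3))  # 7: 3 * 2 from the original, then + 1 from the hook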


@TOKEN_REDUCTION_REGISTRY.register('DyCoke')
class DyCoke(TokenReductionModule):
    def __init__(self, config, model, blocks):
        super().__init__(config, model, blocks)
        self.add_sparse_config()
        self.register_reduction_modules()

    def add_sparse_config(self):
        self.special_config['different_token_idxs'] = []
        self.dycoke_layer_idx = self.special_config['dycoke_layer_idx']
        self.model.model.pruning_paras = self.special_config

    def register_reduction_modules(self):

        if isinstance(self.model.model, LlavaMetaForCausalLM):
            add_dycole_ttm_to_get_2dPool(
                self.model.model, dycole_ttm, self.model.model.pruning_paras
Comment on lines +130 to +131

medium

Typo in function name usage: dycole_ttm. Should this be dycoke_ttm?

Suggested change
-            add_dycole_ttm_to_get_2dPool(
-                self.model.model, dycole_ttm, self.model.model.pruning_paras
+            add_dycoke_ttm_to_get_2dPool(
+                self.model.model, dycoke_ttm, self.model.model.pruning_paras

            )
8 changes: 8 additions & 0 deletions llmc/models/llava_onevision.py
@@ -105,6 +105,14 @@ def build_model(self):

        self.processor = None

    def find_blocks(self):
        if self.get_modality() == 'language':
            super().find_blocks()
        elif self.get_modality() == 'vision':
            self.blocks = self.vision_model.vision_tower.vision_model.encoder.layers
        else:
            raise Exception(f'Llava_OneVision does not support {self.get_modality()} modality.')


@MODEL_REGISTRY
class Llava_OneVision_Eval(LLaVA_OV):