feat: fix Qwen2.5-VL compatibility with latest transformers and enabl… #2

Open

shahedmomenzadeh wants to merge 1 commit into yuanc3:main from
Conversation
…e Colab support

Major refactor to ensure compatibility with the latest `transformers` library versions and to enable execution in Google Colab environments.

Changes:

- **qwen2_5_vl_date.py**:
  - Removed the dependency on `VideoInput` (removed in recent `transformers` versions) and replaced it with a generic type definition.
  - Refactored `date_processing_qwen2_5_vl__call__` to handle the removal of the `videos` keyword argument in the updated `ImageProcessor`.
  - Implemented logic to flatten video frames into the `images` input and manually reconstruct `video_grid_thw` and `pixel_values_videos` from the processor's output.
  - Added safety checks to the timestamp segment calculations to prevent division-by-zero errors.
- **utils/sampling.py**:
  - Fixed a `TypeError` in `tass_sampling` caused by `topk` being `None`. Added logic to dynamically calculate `topk` from the attention weights when it is not provided.
- **demo.py**:
  - Fixed a bug where the config path string, rather than the loaded dictionary, was passed to `load_and_patch_model`.
  - Forced `use_fast=False` when instantiating `AutoProcessor` to prevent compatibility issues with the fast processor implementation.
- **configs/demo.yaml**:
  - Changed `attn_implementation` from `flash_attention_2` to `sdpa` to ensure compatibility with standard Colab GPUs (e.g., T4) without requiring additional compilation.
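The config change itself is a one-line edit; the surrounding keys in `configs/demo.yaml` are omitted here:

```yaml
# configs/demo.yaml (relevant line only)
attn_implementation: sdpa  # was: flash_attention_2
```

SDPA is PyTorch's built-in scaled-dot-product attention, so it runs on T4-class GPUs where FlashAttention 2 would require an unsupported compute capability and a separate compiled package.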
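The video-flattening workaround can be sketched as follows. This is a minimal illustration, not the patch itself: `flatten_videos` and `regroup` are hypothetical helper names, and the per-video frame counts stand in for the real `video_grid_thw` / `pixel_values_videos` reconstruction.

```python
from typing import Any, Dict, List

def flatten_videos(videos: List[List[Any]]) -> Dict[str, Any]:
    """Flatten per-video frame lists into a single `images` list,
    keeping per-video frame counts so the processor's flat output
    can be regrouped afterwards (hypothetical helper)."""
    images: List[Any] = []
    frame_counts: List[int] = []
    for frames in videos:
        frame_counts.append(len(frames))
        images.extend(frames)
    return {"images": images, "frame_counts": frame_counts}

def regroup(flat_outputs: List[Any], frame_counts: List[int]) -> List[List[Any]]:
    """Split the processor's flat per-image outputs back into
    per-video groups, in the same order they were flattened."""
    groups: List[List[Any]] = []
    start = 0
    for n in frame_counts:
        groups.append(flat_outputs[start:start + n])
        start += n
    return groups
```

The key invariant is that the image processor preserves input order, so frame counts alone are enough to undo the flattening.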
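The division-by-zero guard for the timestamp segments amounts to bailing out before dividing. A minimal sketch, with a hypothetical function name and return convention:

```python
from typing import List

def segment_timestamps(total_seconds: float, num_segments: int) -> List[float]:
    """Return evenly spaced segment boundaries; guard against a
    zero or negative segment count instead of dividing by it
    (hypothetical helper illustrating the safety check)."""
    if num_segments <= 0:
        return []
    step = total_seconds / num_segments
    return [i * step for i in range(num_segments + 1)]
```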
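The `topk=None` fix in `tass_sampling` can be sketched as a fallback that derives `topk` from the attention weights. The ratio-based heuristic and the helper name below are assumptions for illustration, not the repo's actual logic:

```python
import math
from typing import List, Optional

def resolve_topk(topk: Optional[int],
                 attn_weights: List[float],
                 ratio: float = 0.25,
                 minimum: int = 1) -> int:
    """If `topk` is None, derive it from the number of attention
    weights (hypothetical heuristic: keep a fixed fraction, at
    least `minimum`), avoiding the TypeError from using None."""
    if topk is not None:
        return topk
    return max(minimum, math.ceil(len(attn_weights) * ratio))
```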
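The `demo.py` bug is the classic path-versus-object mix-up: the function expects the parsed config, not its filename. A minimal sketch of the fix, using JSON as a stand-in for the repo's YAML loader:

```python
import json

def load_config(path: str) -> dict:
    """Parse the config file into a dict before passing it on.
    JSON is used here only as a stand-in for the YAML config in
    the repo; the point is that downstream code receives a dict,
    not the path string."""
    with open(path) as f:
        return json.load(f)

# Buggy call (sketch):   load_and_patch_model("configs/demo.yaml")
# Fixed call (sketch):   load_and_patch_model(load_config("configs/demo.yaml"))
```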