feat: fix Qwen2.5-VL compatibility with latest transformers and enable Colab support #2

Open

shahedmomenzadeh wants to merge 1 commit into yuanc3:main from shahedmomenzadeh:main
Conversation

@shahedmomenzadeh

feat: fix Qwen2.5-VL compatibility with latest transformers and enable Colab support

Major refactor to ensure compatibility with recent `transformers` releases and to enable execution in Google Colab environments.

Changes:
- **qwen2_5_vl_date.py**:
  - Removed dependency on `VideoInput` (removed in recent transformers versions) and replaced it with a generic type definition.
  - Refactored `date_processing_qwen2_5_vl__call__` to handle the removal of the `videos` keyword argument in the updated `ImageProcessor`.
  - Implemented logic to flatten video frames into the `images` input and manually reconstruct `video_grid_thw` and `pixel_values_videos` from the processor's output.
  - Added safety checks for timestamp segment calculations to prevent division by zero errors.
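
The frame-flattening, grid-reconstruction, and division-by-zero points above can be sketched in isolation. This is a hedged illustration, not the PR's actual code: the function names, the `(t, h, w)` tuples, and the assumption that every frame of a video yields the same spatial patch grid are simplifications.

```python
def flatten_video_frames(videos):
    """Flatten a batch of videos (each a list of frames) into one flat image
    list, recording per-video frame counts so the grids can be rebuilt."""
    images, frame_counts = [], []
    for frames in videos:
        images.extend(frames)
        frame_counts.append(len(frames))
    return images, frame_counts


def rebuild_video_grid_thw(image_grid_thw, frame_counts):
    """Collapse per-frame (t, h, w) patch grids back into one entry per video,
    assuming every frame of a video shares the same spatial grid (t == 1 when
    frames are processed as individual images)."""
    video_grid_thw, i = [], 0
    for n in frame_counts:
        t, h, w = image_grid_thw[i]  # grid of this video's first frame
        video_grid_thw.append((n * t, h, w))
        i += n
    return video_grid_thw


def safe_segment_length(duration, num_segments):
    """Guard in the spirit of the PR's timestamp safety check: clamp the
    divisor so an empty segment count cannot divide by zero."""
    return duration / max(num_segments, 1)
```

The bookkeeping returned by `flatten_video_frames` is what lets the per-image processor output be regrouped into video-shaped tensors afterwards.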

- **utils/sampling.py**:
  - Fixed `TypeError` in `tass_sampling` caused by `topk` being `None`. Added logic to dynamically calculate `topk` based on attention weights if not provided.
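
One way such a `topk is None` guard could look is sketched below; the cumulative-mass heuristic is an assumption for illustration, not necessarily the rule `tass_sampling` actually applies.

```python
def resolve_topk(attn_weights, topk=None, mass=0.9):
    """Return topk unchanged when given; otherwise derive it from attention
    weights (descending-sorted, summing to ~1) as the smallest k whose
    cumulative mass reaches `mass`. Avoids the TypeError from topk=None."""
    if topk is not None:
        return topk
    total, k = 0.0, 0
    for w in attn_weights:
        total += w
        k += 1
        if total >= mass:
            break
    return max(k, 1)  # never return 0, even for empty input
```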

- **demo.py**:
  - Fixed a bug where the config path string was passed to `load_and_patch_model` instead of the loaded dictionary.
  - Forced `use_fast=False` in `AutoProcessor` instantiation to prevent compatibility issues with the fast processor implementation.
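
The `demo.py` fix boils down to parsing the YAML before handing it on. A minimal sketch, assuming PyYAML; the commented calls and the `model_name` key are placeholders for the repo's actual API.

```python
import yaml  # PyYAML

def load_demo_config(path):
    """Parse the YAML config into a dict; the bug was handing the *path
    string* to load_and_patch_model instead of this parsed dict."""
    with open(path) as f:
        return yaml.safe_load(f)

# Usage sketch (names are placeholders; running it requires the repo):
# config = load_demo_config("configs/demo.yaml")
# model = load_and_patch_model(config)  # the dict, not the path string
# processor = AutoProcessor.from_pretrained(
#     config["model_name"], use_fast=False)  # sidestep the fast processor
```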

- **configs/demo.yaml**:
  - Changed `attn_implementation` from `flash_attention_2` to `sdpa` to ensure compatibility with standard Colab GPUs (e.g., T4) without requiring additional compilation.
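
The `configs/demo.yaml` change is a single key; the fragment below shows only that key, with the rest of the file omitted.

```yaml
# configs/demo.yaml (fragment)
# flash_attention_2 requires the flash-attn CUDA extension, which pre-Ampere
# GPUs such as Colab's T4 cannot use; sdpa falls back to PyTorch's built-in
# scaled dot-product attention with no extra compilation.
attn_implementation: sdpa  # was: flash_attention_2
```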