
Fix IDs shape mismatch in SFT for VLMs with text-only#5354

Merged
albertvillanova merged 7 commits into huggingface:main from albertvillanova:fix-5334
Mar 24, 2026

Conversation

@albertvillanova
Member

@albertvillanova albertvillanova commented Mar 23, 2026

Fix IDs shape mismatch in SFT for VLMs with text-only.

Fix #5334.

This PR fixes a regression when training vision-language models (VLMs) on text-only datasets, ensuring compatibility between data preprocessing and model expectations. The main focus is fixing how input IDs are handled for VLMs, plus a regression test to prevent future breakage.

Changes

Bug fix for VLM text-only input handling:

  • Fixed an inconsistency in tokenize_fn where VLM processors returned input IDs as a list of lists (e.g., [[1, 2, 3]]) instead of a flat list (e.g., [1, 2, 3]). The function now unwraps the extra list level to prevent downstream shape errors in models expecting 3-D position IDs.
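The unwrapping step can be sketched as a small helper. This is an illustrative reconstruction of the behavior described above, not the actual code in TRL's `tokenize_fn`; the helper name `unwrap_input_ids` is hypothetical.

```python
# Hypothetical sketch of the shape normalization described above.
# The real fix lives inside TRL's tokenize_fn; names here are illustrative.

def unwrap_input_ids(output: dict) -> dict:
    """Unwrap a spurious batch level: [[1, 2, 3]] -> [1, 2, 3].

    Some VLM processors return input_ids with a batch dimension even for a
    single example, unlike plain tokenizers on the LLM code path.
    """
    ids = output["input_ids"]
    if len(ids) == 1 and isinstance(ids[0], list):
        output["input_ids"] = ids[0]
    return output

print(unwrap_input_ids({"input_ids": [[1, 2, 3]]}))  # → {'input_ids': [1, 2, 3]}
print(unwrap_input_ids({"input_ids": [1, 2, 3]}))    # already flat, unchanged
```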

Testing improvements:

  • Added a regression test in test_sft_trainer.py to verify that training a VLM with a text-only dataset works correctly and does not produce shape errors.

Note

Medium Risk
Touches SFT dataset tokenization for standard (non-conversational) examples; a small shape-normalization change could affect any processor that returns nested input_ids, but it is guarded by a targeted regression test.

Overview
Fixes a regression when training vision-language models on text-only standard datasets by normalizing input_ids returned from VLM processing_class calls (unwrapping [[...]] to [...]) to match the LLM code path and avoid downstream shape/position-id errors.
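To see why the extra nesting causes downstream shape errors, consider what happens at collation time. This is a plain-Python illustration (no torch) of the spurious dimension; the `shape` helper is hypothetical and just reports nested-list shapes.

```python
# Illustrative sketch: collating flat per-example IDs yields a 2-D batch,
# while nested ones yield a 3-D batch, which models do not expect.

def shape(x):
    """Return the nested-list shape, assuming rectangular nesting."""
    return (len(x), *shape(x[0])) if isinstance(x, list) else ()

flat_example = [1, 2, 3]       # what the LLM code path returns
nested_example = [[1, 2, 3]]   # what the VLM processor returned pre-fix

batch_flat = [flat_example, flat_example]
batch_nested = [nested_example, nested_example]

print(shape(batch_flat))    # → (2, 3): (batch, seq_len), as expected
print(shape(batch_nested))  # → (2, 1, 3): extra dim triggers shape errors
```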

Adds a regression case to test_train_vlm_text_only_data to include standard_language_modeling in the parameterized dataset configs, ensuring VLM text-only training remains supported.

Written by Cursor Bugbot for commit 7b1acb1.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Member

For the record: we do not have the same issue in DPO (text-only data is properly supported), nor in RewardTrainer, which doesn't support VLMs.

@albertvillanova albertvillanova merged commit ee77df9 into huggingface:main Mar 24, 2026
11 of 12 checks passed


Development

Successfully merging this pull request may close these issues.

Qwen3.5 input embeddings have too many values to unpack with SFTTrainer

3 participants