Correct numerical regression in vision embeddings #41374

i3hz · 2025-10-06T14:21:06Z

What does this PR do?

This PR fixes a numerical regression in the vision positional embedding calculation that was introduced between transformers v4.54.1 and v4.55.1. The original change was made to improve exportability, but it introduced a slight floating-point difference in the resulting embeddings.
The fix was applied to the base modular files (modular_idefics2.py and modular_idefics3.py) and then propagated to all dependent models via make fix-copies.
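
To illustrate how a small floating-point change can alter positional embeddings, here is a minimal pure-Python sketch. It assumes, for illustration only, that position IDs are obtained by bucketing fractional patch coordinates against fixed boundaries. The function name bucket_ids, the grid size, and the 1e-6 scaling factor are hypothetical and are not taken from this PR's diff.

```python
from bisect import bisect_right


def bucket_ids(coords, boundaries):
    """Map each coordinate to the index of the bucket it falls into.

    A coordinate equal to a boundary is placed in the bucket to its right.
    """
    return [bisect_right(boundaries, c) for c in coords]


num_patches = 4  # hypothetical grid size (patches per side)
boundaries = [k / num_patches for k in range(1, num_patches)]  # [0.25, 0.5, 0.75]

# Two formulations of the fractional coordinates that differ by less than 1e-6.
exact = [i / num_patches for i in range(num_patches)]
scaled = [i / num_patches * (1 - 1e-6) for i in range(num_patches)]

ids_exact = bucket_ids(exact, boundaries)
ids_scaled = bucket_ids(scaled, boundaries)

print(ids_exact)   # [0, 1, 2, 3]
print(ids_scaled)  # [0, 0, 1, 2]
```

In this sketch a perturbation smaller than 1e-6 is enough to move a coordinate across a boundary, so the affected patches would look up a different row of a position embedding table.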

Fixes #41190

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. Regression in SmolVLM results in different vision embeddings #41190
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review

@zucchini-nlp

github-actions · 2025-10-06T14:22:18Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: idefics2, idefics3, smolvlm

zucchini-nlp · 2025-10-07T09:15:58Z

run-slow: idefics2, idefics3, smolvlm

github-actions · 2025-10-07T09:17:31Z

This comment contains run-slow, running the specified jobs:

models: ['models/idefics2', 'models/idefics3', 'models/smolvlm']
quantizations: [] ...

HuggingFaceDocBuilderDev · 2025-10-07T09:25:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

i3hz · 2025-10-07T10:07:25Z

-slow: idefics2, idefics3

I'm struggling a bit to understand where exactly the tests have failed. I can see that a test_model_parallel_beam_search test has failed.
Has anything else failed? Any pointers on how I can work on a fix would be really appreciated.
Thank you @zucchini-nlp for looking into the PR.

i3hz · 2025-10-07T10:13:49Z

On my local machine, after running python3 -m pytest -v -rsfE --make-reports=multi-gpu_run_models_gpu_models/smolvlm_test_reports tests/models/smolvlm

I got the result:
244 passed, 126 skipped, 18 warnings in 111.25s (0:01:51)

I don't have a multi-GPU system, so I'm not sure if that's why I have more skips.

zucchini-nlp · 2025-10-07T10:54:22Z

@i3hz the only failing test is tests/models/smolvlm/test_modeling_smolvlm.py::SmolVLMForConditionalGenerationIntegrationTest::test_integration_test_video, which seems to be unrelated.

Btw, when running locally you also need to set RUN_SLOW=1, which enables the slow tests as well. Can you check that the export tests pass? Then we can merge.

tests/models/smolvlm/test_modeling_smolvlm.py::SmolVLMForConditionalGenerationIntegrationTest::test_export_smolvlm_connector
tests/models/smolvlm/test_modeling_smolvlm.py::SmolVLMForConditionalGenerationIntegrationTest::test_export_smolvlm_text_decoder
tests/models/smolvlm/test_modeling_smolvlm.py::SmolVLMForConditionalGenerationIntegrationTest::test_export_smolvlm_vision_encoder
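
The effect of RUN_SLOW can be sketched as follows. This is a simplified stand-in written with the standard library, not the actual helper in transformers. The names parse_flag and slow, and the exact set of accepted values, are assumptions made for illustration.

```python
import os
import unittest


def parse_flag(name, default=False):
    """Read a boolean flag from the environment (hypothetical helper)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "y", "on"}


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is truthy at decoration time."""
    return unittest.skipUnless(parse_flag("RUN_SLOW"), "test is slow")(test_case)


class ExampleTest(unittest.TestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    @slow
    def test_expensive(self):
        # Reported as skipped unless the suite is launched with RUN_SLOW=1.
        self.assertEqual(sum(range(10)), 45)
```

Under this kind of gating, a run without RUN_SLOW=1 reports the gated tests as skipped rather than failed, which is one reason a local run can show more skips than CI.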

i3hz · 2025-10-07T11:13:56Z

@zucchini-nlp I ran the command RUN_SLOW=1 pytest tests/models/smolvlm/test_modeling_smolvlm.py.

Results:
8 failed, 176 passed, 93 skipped, 11 warnings in 182.77s (0:03:02)

The good news is that all three export tests passed:

test_export_smolvlm_connector
test_export_smolvlm_text_decoder
test_export_smolvlm_vision_encoder

As for the failures:

test_integration_test_video
The other 7 are related to FlashAttention. They all failed with the same error: RuntimeError: cu_seqlens_q must have shape (batch_size + 1). I could be wrong, but I think these are unrelated.

Let me know what you think.
Thanks
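
For context on the error message above: variable-length attention kernels typically take a list of cumulative sequence lengths with one more entry than there are sequences in the batch. Below is a minimal sketch of that layout using plain Python lists. The helper name build_cu_seqlens is hypothetical and is not code from this PR or from FlashAttention.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative sequence lengths with a leading zero.

    For a batch of N sequences the result has N + 1 entries, and sequence i
    occupies positions cu[i]:cu[i + 1] in the packed (concatenated) input.
    """
    return [0, *accumulate(seq_lens)]


seq_lens = [3, 5, 2]  # hypothetical per-sequence token counts
cu = build_cu_seqlens(seq_lens)
print(cu)  # [0, 3, 8, 10]
```

A shape check like the one in the error would fail if the number of entries in this list did not match batch_size + 1.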

zucchini-nlp

Thanks a lot for verifying! Looks good to me, let's merge

i3hz added 3 commits October 6, 2025 14:12

created modeling file

86eef88

Merge remote-tracking branch 'upstream/main'

84c0576

Merge remote-tracking branch 'upstream/main'

0dac06b

i3hz mentioned this pull request Oct 6, 2025

Regression in SmolVLM results in different vision embeddings #41190

Closed

4 tasks

zucchini-nlp approved these changes Oct 7, 2025

zucchini-nlp merged commit 4763b8c into huggingface:main Oct 7, 2025
19 of 20 checks passed
