Image Text To Text Support #296

IlyasMoutawwakil · 2024-11-19T15:50:02Z

This is a bit more complicated than other tasks because the inputs are not the same across model types (and backends). For example, model types such as blip, blip2 and git take the usual input_ids and pixel_values, while model types such as idefics, idefics2 and qwen2_vl need more complex inputs because they handle multiple images in the same prompt, with interleaved text and image tokens.

idefics
idefics2
qwen2_vl
generic image-text-to-text (blip, blip2, git)
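
To make the difference between these model types concrete, here is a minimal sketch of the input names and shapes each family might expect. It is not the code from this PR: the helper name `dummy_input_shapes`, the `GENERIC_MODEL_TYPES` grouping, the default sizes, and the per-model shapes (including the qwen2_vl patch sizes) are assumptions based on how these models are commonly fed in Transformers, so check them against the actual processors before relying on them.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]

# Hypothetical grouping, taken from the model types named in the description.
GENERIC_MODEL_TYPES = {"blip", "blip2", "git"}


def dummy_input_shapes(
    model_type: str,
    batch_size: int = 2,
    sequence_length: int = 16,
    num_images: int = 1,
    num_channels: int = 3,
    height: int = 224,
    width: int = 224,
    patch_size: int = 14,  # assumed qwen2_vl spatial patch size
    temporal_patch_size: int = 2,  # assumed qwen2_vl temporal patch size
) -> Dict[str, Shape]:
    """Return assumed input names and shapes for one model type."""
    # Text inputs are shared by every model type.
    shapes: Dict[str, Shape] = {
        "input_ids": (batch_size, sequence_length),
        "attention_mask": (batch_size, sequence_length),
    }
    if model_type in GENERIC_MODEL_TYPES:
        # One image per prompt: the usual 4D pixel tensor.
        shapes["pixel_values"] = (batch_size, num_channels, height, width)
    elif model_type == "idefics":
        # Several images per prompt, plus a mask linking tokens to images.
        shapes["pixel_values"] = (batch_size, num_images, num_channels, height, width)
        shapes["image_attention_mask"] = (batch_size, sequence_length, num_images)
    elif model_type == "idefics2":
        # Several images per prompt, plus a per-pixel padding mask.
        shapes["pixel_values"] = (batch_size, num_images, num_channels, height, width)
        shapes["pixel_attention_mask"] = (batch_size, num_images, height, width)
    elif model_type == "qwen2_vl":
        # Images are flattened into patches; a (t, h, w) grid describes each image.
        grid_h, grid_w = height // patch_size, width // patch_size
        total_images = batch_size * num_images
        shapes["pixel_values"] = (
            total_images * grid_h * grid_w,
            num_channels * temporal_patch_size * patch_size * patch_size,
        )
        shapes["image_grid_thw"] = (total_images, 3)
    else:
        raise ValueError(f"unsupported model type: {model_type}")
    return shapes


if __name__ == "__main__":
    for name in ("blip", "idefics", "idefics2", "qwen2_vl"):
        print(name, dummy_input_shapes(name))
```

A generator built this way only needs one branch per input layout, so a new model type can be added without touching the shared text inputs.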

optimum_benchmark/generators/task_generator.py

optimum_benchmark/backends/transformers_utils.py

examples/pytorch_vlm.yaml

IlyasMoutawwakil · 2024-11-22T09:19:12Z

Tests run on CPU and pass for multiple architectures. If any custom input is needed, don't hesitate to open an issue/PR.
The failing tests are unrelated.

initial support

f2b288a

IlyasMoutawwakil force-pushed the image-text-to-text branch from 0e46b92 to f2b288a on November 19, 2024 15:51

IlyasMoutawwakil commented Nov 19, 2024

optimum_benchmark/generators/task_generator.py (outdated, resolved)

optimum_benchmark/backends/transformers_utils.py (outdated, resolved)

examples/pytorch_vlm.yaml (resolved)

IlyasMoutawwakil mentioned this pull request Nov 19, 2024

Vision language model support #295

Closed

IlyasMoutawwakil changed the title ~~Text Image To Image Support~~ Image Text To Text Support Nov 19, 2024

IlyasMoutawwakil added 5 commits November 20, 2024 07:38

clean up

f2a7a2c

simpler

9a854ae

support idefics and idefics2

e5bf852

remove file

44caa15

support generic image-text-to-text as well (blip, blip2, ..)

2248f8e

IlyasMoutawwakil added the pytorch ([CI] Requires and enables running all PyTorch tests), cli_cpu_pytorch and cpu ([CI] Requires and enables running all CPU tests) labels and removed the pytorch and cli_cpu_pytorch labels Nov 20, 2024

num_choices in tests

62746cc

IlyasMoutawwakil force-pushed the image-text-to-text branch from 3578dc9 to 62746cc on November 21, 2024 13:23

IlyasMoutawwakil added the cuda ([CI] Requires and enables running all CUDA tests) and misc ([CI] Requires and enables running all basic utility tests across multiple platforms) labels Nov 21, 2024

IlyasMoutawwakil merged commit 31aa662 into main Nov 22, 2024
57 of 84 checks passed

IlyasMoutawwakil deleted the image-text-to-text branch August 19, 2025 22:27

2 participants
