Add multimodal support to optimum-executorch #116

mergennachin · 2025-07-31T21:08:47Z

This commit adds comprehensive support for image-text-to-text models to optimum-executorch, extending the existing recipe system to handle multimodal vision-language models.

Key changes:

- Added new image-text-to-text task to task registry
- Created ImageTextToTextExportableModule for multimodal model export
- Extended integrations to support both vision encoder and text decoder export
- Added comprehensive tests for multimodal functionality
- CLI now supports --task image-text-to-text for multimodal models
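
For readers unfamiliar with the registry pattern, a minimal sketch of how a task name could be mapped to a loader is shown here. This is not the code from this PR: `register_task`, `resolve_task`, `load_image_text_to_text`, and the returned dictionary are hypothetical names chosen for illustration, and the real registry in optimum-executorch may be structured differently.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a task name to a loader function.
# These names are illustrative and are not taken from the optimum-executorch source.
_TASK_REGISTRY: Dict[str, Callable[..., dict]] = {}


def register_task(name: str):
    """Return a decorator that records a loader under the given task name."""
    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        if name in _TASK_REGISTRY:
            raise ValueError(f"task {name!r} is already registered")
        _TASK_REGISTRY[name] = fn
        return fn
    return decorator


@register_task("image-text-to-text")
def load_image_text_to_text(model_id: str, **kwargs) -> dict:
    # Placeholder body: a real loader would build an exportable module that
    # wraps both the vision encoder and the text decoder.
    return {"model_id": model_id, "components": ["vision_encoder", "text_decoder"]}


def resolve_task(name: str) -> Callable[..., dict]:
    """Look up the loader for a task, listing the known tasks on failure."""
    try:
        return _TASK_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_TASK_REGISTRY)) or "<none>"
        raise KeyError(f"unknown task {name!r}; registered tasks: {known}") from None
```

With a layout like this, a CLI handler would only need to call `resolve_task(args.task)` to pick the right loader, which is why adding a task can be a small, local change.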

This enables users to export models like Gemma-3, LLaVA, and other vision-language models using the familiar optimum-executorch workflow:

optimum-cli export executorch --model google/gemma-3-4b-it --task image-text-to-text --recipe xnnpack
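
For scripted exports, the same command can be assembled from Python. The sketch below only uses the flags shown in the command above; `build_export_command` and `run_export` are hypothetical helper names, and `run_export` assumes `optimum-cli` is installed and on `PATH`.

```python
import shlex
import subprocess
from typing import List


def build_export_command(model: str, task: str, recipe: str) -> List[str]:
    """Assemble the argument list for the export command shown above."""
    return [
        "optimum-cli", "export", "executorch",
        "--model", model,
        "--task", task,
        "--recipe", recipe,
    ]


def run_export(model: str, task: str, recipe: str) -> None:
    """Run the export; raises CalledProcessError if the CLI exits non-zero."""
    cmd = build_export_command(model, task, recipe)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run_export("google/gemma-3-4b-it", "image-text-to-text", "xnnpack")` would invoke the same command as the one-liner in the description.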

guangy10 · 2025-08-01T20:09:54Z

Does this overlap with #111?

HuggingFaceDocBuilderDev · 2025-08-01T20:10:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

mergennachin force-pushed the add-multimodal-support branch from d99a0eb to 276c188 Compare July 31, 2025 21:10

mergennachin closed this Aug 1, 2025
