@mergennachin
Collaborator

This commit adds comprehensive support for image-text-to-text models to optimum-executorch, extending the existing recipe system to handle multimodal vision-language models.

Key changes:

  • Added new image-text-to-text task to task registry
  • Created ImageTextToTextExportableModule for multimodal model export
  • Extended integrations to support both vision encoder and text decoder export
  • Added comprehensive tests for multimodal functionality
  • CLI now supports --task image-text-to-text for multimodal models

This enables users to export models like Gemma-3, LLaVA, and other vision-language models using the familiar optimum-executorch workflow:

optimum-cli export executorch --model google/gemma-3-4b-it --task image-text-to-text --recipe xnnpack
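For scripted exports, the command above can be wrapped in a small Python helper. This is only a sketch: `build_export_command` and `run_export` are hypothetical names that are not part of optimum-executorch, and the helper passes only the flags quoted in this description (`--model`, `--task`, `--recipe`). It assumes `optimum-cli` is installed and on `PATH`.

```python
import shlex
import subprocess


def build_export_command(model_id, recipe="xnnpack"):
    """Build the argv list for the export command quoted in this PR description.

    Hypothetical helper: only the flags shown in the description are used.
    """
    return [
        "optimum-cli", "export", "executorch",
        "--model", model_id,
        "--task", "image-text-to-text",
        "--recipe", recipe,
    ]


def run_export(model_id, recipe="xnnpack"):
    """Run the export; raises CalledProcessError if the CLI exits non-zero."""
    cmd = build_export_command(model_id, recipe)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command rather than running it, since the export downloads the model.
    print(shlex.join(build_export_command("google/gemma-3-4b-it")))
```

Passing the arguments as a list avoids shell quoting problems if a model ID or recipe name ever contains unusual characters.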

@mergennachin mergennachin force-pushed the add-multimodal-support branch from d99a0eb to 276c188 Compare July 31, 2025 21:10
@guangy10
Collaborator

guangy10 commented Aug 1, 2025

Does this overlap with #111?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
