
Support for VLM Conversion with Encoder-Decoder Architecture #946

@sagnik-charlie


Description of the bug:

Currently, the litert_torch library only supports decoder-only model conversion, even for VLMs such as qwen_vl, paligemma, and smolvlm2 that are listed in litert_torch.generative.examples. The export entry point in litert_torch.generative.export_hf.export does accept a 'task' argument, and passing task='image_text_to_text' loads the VLM as a PyTorch model. However, the exportable_module in litert_torch.generative.export_hf.core then still routes through the decoder-only conversion script. So whether you go through the generative examples or through export_hf, litert_torch produces a decoder-only conversion, which only covers tasks like 'llm_chat' and 'llm_prompt_lab' in the litert-lm engine runtime. For image and audio support, the litert-lm engine expects 'TF_LITE_VISION_ENCODER' and 'TF_LITE_AUDIO_ENCODER_HW' components alongside the decoder, and the code to export those is still missing. Can you please look into this problem and provide a solution?

Actual vs expected behavior:

VLM conversion is expected to export both the encoder and the decoder. However, for all of the VLMs mentioned above, conversion returns only a decoder with valid signatures. As a result, only the 'llm_chat' and 'llm_prompt_lab' tasks are supported in the Google AI Edge Gallery app, whereas tasks like 'llm_ask_image' and 'llm_ask_audio' expect a 'TF_LITE_VISION_ENCODER' and a 'TF_LITE_AUDIO_ENCODER_HW' respectively inside the .litertlm bundle. Converting those encoder components is still not supported by the litert_torch library.
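To make the gap concrete, here is a small illustrative sketch in plain Python (not litert_torch or litert-lm code). The task names and the two encoder component names come from this issue; the helper function and the decoder component name are hypothetical, used only to show which tasks a decoder-only bundle can and cannot serve.

```python
from typing import Iterable

# Extra .litertlm components each litert-lm task expects beyond the decoder,
# per the behavior described in this issue. Task and encoder component names
# are from the issue text.
EXTRA_REQUIRED: dict[str, set[str]] = {
    "llm_chat": set(),                              # decoder-only task
    "llm_prompt_lab": set(),                        # decoder-only task
    "llm_ask_image": {"TF_LITE_VISION_ENCODER"},    # needs a vision encoder
    "llm_ask_audio": {"TF_LITE_AUDIO_ENCODER_HW"},  # needs an audio encoder
}

def missing_components(task: str, bundle: Iterable[str]) -> set[str]:
    """Return the components the bundle lacks for the given task."""
    return EXTRA_REQUIRED.get(task, set()) - set(bundle)

# A decoder-only bundle, as litert_torch currently produces
# ("DECODER" is a placeholder name, not the real component id):
decoder_only = ["DECODER"]

print(missing_components("llm_chat", decoder_only))       # nothing missing
print(missing_components("llm_ask_image", decoder_only))  # vision encoder missing
```

Running this shows that the decoder-only bundle satisfies 'llm_chat' but leaves 'llm_ask_image' (and likewise 'llm_ask_audio') unusable until the encoder export path exists.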

Any other information you'd like to share?

No response
