[enhancement]: Support for Qwen Multimodal Models (Qwen-VL and Image Edit) #8983

@gafda

Description

Is there an existing issue for this?

  • I have searched the existing issues

Contact Details

No response

What should this feature add?

This request is to add native support for running the Qwen series of multimodal models from Alibaba Cloud within InvokeAI.

I'm requesting support for:

  1. Qwen-VL (Vision-Language): Integration would enable advanced image-to-text generation, detailed captioning, visual question answering, and general image analysis. This could also be used to enhance prompt generation.
  2. Qwen Image Edit / Agent: Support for Qwen's instruction-driven image editing capabilities. This would allow users to perform complex, natural-language-guided image manipulations and stylistic modifications without relying solely on traditional in-painting masks.
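For context, a minimal sketch of what a vision-language call might look like: Qwen-VL models consume an OpenAI-style chat payload with interleaved image and text parts. The helper below is purely illustrative (its name and the InvokeAI wiring are assumptions, not an existing API); only the payload shape follows the published Qwen-VL chat format.

```python
# Hypothetical helper (not an InvokeAI or Qwen API) that builds the
# multimodal chat payload Qwen-VL models expect: one user turn holding
# an image part followed by a text instruction part.
def build_vl_messages(image_ref: str, instruction: str) -> list[dict]:
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},   # image reference (path/URL)
                {"type": "text", "text": instruction},   # captioning/editing prompt
            ],
        }
    ]

# Example: a captioning request whose output could feed prompt generation.
messages = build_vl_messages(
    "file:///tmp/render.png",
    "Describe this image in detail.",
)
```

With the `transformers` library, a payload like this would typically be passed through the model's `AutoProcessor` chat template and then to `generate()`; model loading is omitted here since the weights are large and the exact integration point in InvokeAI is the subject of this request.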

Alternatives

InvokeAI already has excellent in-painting, image-to-image, and ControlNet capabilities, but direct instruction-based editing with LLM-guided vision models offers a more conversational and flexible approach to iterative image editing.

Additional Content

No response

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
