Open
Labels
enhancement (New feature or request)
Description
Is there an existing issue for this?
- I have searched the existing issues
Contact Details
No response
What should this feature add?
This request is to add native support for integrating and running the Qwen series of multimodal models from Alibaba Cloud within InvokeAI.
I'm requesting support for:
- Qwen-VL (Vision-Language): Integration would enable features like advanced image-to-text generation, detailed captioning, visual question-answering, and general image analysis. This could be used to enhance prompt generation.
- Qwen Image Edit / Agent: Support for Qwen's instruction-driven image editing capabilities. This would allow users to perform complex, natural-language-guided image manipulations and stylistic modifications without relying solely on traditional in-painting masks.
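To make the first request concrete, here is a minimal sketch of how the captioning / VQA path could be wired up through the Hugging Face `transformers` Qwen2-VL integration. This is not existing InvokeAI code: the helper names, the `Qwen/Qwen2-VL-7B-Instruct` checkpoint, and the prompt text are illustrative assumptions, and the `qwen-vl-utils` helper package is assumed to be installed alongside `transformers`.

```python
# Illustrative sketch only: one possible captioning/VQA path using the
# Hugging Face transformers Qwen2-VL classes. Function names, the model
# checkpoint, and the default instruction are assumptions for this
# feature request, not part of any existing InvokeAI API.

def build_vl_messages(image_path: str, instruction: str) -> list:
    """Build the chat-message structure Qwen2-VL's processor expects
    (consumed by AutoProcessor.apply_chat_template)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def caption_image(
    image_path: str,
    instruction: str = "Describe this image in detail.",
) -> str:
    """Run one captioning / visual-question-answering turn. Heavy
    imports are kept local so the pure message builder above has no
    model dependencies."""
    import torch
    from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "Qwen/Qwen2-VL-7B-Instruct"  # example checkpoint
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_vl_messages(image_path, instruction)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    with torch.no_grad():
        generated = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt tokens before decoding so only the answer remains.
    trimmed = generated[:, inputs.input_ids.shape[1]:]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

A caption or answer returned this way could then be surfaced in the UI or fed back into InvokeAI's prompt field to enhance prompt generation, as described above.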
Alternatives
InvokeAI already offers strong in-painting, image-to-image, and ControlNet capabilities, but direct instruction-based editing with LLM-guided vision models offers a more conversational, flexible approach to iterative image editing.
Additional Content
No response