Replies: 1 comment 1 reply
-
'Generating the prompt from an image' is not 'img2img'. 'Img2img' encodes the image into latents and feeds those latents to the model. To generate a prompt from audio, the authors use Qwen-Omni; see https://github.com/ace-step/ACE-Step/blob/main/TRAIN_INSTRUCTION.md . You can simply upload your audio to the online demo of Qwen-Omni and ask it to describe the audio.
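As a rough sketch of that captioning step: the request below follows the multimodal chat-message layout used by Qwen-style models, but the model name and the exact field names are illustrative assumptions here, not something defined by ACE-Step — check the Qwen-Omni documentation for the real schema.

```python
def build_caption_request(audio_path: str) -> dict:
    """Assemble a chat payload asking an omni model to describe an audio clip.

    The "model" value and content-part keys are placeholders (assumptions);
    adapt them to the actual Qwen-Omni API you are calling.
    """
    return {
        "model": "qwen-omni",  # placeholder model name
        "messages": [
            {
                "role": "user",
                "content": [
                    # The audio clip the model should listen to.
                    {"type": "audio", "audio": audio_path},
                    # The instruction that turns the description into a
                    # usable text-to-music prompt.
                    {
                        "type": "text",
                        "text": "Describe this audio: genre, mood, tempo, "
                                "instrumentation, and vocal style.",
                    },
                ],
            }
        ],
    }


request = build_caption_request("my_song.wav")
```

The returned description can then be edited by hand and used as the text prompt for regeneration, which is essentially the workflow the question below asks for.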
-
Hi, I remember Stable Diffusion had a way to generate a prompt from an existing image, img2img; you can search YouTube and find various examples, like this one: https://www.youtube.com/watch?v=PUwLT9JwCs8
Basically, the user uploads the image, the model analyzes it and reverse-engineers the prompt that it thinks would generate the closest result; the user can then tweak the prompt and regenerate a similar image.
It would be cool for ACE-Step to have this kind of feature too. That way we could upload existing music without having any idea how to describe it (I mean the prompt), then modify either the lyrics or the prompt and generate a similar track.