Is there any way to improve the quality of generated music? #134

ynicle · 2025-05-12T06:26:25Z

ynicle
May 12, 2025

Currently, the music quality is not so good.

G7b9 · 2025-05-12T07:51:34Z

G7b9
May 12, 2025

I don’t know if a larger model or longer inference time can improve the quality of music. I guess the current model is just used to verify the feasibility of generation and editing, and a higher quality model may come later.

0 replies

ChuxiJ · 2025-05-12T08:57:27Z

ChuxiJ
May 12, 2025
Maintainer

You can see more discussions and issues here. The community has given us many useful prompts and parameter-tuning techniques.

0 replies

MDMAchine · 2025-05-13T04:20:30Z

MDMAchine
May 13, 2025

Here is a reply from a closed issue; I figure people may find it useful. It uses ComfyUI:

Can you maybe share the workflow? Would be super helpful. Thanks @MDMAchine

Okay, here is a workflow for ComfyUI utilizing 3 separate methods for generation. https://drive.google.com/drive/folders/1KjVe_zJhAv5EERjr7aGYpVn14U6O6vbx?usp=sharing

Grab any of the FLAC files and drag them into ComfyUI or utilize the included JSON.

Version A main source sampler is from ComfyUI ACE-Step, which uses the Hugging Face files and is more akin to the Gradio GUI version (Euler and APG). These should download to /ComfyUI/models/TTS/Ace-Step.vXXX folder. It will take a while; however, if you already downloaded them from the Gradio app, you can always copy them over there (in repo format) and save yourself a second download.

Version B main source sampler is using Sampler Custom Advanced, DEIS sampler, linear_quadratic scheduler, and Sonar custom noise (Student-t). The models used are in GGUF format, and the nodes that can load them (as of my last checking) are HERE.

Version C is a chain of 3 KSamplers Advanced, 20 steps each in a chain: DEIS > Uni-PC > Gradient Estimation samplers. All use the kl_optimal scheduler and the same GGUF models as in Version B.
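For anyone trying to picture the Version C chain, here is a minimal sketch of how three chained KSampler (Advanced) nodes could split the steps. It assumes the chain shares one 60-step schedule and hands the latent over every 20 steps; the actual workflow may be wired differently. The helper `chain_stages` is hypothetical, and the dictionary keys mirror the KSampler (Advanced) input names.

```python
def chain_stages(samplers, steps_per_stage, scheduler):
    """Build per-stage settings for a chain of KSampler (Advanced) nodes.

    Assumption: every stage sees the same total step count and only
    processes its own window of steps.
    """
    total = steps_per_stage * len(samplers)
    stages = []
    for i, name in enumerate(samplers):
        is_first = i == 0
        is_last = i == len(samplers) - 1
        stages.append({
            "sampler_name": name,
            "scheduler": scheduler,
            "steps": total,
            "start_at_step": i * steps_per_stage,
            "end_at_step": (i + 1) * steps_per_stage,
            # Only the first stage adds the initial noise.
            "add_noise": "enable" if is_first else "disable",
            # Intermediate stages pass the partly denoised latent along.
            "return_with_leftover_noise": "disable" if is_last else "enable",
        })
    return stages


if __name__ == "__main__":
    for stage in chain_stages(["deis", "uni_pc", "gradient_estimation"], 20, "kl_optimal"):
        print(stage)
```

Each printed dictionary corresponds to one node in the chain, so the values can be copied into the matching widgets by hand.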

All 3 have the option to RePaint > ReTone (resample using Sampler Custom Advanced). There are toggles to turn them on/off. RePaint is from the wrapper nodes and uses the HF files; ReTone is Sampler Custom Advanced using GGUF.

In the preview section, there are primitive nodes that can adjust RePaint variance and ReTone denoise levels.
RAW = 1st iteration
Sampler = RePainted and/or ReTone product
Sig Proc = Final output

Then a signal processing chain to clean up and a basic "master."

Also, if you plan to save it and have a chance of ever re-creating the project, run and stop it several times — then it should stay the same (unless you change the seed).

Hope it helps! Good luck!

Also, here are a bunch of samples made from variations of the workflow as it develops.
https://drive.google.com/drive/folders/1NiTuIeUxbUbYQjn36nnySTccZzdM6uud?usp=sharing

3 replies

nikolatesla20 May 29, 2025

Holy hell that workflow is overly complicated with a billion custom nodes. And it doesn't run correctly anyway.

MDMAchine May 29, 2025

Yeah, never intended to be a simple workflow, just an example. Being new, lots of changes are happening.
The Discord is a better place to find out about the latest strategies.

Once I'm happy with some of the nodes I'm working on, I'll throw together a simplified workflow.

a3nima May 29, 2025

I've tried the Gradio version they provide and it worked well, by the way. For people who don't want to run it in Comfy, it might be an alternative.

timbo1975 · 2025-05-13T22:30:24Z

timbo1975
May 13, 2025

Currently, the music quality is not so good.

I initially thought the same, but I then started to play around with the settings (there's no right or wrong, just experiment). I have found success with 27 steps and 45-second songs: this way you get some output quickly and can then play with other settings until you get what you want. It's also worth using something like Qwen or another LLM to help describe the instruments. Finally, try it without vocals first, with just the instruments, then add the vocals afterwards.

1 reply

a3nima May 17, 2025

Exactly how and where do you describe the instrumentals, as you're suggesting?

jehusephat · 2025-05-16T16:47:33Z

jehusephat
May 16, 2025

The sampler/schedule combination used with this model makes a huge difference in output quality. Using Euler_Ancestral / Simple at around 50+ steps dramatically improves overall cohesion and sound quality compared to all of the other combinations I have tried. If you generate a 30 second sample with that combination and then try the sampler settings you were using before, I think you'll be able to hear the difference almost immediately.
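To keep such a comparison fair, it may help to change only the sampler settings and hold the seed and duration fixed. The sketch below is plain bookkeeping, not an ACE-Step or ComfyUI API: `RunSettings` and `ab_pair` are hypothetical names, and the identifiers `euler_ancestral` and `simple` are assumed to match the names shown in your UI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical record of the settings used for one generation."""
    sampler: str
    scheduler: str
    steps: int
    duration_s: int
    seed: int


def ab_pair(previous):
    """Return (previous, candidate) runs that differ only in sampler settings."""
    candidate = replace(
        previous,
        sampler="euler_ancestral",
        scheduler="simple",
        steps=max(previous.steps, 50),  # "around 50+ steps"
    )
    return previous, candidate


if __name__ == "__main__":
    # The baseline values here are placeholders for whatever you used before.
    old = RunSettings(sampler="euler", scheduler="normal", steps=27, duration_s=30, seed=1234)
    for run in ab_pair(old):
        print(run)
```

Generating both runs with the same prompt, seed, and 30-second length makes it easier to attribute any audible difference to the sampler and scheduler alone.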

1 reply

allo- May 20, 2025

I think the new pingpong scheduler produces good results. But it seems that all results have limited audio quality, not in the sense of bad generation, but as if they were recorded at too low a bitrate.
I think they promised the new model will replace some parts with parts that produce better quality audio.
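One rough way to check the "low bitrate" impression is to measure how much of a clip's energy sits in the high frequencies. The sketch below computes a spectral rolloff frequency with a naive DFT from the standard library only; `rolloff_hz` is a hypothetical helper, the 99% threshold is an arbitrary choice, and the input is a synthetic tone rather than real model output. For full tracks an FFT library would be far faster, and a low rolloff only suggests limited bandwidth, it does not prove anything about the model.

```python
import cmath
import math


def rolloff_hz(samples, sample_rate, fraction=0.99):
    """Frequency below which `fraction` of the spectral energy lies."""
    n = len(samples)
    energy = []
    # Naive DFT over the non-negative frequency bins (fine for short excerpts).
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        energy.append(abs(acc) ** 2)
    target = fraction * sum(energy)
    running = 0.0
    for k, e in enumerate(energy):
        running += e
        if running >= target:
            return k * sample_rate / n
    return sample_rate / 2


if __name__ == "__main__":
    sr = 8000
    # Synthetic 1 kHz tone standing in for a short mono excerpt.
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(400)]
    print(rolloff_hz(tone, sr))
```

Comparing the rolloff of a generated clip against a reference recording at the same sample rate gives a quick, if crude, indication of whether the high end is missing.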

allo- · 2025-05-29T10:14:35Z

allo-
May 29, 2025

I wonder if one could use a music restoration tool. If a generated image is not that good, an upscaler can often conceal some of the artifacts. Maybe a GAN trained to restore old recordings could improve the audio quality of a finished track.

0 replies
