Similarities and distinctions from fellow work "ConsistencyTTA" #2
Replies: 1 comment
-
Thank you for your attention and for the effort you put into this discussion. We acknowledge that ConsistencyTTA applied consistency models to text-to-audio generation earlier than we did, in order to accelerate generation. However, I would like to address the similarities mentioned one by one.
In summary, the similarities between ConsistencyTTA and AudioLCM are mostly common, widely used methods and strategies. In addition, AudioLCM differs from ConsistencyTTA in many clear ways:
Finally, I do not think that AudioLCM is a supplement to ConsistencyTTA, because our work was not built on ConsistencyTTA, as can be seen from the public code and papers. In fact, our work builds mainly on improving and accelerating Make-An-Audio 2. That said, we agree to add comparative experiments against ConsistencyTTA to our public paper, because this will make the comparison on text-to-audio tasks more comprehensive and fair.
-
Thank you for the awesome work! Accelerating text-to-audio generation is an important goal, and AudioLCM's contributions to this area are greatly appreciated.
We would like to bring to your attention our paper from September 2023, titled ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation, that explored a similar idea. ConsistencyTTA's code and model checkpoints are available here and here.
After a discussion with @liuhuadai, we agree that while ConsistencyTTA and AudioLCM share numerous similarities, they also have distinct differences.
The main similarities include:
The main differences include:
We therefore believe that AudioLCM is a valuable complement to ConsistencyTTA, providing important insights into text-to-audio generation powered by consistency models. Shout out to @liuhuadai for the constructive discussion. The AudioLCM paper will be revised shortly to include this comparison.