I am currently using the multilingual model for large-scale audio generation. I observed that generation time increases linearly over consecutive calls to model.generate(...).
I think the issue is related to cache accumulation in T3 and the alignment_stream, but I have not been able to fix it.
Does anyone know the possible reasons for this and how to fix it?
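
In case it helps with reproducing this, here is a minimal sketch of how the slowdown can be measured: time each call and fit a least-squares slope over the call index. The helper names (`time_repeated_calls`, `linear_slope`) and the `LeakyGenerator` class are hypothetical and only for illustration. `LeakyGenerator` is a stand-in that deliberately accumulates state; it is not the real model and says nothing about what T3 or the alignment stream actually do.

```python
import time
from typing import Callable, List


def time_repeated_calls(fn: Callable[[], object], n_calls: int) -> List[float]:
    """Call fn n_calls times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def linear_slope(values: List[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


class LeakyGenerator:
    """Hypothetical stand-in (NOT the real model): state grows on every call."""

    def __init__(self) -> None:
        self.cache: List[int] = []

    def generate(self, text: str) -> int:
        # Simulated accumulation: the cache is never reset between calls,
        # so each call has more work to do than the previous one.
        self.cache.extend([0] * 20000)
        return sum(self.cache)


if __name__ == "__main__":
    gen = LeakyGenerator()
    durations = time_repeated_calls(lambda: gen.generate("hello"), n_calls=50)
    # A clearly positive slope means per-call time grows with the call index.
    print(f"slope: {linear_slope(durations):.6e} seconds per call")
```

With the real model, the lambda would wrap model.generate(...) instead. A slope close to zero after resetting or re-creating a suspected component between calls would point at that component as the one accumulating state.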