Inference gets slower during large-scale generation of multiple files

I am currently using the multilingual model for large-scale audio generation. I observed that the generation time increases roughly linearly across consecutive calls to model.generate(...).
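
A minimal sketch of how the per-call slowdown could be measured, in case it helps with reproducing the issue. It times each call and fits a least-squares slope to the durations; a clearly positive slope means every call is slower than the one before. The helpers `time_calls` and `linear_slope` and the stand-in `fake_generate` are hypothetical names used only for illustration and are not part of the library; in a real run, `fake_generate` would be replaced by a function that calls model.generate(...) on a fixed input.

```python
import time


def time_calls(fn, n_calls):
    """Call fn() n_calls times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def linear_slope(values):
    """Least-squares slope of values against their index (seconds per call)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    # Hypothetical stand-in for model.generate(...): it keeps state between
    # calls, so each call does a little more work than the previous one.
    leaked_state = []

    def fake_generate():
        leaked_state.append(0)
        for _ in range(len(leaked_state) * 20000):
            pass

    durations = time_calls(fake_generate, 30)
    print(f"first call: {durations[0]:.4f}s, last call: {durations[-1]:.4f}s")
    print(f"slope: {linear_slope(durations):.6f} s per call")
```

Using the same input text for every call keeps the comparison fair, since longer inputs would also take longer to generate.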

I think the slowdown is related to cache accumulation in T3 and the alignment_stream, but I have not been able to fix it.

Does anyone know the possible reasons for this and how to fix it?