Currently, it seems that the intermediate files (.wav) are all generated up front, before any transcription task starts, and are only deleted after all transcription tasks have completed.
This results in huge temporary disk usage.
In my case, transcribing 7.8 GB of MP4 audio (120 files in total) generates about 50 GB of intermediate files, which is unfriendly to GPU servers in a cloud environment.
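
To illustrate the behaviour I would prefer, here is a minimal sketch, not the project's actual code. It converts each file right before it is transcribed and deletes the .wav immediately afterwards, so at most one intermediate file is on disk at a time. The names `transcribe_one_at_a_time`, `convert`, `transcribe` and `ffmpeg_convert` are hypothetical, and the 16 kHz mono settings are only an assumption about what the model expects.

```python
import os
import subprocess
import tempfile
from typing import Callable, Iterable, Iterator, Tuple


def ffmpeg_convert(src: str, dst: str) -> None:
    """Hypothetical converter: assumes ffmpeg is on PATH and that
    16 kHz mono WAV is the desired intermediate format."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst],
        check=True,
    )


def transcribe_one_at_a_time(
    inputs: Iterable[str],
    convert: Callable[[str, str], None],
    transcribe: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (source path, transcript) pairs, keeping at most one
    intermediate .wav on disk at any moment."""
    for src in inputs:
        # The temporary directory and the .wav inside it are removed
        # as soon as the block exits, even if transcription raises.
        with tempfile.TemporaryDirectory() as tmp:
            wav_path = os.path.join(tmp, "audio.wav")
            convert(src, wav_path)
            text = transcribe(wav_path)
        yield src, text
```

With something like this, peak temporary disk usage would be bounded by the largest single .wav rather than by the sum over all input files. The `transcribe` callable stands in for whatever the tool already does per file.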