Hi @HaozheLiu-ST, nice work! I'm wondering how you measured the detailed latency of each module in the diffusion model. Is there an off-the-shelf tool for profiling the runtime?