Hi,
I have been trying to reproduce your evaluation scores, but so far the scores I am getting are considerably worse than those reported in the paper.
For the standard UniAnimate I get L1: 6.36E-04, PSNR*: 13.29, SSIM: 0.549, LPIPS: 0.425
For UniAnimate-Long I get L1: 4.07E-04, PSNR*: 15.99, SSIM: 0.631, LPIPS: 0.309
Neither of these matches the reported scores of L1: 2.66E-04, PSNR: 20.58, SSIM: 0.811, LPIPS: 0.231
I am using the benchmark pipeline from DisCo (as mentioned in the paper), with the same 10 TikTok videos and the 256x256 resolution configuration that DisCo uses.
Could you kindly share more details of your evaluation setup? For example: (A) whether the standard or the long variant was used, (B) the max frames setting, and any other information that would be useful in recreating the pipeline.
(Corrected PSNR* as per: [Source])
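
In case it helps pin down the difference, here is a minimal sketch of how I understand the corrected PSNR*. It assumes the correction simply means casting the uint8 frames to float before squaring the difference, so the squared error cannot overflow. The function name `psnr` and the toy frames are my own for illustration and are not taken from the DisCo or UniAnimate code.

```python
import numpy as np


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two same-shaped images, computed in float64.

    Assumption: casting to float64 before subtracting is the whole
    "correction"; the actual evaluation script may differ.
    """
    ref = np.asarray(reference, dtype=np.float64)
    gen = np.asarray(generated, dtype=np.float64)
    if ref.shape != gen.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy 256x256 RGB frames (hypothetical data, not from the benchmark):
# every pixel differs by 10, so the MSE is exactly 100.
ref_frame = np.full((256, 256, 3), 200, dtype=np.uint8)
gen_frame = np.full((256, 256, 3), 190, dtype=np.uint8)
print(f"PSNR: {psnr(ref_frame, gen_frame):.2f} dB")
```

If your PSNR* is computed differently (for example per video rather than per frame, or on a different value range), that alone could explain part of the gap.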