Hi, thanks for your work making a PyTorch version of the paper - much appreciated! How does this implementation compare to the results in the original paper, specifically on the Moments in Time dataset? Thanks, Ed