Description
Wonderful work you guys did!
But I have a small question:
In train.py (lines 343 to 360), the precision is selected according to the distributed training mode.
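For context, the selection I mean looks roughly like this (a paraphrased sketch, not the actual train.py code; the mode names and the precision mapping are my assumptions about what those lines do):

```python
def select_precision(distributed_type: str) -> str:
    """Hypothetical sketch of the per-mode precision choice in train.py.

    The mapping below is illustrative only: DeepSpeed runs pick bf16,
    plain multi-GPU (DDP) runs pick fp16, anything else falls back to fp32.
    """
    if distributed_type == "DEEPSPEED":
        return "bf16"
    if distributed_type == "MULTI_GPU":
        return "fp16"
    return "no"  # accelerate's value for "no mixed precision" (fp32)


if __name__ == "__main__":
    print(select_precision("DEEPSPEED"))  # → bf16
    print(select_precision("MULTI_GPU"))  # → fp16
```

If the real code branches like this, the two launch setups below would train in different numeric precisions, which could matter for reproducing your results.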
In accelerate_config_4_gpu.yaml, DeepSpeed is explicitly configured; however, train_4_gpu.sh sets MULTI_GPU via command-line arguments.
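To make the discrepancy concrete, here is roughly how the two setups would be launched (the commands are my paraphrase of the config and script; the exact flags in train_4_gpu.sh may differ):

```shell
# Launch using the DeepSpeed settings baked into the YAML config
accelerate launch --config_file accelerate_config_4_gpu.yaml train.py

# Launch forcing plain multi-GPU (DDP) from the command line; as I
# understand it, CLI flags such as --multi_gpu override the config file
accelerate launch --multi_gpu --num_processes 4 train.py
```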
So I would like to know: in which distributed environment were the released weights actually trained?
I ask because, using the repository code with no modifications, the videos generated by the two models I trained on 4 A800 GPUs are slightly worse than those from the weights you provided. Specifically, my models are somewhat weaker at distinguishing the magnitude of forces and at keeping objects coherent after forces are applied.
Do you think this is caused by training randomness, by the distributed environment used in the code, or by something else?
Looking forward to your reply. Thanks!