Hi, I was just wondering about the difference between the "single_gpu" and "data_parallel" training scripts, since they seem to have the same structure and modules, and also use the same API.
By the way, could you explain how to use the distributed one? I'm a little confused about how to set the URL and how to get started with it.
Thanks.