
Distributed training experience

Albert Zeyer edited this page Jun 10, 2020 · 7 revisions

This page is about distributed training with TensorFlow. This could use distributed TensorFlow (`TFDistributed.py` in RETURNN, issue #296), or Horovod (see the RETURNN documentation about Horovod), or a mixture of both. It could use either the new TF dataset pipeline (`TFDataPipeline.py` in RETURNN, issue #292) or the old data pipeline. Some of the existing implementations might also need to be extended.
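For the Horovod route, a minimal RETURNN config sketch might look like the following. This is a sketch, not an authoritative reference: the option names and values shown here (`use_horovod`, `horovod_dataset_distribution`, `horovod_reduce_type`) should be checked against the RETURNN Horovod documentation for the RETURNN version in use.

```python
# Fragment of a RETURNN config enabling Horovod-based multi-GPU training.
# Sketch only; verify option names against the RETURNN Horovod documentation.
use_horovod = True
horovod_dataset_distribution = "shard"  # each worker trains on its own shard of the data
horovod_reduce_type = "grad"  # all-reduce the gradients across workers every step
```

Such a config would then typically be launched with one process per GPU, e.g. via `mpirun -np 4 python3 rnn.py my-config.py` for the single-node multi-GPU case (the exact launch command depends on the MPI setup).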

We care about several settings:

  • single-node multi-GPU (consumer GPU cards, just TCP/MPI data transfer, slow NFS)
  • multi-node multi-GPU (consumer GPU cards, just TCP/MPI data transfer, slow NFS)
  • AWS settings
  • GCP settings (GPU or also TPU)