Environment:
- Python version [3.7.7]
- Spark version [3.0.1]
- TensorFlow version [2.3.0]
- TensorFlowOnSpark version [2.2.1]
- Cluster version [Standalone]
Question:
Is there a way to monitor the network utilization of the nodes while they exchange gradients to update the model? I want to measure how much data is sent from one node to another, both for a single batch and across all batches. As far as I can tell, TensorBoard does not support this.
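One low-tech way to approximate this is to sample the kernel's per-interface byte counters on each node before and after every batch. The sketch below assumes Linux worker nodes (it reads `/proc/net/dev`) and ignores the loopback interface. `parse_proc_net_dev`, `read_counters`, and `NetworkMeter` are hypothetical helpers, not part of TensorFlow or TensorFlowOnSpark.

```python
def parse_proc_net_dev(text, skip=("lo",)):
    """Sum (bytes_received, bytes_sent) over all interfaces in /proc/net/dev text."""
    recv = sent = 0
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        if iface.strip() in skip:  # skip loopback traffic by default
            continue
        fields = data.split()
        recv += int(fields[0])  # first Receive column: bytes
        sent += int(fields[8])  # first Transmit column: bytes (after 8 Receive columns)
    return recv, sent


def read_counters(path="/proc/net/dev"):
    """Read the current cumulative byte counters for this node (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())


class NetworkMeter:
    """Hypothetical helper that records per-batch network byte deltas on one node."""

    def __init__(self, reader=read_counters):
        self.reader = reader  # injectable so the logic can be tested without /proc
        self.per_batch = []   # list of (bytes_received, bytes_sent) per batch
        self._start = None

    def start_batch(self):
        self._start = self.reader()

    def end_batch(self):
        r0, s0 = self._start
        r1, s1 = self.reader()
        self.per_batch.append((r1 - r0, s1 - s0))

    def totals(self):
        """Total (bytes_received, bytes_sent) over all recorded batches."""
        return (sum(r for r, _ in self.per_batch),
                sum(s for _, s in self.per_batch))
```

In a Keras training loop, `start_batch` and `end_batch` could be called from the `on_train_batch_begin` and `on_train_batch_end` methods of a `tf.keras.callbacks.Callback` subclass on each worker. Keep in mind that these counters include all traffic on the node (Spark, HDFS, logging), not only gradients, and that asynchronous collective ops may straddle batch boundaries, so the numbers are an upper-bound approximation.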
Spark Submit Command Line:
spark-submit --master spark://master:7077 train_file.py --cluster_size 3 --epochs 10
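To sanity-check measured numbers for `--cluster_size 3` and `--epochs 10`, the gradient traffic can also be estimated analytically. This sketch assumes the training script uses synchronous data parallelism with a ring all-reduce over float32 gradients (for example `MultiWorkerMirroredStrategy` with ring collectives); TensorFlowOnSpark does not mandate a strategy, and parameter-server or NCCL setups move data differently. In a ring all-reduce each worker sends (and receives) about 2(N-1)/N times the gradient size per step, ignoring protocol overhead. The function names are hypothetical, and `num_params` could come from Keras's `model.count_params()`.

```python
def ring_allreduce_bytes_per_worker(num_params, num_workers, bytes_per_param=4):
    """Approximate bytes one worker sends in a single ring all-reduce of the gradients."""
    if num_workers < 2:
        return 0  # no communication with a single worker
    gradient_bytes = num_params * bytes_per_param
    # reduce-scatter + all-gather: 2 * (N - 1) transfers of a chunk of size S / N
    return 2 * (num_workers - 1) * gradient_bytes / num_workers


def total_bytes_per_worker(num_params, num_workers, steps_per_epoch, epochs,
                           bytes_per_param=4):
    """Approximate bytes one worker sends over the whole training run."""
    per_step = ring_allreduce_bytes_per_worker(num_params, num_workers, bytes_per_param)
    return per_step * steps_per_epoch * epochs
```

For example, a model with 3,000,000 float32 parameters on 3 workers would send roughly 16 MB per worker per batch under these assumptions; multiplying by steps per epoch and by 10 epochs gives the run total.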