Description:
During training, I noticed that the effective learning rate does not match the initial learning rate I set. After inspecting the code, I found that the initial learning rate is adjusted as follows:
```python
self.lr = args.lr * args.batch_size / 256
```
Question:
What is the purpose of scaling the learning rate by `args.batch_size / 256`? Is this done to accommodate a specific training strategy? I would appreciate an explanation of the reasoning behind this adjustment.
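For reference, here is the behaviour I am seeing, reduced to a minimal sketch (the base learning rate of 0.1 and the batch sizes below are just example values, not the actual config):

```python
# Minimal sketch of the adjustment I observed. The base LR (0.1) is
# hypothetical; 256 is the reference batch size from the quoted code.
def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    """Scale the learning rate proportionally to the batch size."""
    return base_lr * batch_size / base_batch

for bs in (64, 256, 1024):
    # With base_lr=0.1: bs=64 -> 0.025, bs=256 -> 0.1, bs=1024 -> 0.4
    print(bs, scaled_lr(0.1, bs))
```

So the learning rate actually used is only equal to `args.lr` when the batch size is exactly 256, which is what prompted my question.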
Thank you!