Thanks for the wonderful paper - it was a pleasure to read!
Could you kindly elaborate a bit more on the COCO training details? In particular, I was wondering about the following three points:
- What augmentations were used? Just large- or small-scale jittering, and perhaps left-right flipping?
- What learning rate schedule was used?
- Was the same learning rate used for all layers?