I just don't understand why the C4 backbone puts the res5 block in the head. As far as I know, the Faster R-CNN paper uses VGG.