In the original paper, this is an L2 regularization across different scales. Did you include this design in your aggregation code?