I'm very confused. I've searched a lot about the BP (backpropagation) algorithm, and some notes say it is enough to differentiate only with respect to W (the parameters) and use the residual to get the gradient. Your example, however, seems to also require computing the gradient with respect to the input data (the previous layer's output). Which one is right?
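A minimal NumPy sketch may help reconcile the two views. It assumes a hypothetical two-layer tanh network with a squared-error loss (the names `W1`, `W2`, `delta1`, `delta2`, `numeric_grad` are illustrative). The update only uses dL/dW, but the "residual" (error term δ) of layer 1 is built from `W2.T @ delta2`, which is exactly the gradient with respect to layer 2's input `h`. So the two formulations appear to be the same computation, just named differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: x -> W1 -> tanh -> W2 -> squared-error loss
x = rng.standard_normal(3)
y = rng.standard_normal(2)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))

def loss(W1, W2):
    h = np.tanh(W1 @ x)
    out = W2 @ h
    return 0.5 * np.sum((out - y) ** 2)

# Forward pass
a1 = W1 @ x
h = np.tanh(a1)
out = W2 @ h

# Backward pass
delta2 = out - y                # residual of layer 2: dL/d(out)
grad_W2 = np.outer(delta2, h)   # dL/dW2 uses layer 2's input h
grad_h = W2.T @ delta2          # dL/dh: gradient w.r.t. layer 2's INPUT
delta1 = grad_h * (1 - h ** 2)  # residual of layer 1 is built from grad_h
grad_W1 = np.outer(delta1, x)   # dL/dW1

def numeric_grad(f, W, eps=1e-6):
    """Central-difference gradient of scalar f with respect to array W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp = W.copy(); Wp[idx] += eps
        Wm = W.copy(); Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

num_W1 = numeric_grad(lambda W: loss(W, W2), W1)
print(np.allclose(grad_W1, num_W1, atol=1e-6))
```

Without `grad_h` there would be no way to obtain `delta1`, which is why an example that computes gradients with respect to layer inputs is not in conflict with notes that only "differentiate w.r.t. W".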
For example, how do you derive the gradient with respect to the input image in a convolutional layer?
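A minimal single-channel sketch, assuming the forward pass is a "valid" cross-correlation y[i,j] = Σ x[i+m, j+n]·k[m,n] (stride 1, no padding, no bias; all function names are hypothetical). Under those assumptions the gradient with respect to the input is the upstream gradient zero-padded by (kernel size − 1) and cross-correlated with the kernel rotated 180°, i.e. a "full" convolution. The kernel gradient is a valid cross-correlation of the input with the upstream gradient. Both are checked against finite differences.

```python
import numpy as np

def conv2d_valid(x, k):
    """Valid cross-correlation: y[i, j] = sum_mn x[i+m, j+n] * k[m, n]."""
    H, W = x.shape
    kh, kw = k.shape
    y = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return y

def conv2d_input_grad(grad_y, k):
    """dL/dx: pad grad_y with (kernel size - 1) zeros on every side and
    cross-correlate with the kernel rotated by 180 degrees."""
    kh, kw = k.shape
    padded = np.pad(grad_y, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return conv2d_valid(padded, k[::-1, ::-1])

def conv2d_kernel_grad(x, grad_y):
    """dL/dk: valid cross-correlation of the input with grad_y."""
    return conv2d_valid(x, grad_y)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))  # hypothetical input image
k = rng.standard_normal((3, 3))  # hypothetical kernel
G = rng.standard_normal((3, 3))  # hypothetical upstream gradient dL/dy

def loss(x, k):
    # Hypothetical loss chosen so that dL/dy is exactly G
    return np.sum(conv2d_valid(x, k) * G)

def numeric_grad(f, a, eps=1e-6):
    """Central-difference gradient of scalar f with respect to array a."""
    g = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        ap = a.copy(); ap[idx] += eps
        am = a.copy(); am[idx] -= eps
        g[idx] = (f(ap) - f(am)) / (2 * eps)
    return g

grad_x = conv2d_input_grad(G, k)
grad_k = conv2d_kernel_grad(x, G)
print(np.allclose(grad_x, numeric_grad(lambda a: loss(a, k), x), atol=1e-6))
print(np.allclose(grad_k, numeric_grad(lambda a: loss(x, a), k), atol=1e-6))
```

With multiple channels, strides, or padding the index bookkeeping changes, so this should be read as an illustration of the single-channel case only.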