Fixed the lpnormpool implementation #639
Conversation
Implemented the correct LpNorm Pooling and backprop steps
Original from @skyleaworlder, care to review? PR needs tests. At present only
Forgive me if I'm mistaken; I am not familiar with tests. I found this line in
I believe my implementation is faithful to the definition of the Lp norm. If you can give me resources or tips to construct tests, I'll add that to my fork.
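A rough sketch of the kind of test this could get: compare the pooled output against a directly computed Lp norm, and check the analytic gradient with central finite differences. The `lpnormpool(x, p, k)` argument order below is an assumption, not a quote of the NNlib test suite.

```julia
using NNlib, Test

# Reference: Lp norm of one full-size pooling window.
refnorm(x, p) = sum(abs.(x) .^ p)^(1 / p)

x = randn(Float64, 4, 4, 1, 1)   # (W, H, C, N), includes negative entries
p, k = 3, (4, 4)                 # a single window covering the whole input

y = lpnormpool(x, p, k)
@test only(y) ≈ refnorm(x, p)

# Finite-difference check of ∂y/∂xᵢ = |xᵢ|^(p-1) * y^(1-p) * sign(xᵢ).
g_analytic = abs.(x) .^ (p - 1) .* only(y)^(1 - p) .* sign.(x)
ε = 1e-6
g_fd = similar(x)
for i in eachindex(x)
    xp = copy(x); xp[i] += ε
    xm = copy(x); xm[i] -= ε
    g_fd[i] = (refnorm(xp, p) - refnorm(xm, p)) / (2ε)
end
@test g_analytic ≈ g_fd atol=1e-5
```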
    - grad = x[input_kw, input_kh, input_kd, c, batch_idx]^(p-1) * y_idx^(1-p)
    + # y = (∑ᵢ |xᵢ|^p)^(1 / p), ∂y/∂xᵢ = |xᵢ|^(p-1) × y^(1-p) × sign(xᵢ)
    + xv = x[input_kw, input_kh, input_kd, c, batch_idx]
    + grad = abs(xv)^(p-1) * y_idx^(1-p) * sign(xv)
That comes from the derivative of `abs`.
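Spelled out, since `d|x|/dx = sign(x)` away from zero:

$$
\frac{\partial y}{\partial x_i}
= \frac{1}{p}\Big(\sum_j |x_j|^p\Big)^{\frac{1}{p}-1} \cdot p\,|x_i|^{p-1}\,\operatorname{sign}(x_i)
= |x_i|^{p-1}\, y^{1-p}\, \operatorname{sign}(x_i),
$$

using $\big(\sum_j |x_j|^p\big)^{\frac{1}{p}-1} = (y^p)^{\frac{1-p}{p}} = y^{1-p}$.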
The lpnormpool implementation was wrong. The existing implementation is `y = (∑ᵢ xᵢ^p)^(1 / p)`, whereas the true Lp norm should be `y = (∑ᵢ |xᵢ|^p)^(1 / p)`, where `|x|` is `abs(x)`. This also affects the gradient, which should be `∂y/∂xᵢ = |xᵢ|^(p-1) × y^(1-p) × sign(xᵢ)` instead of `∂y/∂xᵢ = xᵢ^(p-1) × y^(1-p)`. The necessary changes go into `src/impl/pooling_direct.jl`, which is what this PR achieves.

The existing implementation is exact only for an input array `x` such that `xᵢ >= 0 ∀ xᵢ ∈ x`, which will not generally be the case. For example, when the input to the pooling layer comes from a layer with a leaky ReLU activation, the problem manifests. MWE of the problem:
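A minimal sketch along these lines (the `lpnormpool(x, p, k)` call signature is an assumption and may differ from the released API):

```julia
# Discrepancy on inputs with negative entries, e.g. what a leaky ReLU produces.
using NNlib

p = 3
x = reshape(Float32[-1.0, 0.5, 2.0, -0.25], 4, 1, 1, 1)   # (W, H, C, N)

correct = sum(abs.(x) .^ p)^(1 / p)   # (∑ᵢ |xᵢ|^p)^(1/p) ≈ 2.09
wrong   = sum(x .^ p)^(1 / p)         # (∑ᵢ xᵢ^p)^(1/p)  ≈ 1.92, the old behaviour

y = lpnormpool(x, p, (4, 1))          # with this PR, only(y) should match `correct`
@show correct wrong only(y)
```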