I think there is a problem in the formulas for the derivatives of W and b in this tutorial. Shouldn't the gradient of W in layer l come from the error in layer l and the activation in layer l-1? The formula in the tutorial instead suggests that the gradient of W in layer l comes from the error in layer l+1 and the activation in layer l.

I think the right one should look like this
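
Written out as a rough sketch, assuming W^{(l)} and b^{(l)} are the parameters feeding into layer l, so that z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}. Here \delta^{(l)} is the error term of layer l, a^{(l-1)} is the activation of layer l-1, and J is the cost. These symbols are my own shorthand and may not match the tutorial's notation exactly:

$$
\frac{\partial J}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{T},
\qquad
\frac{\partial J}{\partial b^{(l)}} = \delta^{(l)}
$$

If the tutorial instead defines W^{(l)} as the weights going from layer l to layer l+1, then a formula using \delta^{(l+1)} and a^{(l)} would be the same statement with the index shifted by one.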

The same goes for b. Or maybe I have just misunderstood this; if so, please point it out. Thanks!