As we know, the decision boundary of the softmax loss between classes 1 and 2 is (W1 − W2)x + b1 − b2 = 0, where Wi and bi are the weights and bias of the softmax layer, respectively. Suppose we take x to be a feature vector and constrain ∥W1∥ = ∥W2∥ = 1 and b1 = b2 = 0. What is the purpose of normalizing the weights in this way?
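To make the question concrete, here is a small NumPy sketch (my own illustration, not from any paper's code) of what I understand the constraint to do: with unit-norm weights and zero biases, the logit for class i is Wi·x = ∥x∥cosθi, so which class wins seems to depend only on the angle θi between x and Wi.

```python
import numpy as np

# With ||W1|| = ||W2|| = 1 and b1 = b2 = 0, the logit for class i is
#   W_i . x = ||W_i|| * ||x|| * cos(theta_i) = ||x|| * cos(theta_i),
# so the predicted class depends only on the angle between x and W_i.

rng = np.random.default_rng(0)

W1 = rng.normal(size=5)
W2 = rng.normal(size=5)
W1 /= np.linalg.norm(W1)  # normalize so that ||W1|| = 1
W2 /= np.linalg.norm(W2)  # normalize so that ||W2|| = 1

x = rng.normal(size=5)

logit1, logit2 = W1 @ x, W2 @ x        # softmax logits (biases are zero)
cos1 = logit1 / np.linalg.norm(x)      # cos(theta_1), since ||W1|| = 1
cos2 = logit2 / np.linalg.norm(x)      # cos(theta_2), since ||W2|| = 1

# The class with the larger logit is exactly the class with the smaller angle:
assert (logit1 > logit2) == (cos1 > cos2)
```

So my reading is that the decision becomes purely angular, but I am not sure this is the whole story.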
Can anyone offer some advice?