I believe there is a reduction missing (probably Nx.mean/2) in the implementation of cross-entropy inside Scholar.Linear.LogisticRegression.
```elixir
-Nx.sum(ys * log_softmax(Nx.dot(xs, coeff) + bias), axes: [-1])
```
Here `xs` is a tensor of shape `{num_samples, num_features}` and `ys` is a tensor of shape `{num_samples, num_classes}` (one-hot encoded). Hence, the expression above evaluates to a tensor of shape `{num_samples}`. I don't know exactly how Polaris handles non-scalar losses, but I would expect the loss to be a scalar.
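
For concreteness, here is a minimal sketch of the fix I have in mind: reduce the per-sample losses to a scalar with `Nx.mean/1`. The module name and the `log_softmax` helper below are stand-ins I wrote for illustration, not Scholar's actual private implementation:

```elixir
defmodule CrossEntropySketch do
  import Nx.Defn

  # Stand-in for the private log_softmax helper inside
  # Scholar.Linear.LogisticRegression (numerically stabilized
  # by subtracting the row-wise max before exponentiating).
  defnp log_softmax(logits) do
    shifted = logits - Nx.reduce_max(logits, axes: [-1], keep_axes: true)
    shifted - Nx.log(Nx.sum(Nx.exp(shifted), axes: [-1], keep_axes: true))
  end

  # Cross-entropy loss: the inner Nx.sum/2 gives the per-sample loss
  # of shape {num_samples}; Nx.mean/1 then reduces over the batch
  # axis so the value handed to the optimizer is a scalar.
  defn loss(coeff, bias, xs, ys) do
    per_sample = -Nx.sum(ys * log_softmax(Nx.dot(xs, coeff) + bias), axes: [-1])
    Nx.mean(per_sample)
  end
end
```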