I just started messing around with the library and followed the tutorial here
(I am running on plain CPU, just FYI)
The vanilla version did not quite work out: the gradient seemed unstable and close to zero. Essentially I did not get any "training" effect; the parameters stayed constant for all epochs.
At this point I started questioning the model, specifically the final part seemed odd:
```python
def predict(params, image):
    # per-example predictions
    activations = image
    for w, b in params[:-1]:
        outputs = jnp.dot(w, activations) + b
        activations = relu(outputs)

    final_w, final_b = params[-1]
    logits = jnp.dot(final_w, activations) + final_b
    return logits - logsumexp(logits)  # <== what is this supposed to accomplish?
```
Because of this, I started changing the model slightly. I tried mirroring some of PyTorch's starter examples, since they are already pretty close to the JAX example; in particular, this led me to switch to a similar cross-entropy-style loss.
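Roughly, the cross-entropy loss I switched to looked something like this (a sketch rather than my exact code; I'm assuming raw logits and one-hot targets here):

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def cross_entropy_loss(logits, one_hot_targets):
    # Normalize logits into log-probabilities, then take the negative
    # mean log-probability assigned to the correct class.
    log_probs = logits - logsumexp(logits, axis=1, keepdims=True)
    return -jnp.mean(jnp.sum(log_probs * one_hot_targets, axis=1))
```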
Eventually this seemed to help to some degree. The gradient was a little bit more stable now and I could actually see training "progress" (something like 10% accuracy or so).
Since this was not even remotely close to the expected ~90% accuracy, I simplified further to a plain softmax(logits) prediction with a quadratic loss function:
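Something along these lines (again a sketch, assuming one-hot targets; `jax.nn.softmax` stands in for whatever softmax helper the tutorial uses):

```python
import jax
import jax.numpy as jnp

def quadratic_loss(logits, one_hot_targets):
    # Squash logits into probabilities, then use plain mean squared error
    # against the one-hot labels.
    probs = jax.nn.softmax(logits, axis=1)
    return jnp.mean(jnp.sum((probs - one_hot_targets) ** 2, axis=1))
```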
Finally this resulted in the expected training progress, hitting ~90% accuracy shortly after the start of training.
So what is going on here? What gotcha should I avoid here? Are the log evaluations numerically unstable? Is there something obvious that I am missing? Thanks for any help!