TrainingDivergedException executing TransferFreshFruit Example + Differences in ATLearn Embeddings #3335
Unanswered
rd-peter-braun
asked this question in
Q&A
Replies: 0 comments
When executing the TransferFreshFruit example, I get the following exception during the SoftmaxCrossEntropy evaluation:
Caused by: ai.djl.TrainingDivergedException: The Loss became NaN, try reduce learning rate,add clipGradient option to your optimizer, check input data and loss calculation.
at ai.djl.training.listener.DivergenceCheckTrainingListener.onTrainingBatch(DivergenceCheckTrainingListener.java:27) ~[api-0.28.0.jar:na]
It occurs during the second epoch.
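For context, here is a minimal plain-Java sketch of one general way a softmax cross-entropy value can become NaN: the exponentials overflow once the logits grow very large, which can happen after weights drift under a learning rate that is too high. This is not DJL code, and DJL's own loss may well be computed in a stable form already; `LossNanDemo` and its method names are hypothetical and only illustrate the mechanism and the usual log-sum-exp remedy.

```java
final class LossNanDemo {

    // Naive form: exp() overflows to Infinity for large logits,
    // and Infinity / Infinity evaluates to NaN.
    static double naiveCrossEntropy(double[] logits, int label) {
        double sum = 0.0;
        for (double z : logits) {
            sum += Math.exp(z);
        }
        return -Math.log(Math.exp(logits[label]) / sum);
    }

    // Stable form: subtract the largest logit before exponentiating
    // (log-sum-exp trick), so no exponential can overflow.
    static double stableCrossEntropy(double[] logits, int label) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double sum = 0.0;
        for (double z : logits) {
            sum += Math.exp(z - max);
        }
        return Math.log(sum) - (logits[label] - max);
    }

    public static void main(String[] args) {
        // Hypothetical logits of the magnitude a diverging model might emit.
        double[] logits = {1000.0, 2.0, -3.0};
        System.out.println("naive:  " + naiveCrossEntropy(logits, 0));
        System.out.println("stable: " + stableCrossEntropy(logits, 0));
    }
}
```

Even the stable form returns NaN once a logit itself is infinite or NaN, so a loss that diverges in the second epoch may point at the weights or the input data rather than at the loss formula.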
I tried it with the DJL embedding model djl://ai.djl.pytorch/resnet18_embedding and with a self-generated one created using ATLearn.
The latter had a different final layer: .addSingleton(nd -> nd.squeeze(new int[] {2, 3})) did not work because its output had only 2 dimensions.
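To make the squeeze problem concrete, here is a small plain-Java sketch that works on shape arrays only. It assumes, hypothetically, that one embedding emits a shape like (N, 512, 1, 1) while the other is already flattened to (N, 512); the class `SqueezeGuard` and the method `squeezableAxes` are made-up names, not part of DJL or ATLearn. The helper keeps only those requested axes that exist and have size 1, so a squeeze is only attempted where it is valid.

```java
import java.util.Arrays;

final class SqueezeGuard {

    // Returns the requested axes that are present in the shape and have size 1.
    static int[] squeezableAxes(long[] shape, int... axes) {
        return Arrays.stream(axes)
                .filter(a -> a >= 0 && a < shape.length && shape[a] == 1)
                .toArray();
    }

    public static void main(String[] args) {
        // Assumed output shapes of the two embedding variants (illustrative only).
        long[] fourDims = {32, 512, 1, 1};
        long[] twoDims = {32, 512};

        System.out.println(Arrays.toString(squeezableAxes(fourDims, 2, 3)));
        System.out.println(Arrays.toString(squeezableAxes(twoDims, 2, 3)));
    }
}
```

In the DJL block itself, the same idea would probably amount to checking nd.getShape().dimension() inside the lambda and calling squeeze only when the array has four dimensions.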
I use a custom training set containing 83 classes, each with 250 images (JPG, 640x480), so my dataset construction differs from the DJL example.
properties + code example:
trained-model-name: "eb_resnet_18" // props.getTrainedModelName()...
trainset-path: "trainset" // contains 83 classes....
epochs: 25
engine: PyTorch
device: CPU
batch-size: 32
learning-rate: 0.001
patience: 4
train-param: true