Dear kerasCV team,

I am writing to you again because I am facing a problem with the RetinaNet model; it concerns the losses.

When I try to train the model with small batches of data (8), all losses turn to NaN. I had to increase the batch size to 32 and add global_clipnorm=10 before the model computed correct losses. But when I add more samples, batches of 32 produce NaN again and I have to increase the batch size to 64, and so on every time I add more samples.
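For reference, my compile step follows the standard KerasCV object-detection pattern, roughly like the sketch below (the backbone preset, num_classes, and loss names are illustrative placeholders, not necessarily my exact setup; the learning rate matches the 0.005 shown in the logs):

```python
import keras_cv
import tensorflow as tf

# Illustrative backbone preset; any KerasCV backbone follows the same pattern.
model = keras_cv.models.RetinaNet(
    num_classes=2,  # placeholder
    bounding_box_format="xywh",
    backbone=keras_cv.models.ResNet50Backbone.from_preset("resnet50_imagenet"),
)

# SGD with global gradient clipping; without global_clipnorm the losses go to
# NaN for me at small batch sizes.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=0.005, momentum=0.9, global_clipnorm=10.0
)

model.compile(
    classification_loss="focal",
    box_loss="smoothl1",
    optimizer=optimizer,
)
```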
I want to reduce the batch size so that I can fine-tune the backbone (a batch size of 64 makes my GPU go OOM).
I appreciate any help you could provide in resolving this issue.
Nice new presentation for the new release, by the way! The new tutorials are great!
Best regards,
Hugo
Training with small batch size:
Epoch 1/100
483/483 [==============================] - 60s 89ms/step - loss: nan - box_loss: nan - classification_loss: nan - percent_boxes_matched_with_anchor: 0.0018 - val_loss: nan - val_box_loss: nan - val_classification_loss: nan - val_percent_boxes_matched_with_anchor: 0.0000e+00 - lr: 0.0050
Epoch 2/100
483/483 [==============================] - 26s 55ms/step - loss: nan - box_loss: nan - classification_loss: nan - percent_boxes_matched_with_anchor: 0.0018 - val_loss: nan - val_box_loss: nan - val_classification_loss: nan - val_percent_boxes_matched_with_anchor: 0.0000e+00 - lr: 0.0050
Training with batch size = 64:
(The val_percent_boxes_matched_with_anchor value remains constant; is that normal?)
My images are a stack of 3 radar bands, gamma-corrected so that the values fall between 0 and 1. I include some of my inputs here:
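The preprocessing is along these lines (a minimal sketch; the gamma value and the band variable names are placeholders, not my exact values):

```python
import numpy as np

def gamma_correct(band, gamma=0.5):
    # Rescale a single radar band to [0, 1], then apply a gamma curve.
    band = band.astype("float32")
    band = (band - band.min()) / (band.max() - band.min() + 1e-8)
    return band ** gamma

# Stack three gamma-corrected bands into an H x W x 3 "image".
# band_1, band_2, band_3 are placeholders for the actual radar bands.
image = np.stack([gamma_correct(b) for b in (band_1, band_2, band_3)], axis=-1)
```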
I create my dataset from a generator (I am not using the generator directly because I could not get it to work):

The __getitem__ of the generator:
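Schematically, the wrapping looks like the sketch below (my_sequence, the tensor shapes, and the batch size are placeholders; the real __getitem__ returns one image and its bounding boxes in the KerasCV dict format):

```python
import tensorflow as tf

# `my_sequence` stands for the generator/Sequence whose __getitem__ returns one
# (image, bounding_boxes) pair, with bounding_boxes = {"boxes", "classes"}.
def iter_samples():
    for i in range(len(my_sequence)):
        yield my_sequence[i]

dataset = tf.data.Dataset.from_generator(
    iter_samples,
    output_signature=(
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),         # image
        {
            "boxes": tf.TensorSpec(shape=(None, 4), dtype=tf.float32),  # per-image boxes
            "classes": tf.TensorSpec(shape=(None,), dtype=tf.float32),  # per-image class ids
        },
    ),
)

# Ragged batching keeps a variable number of boxes per image; the batch size
# here is the value I keep having to increase (8 / 32 / 64).
dataset = dataset.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=8))
```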