Training on Custom dataset fails #641
-
Describe the bug
Expected behavior
The config file was changed by following the instructions on GitHub.
Hardware and Software Configuration
Replies: 7 comments
-
Can you run again with …
-
Batch size is not the issue here then. If you look at this line https://github.com/openvinotoolkit/anomalib/blob/c1f51a6ccdb7cb26cd201a846f8049ac11b4e5cc/anomalib/models/padim/lightning_model.py#L78 you can see that it stores the embeddings for the entire training data with …
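For context, here is a minimal sketch of what that pattern looks like; the class and method names are simplified stand-ins based on the linked line, not the exact anomalib code. Each training step appends the batch's embeddings to a list kept on the CPU, so memory usage grows linearly with the size of the training set.

```python
import torch

# Simplified sketch of how PaDiM-style training accumulates embeddings
# (names are approximations of the linked lightning_model.py code).
class PadimMemorySketch:
    def __init__(self) -> None:
        self.embeddings: list[torch.Tensor] = []

    def training_step(self, batch_embedding: torch.Tensor) -> None:
        # Every batch's embedding tensor is kept in CPU memory for the
        # whole epoch, so the footprint grows with the dataset size.
        self.embeddings.append(batch_embedding.cpu())

    def on_fit_end(self) -> torch.Tensor:
        # All stored embeddings are concatenated before the Gaussian
        # parameters are estimated, which is where RAM can run out.
        return torch.vstack(self.embeddings)
```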
-
@pmudgal-Intel As you know, unlike traditional deep learning model training, training with the Anomalib PaDiM model happens in just one epoch (for PaDiM it is really fitting a latent space rather than training). It will try to fit your whole dataset into memory, so if your dataset grows beyond the total RAM you have, the training crashes.
Hope this helps.
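To make that concrete, here is a rough back-of-envelope estimate of the memory needed for the stored embeddings. The image count, patch grid, and embedding dimension below are illustrative assumptions, not anomalib defaults; they depend on the backbone, input resolution, and number of selected feature channels.

```python
# Rough back-of-envelope for why PaDiM training can exhaust RAM.
# All numbers below are illustrative assumptions.
num_images = 5000            # size of the custom training set
patches_per_image = 64 * 64  # spatial positions in the embedding map (assumed)
embedding_dim = 100          # selected feature channels (assumed)
bytes_per_float = 4          # float32

total_bytes = num_images * patches_per_image * embedding_dim * bytes_per_float
print(f"~{total_bytes / 1024**3:.1f} GiB just for the stored embeddings")
# ~7.6 GiB for these numbers; doubling the dataset doubles the requirement.
```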
-
Thanks for the info; it is good to know that. Do the other models available in Anomalib work differently? That is, can they run for an arbitrary number of epochs? (PatchCore, CFlow, GANomaly, ...)
-
Hi, it is similar in my case. GANomaly, Reverse Distillation, FastFlow, and STFPM work best for me with that large a number of images in a dataset, and they can run with the default number of epochs given in the config file. I also modified the default resizing option, from 256 down to 128 and as low as 64, which may hurt performance in the end.
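For reference, one way to apply those changes is to edit the model's config programmatically with OmegaConf. Treat this as a sketch: the file path and key names follow typical anomalib 0.x configs and may differ in your version.

```python
from omegaconf import OmegaConf

# Hypothetical example: lower the input resolution (and, for models that
# train over multiple epochs, set the epoch count) in an anomalib config.
cfg = OmegaConf.load("anomalib/models/stfpm/config.yaml")  # assumed path
cfg.dataset.image_size = 128   # or 64; smaller inputs cut memory but may hurt accuracy
cfg.trainer.max_epochs = 100   # ignored by single-epoch models such as PaDiM
OmegaConf.save(cfg, "stfpm_custom.yaml")
```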
-
Training on the custom dataset works after trimming the dataset.
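For anyone else hitting the same limit, a hypothetical helper like the one below can do the trimming by copying a random subset of the normal training images into a smaller folder that fits in RAM. The folder layout and file extension are assumptions; adapt them to your own dataset structure.

```python
import random
import shutil
from pathlib import Path

def trim_dataset(src: str, dst: str, keep: int, seed: int = 0) -> None:
    """Copy a random subset of `keep` images from src to dst."""
    images = sorted(Path(src).glob("*.png"))  # extension is an assumption
    random.seed(seed)
    subset = random.sample(images, min(keep, len(images)))
    Path(dst).mkdir(parents=True, exist_ok=True)
    for image in subset:
        shutil.copy2(image, Path(dst) / image.name)

# Example usage with an assumed folder layout:
trim_dataset("datasets/custom/train/good", "datasets/custom_trimmed/train/good", keep=1000)
```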