Replies: 2 comments
@yahav6893, have you tried other methods? My guess is that memory-based algorithms such as PaDiM and PatchCore might fail if you have a large dataset. This is mainly because these models try to fit the features of the entire training set into a memory bank, which for large datasets grows beyond what CPU memory can hold.
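To make the explanation above concrete, here is a rough back-of-the-envelope sketch of how such a memory bank scales. It assumes the stored embeddings form one dense float32 tensor of shape (images, features, height, width); the function names and the example numbers are hypothetical and are not taken from anomalib or from the reported dataset.

```python
def memory_bank_bytes(n_images, n_features, feat_h, feat_w, bytes_per_value=4):
    """Estimate the size of a dense embedding tensor kept for all training images.

    Assumption: one float32 value (4 bytes) per feature per spatial location.
    """
    return n_images * n_features * feat_h * feat_w * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to gibibytes."""
    return n_bytes / 2**30


# Hypothetical example: 1000 images, 100 feature channels, 64x64 feature map.
example = memory_bank_bytes(1000, 100, 64, 64)
print(f"example bank: {to_gib(example):.2f} GiB")

# The allocation from the error message in this report, for comparison.
reported = 35_896_320_000
print(f"reported allocation: {to_gib(reported):.2f} GiB")
```

The estimate grows linearly with the number of training images, so the failed allocation in the error message (about 33.4 GiB) is plausible for a large dataset even when each individual batch is small.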
Since this is probably an algorithm-specific issue, there is unfortunately nothing anomalib can do. I will therefore convert this to a Q&A in Discussions. Feel free to convert it back to an issue if you see similar behaviour with other algorithms that are not memory bank-based.
Describe the bug
Hi,
when trying to train on a custom dataset, around halfway through the epoch I get this CPU memory error:
RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 35896320000 bytes. Error code 12 (Cannot allocate memory)
If I decrease the batch size, the error just takes longer to appear: memory allocation keeps growing during the training epoch and is never released.
Now I don't want to decrease the dataset size. If anything I want to increase it.
I would appreciate your help.
Thanks ahead,
Yahav
This is the datamodule initialization:
folder_datamodule = Folder(
    root=folder_dataset_root,
    normal_dir="anomalib_ds/clean",
    abnormal_dir="anomalib_ds/obs",
    mask_dir=folder_dataset_root + "/anomalib_ds/mask",
    task="segmentation",
    image_size=IMG_SIZE,
    normalization=InputNormalizationMethod.NONE,  # don't apply normalization, as we want to visualize the images
    eval_batch_size=16,
    train_batch_size=16,
)
Dataset
Folder
Model
PADiM
Steps to reproduce the behavior
Run training on a sufficiently large custom dataset.
OS information
Expected behavior
Memory consumption rises at the beginning of every step and decreases at the end of it.
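For comparison with the expectation above, the following toy simulation illustrates why a smaller batch size only delays the error when embeddings are accumulated across the whole epoch. It is a plain-Python sketch under the assumption that every image's embedding is kept until the end of the epoch; the function name and all numbers are hypothetical and do not come from anomalib.

```python
def accumulated_bytes_per_step(n_images, batch_size, bytes_per_image):
    """Return the running total of stored bytes after each training step."""
    totals = []
    stored = 0
    for start in range(0, n_images, batch_size):
        n_in_batch = min(batch_size, n_images - start)
        # Assumption: embeddings are appended to the bank and never freed.
        stored += n_in_batch * bytes_per_image
        totals.append(stored)
    return totals


# Hypothetical per-image embedding size: 100 features on a 64x64 map, float32.
BYTES_PER_IMAGE = 100 * 64 * 64 * 4

large_batches = accumulated_bytes_per_step(1000, 16, BYTES_PER_IMAGE)
small_batches = accumulated_bytes_per_step(1000, 4, BYTES_PER_IMAGE)

print("steps with batch size 16:", len(large_batches))
print("steps with batch size 4:", len(small_batches))
print("same final total:", large_batches[-1] == small_batches[-1])
```

Under this assumption the final total depends only on the number of images, so the batch size changes how many steps it takes to reach the limit, not whether the limit is reached.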
Screenshots
No response
Pip/GitHub
GitHub
What version/branch did you use?
No response
Configuration YAML
Logs
Code of Conduct