Speed per iteration slower when dataset is larger #4142
-
Hi @Eddowesselink, may I know how you evaluate that a training run is "completed"? Is it by reaching the same target validation metrics? Thanks in advance.
-
Hi @Nic-Ma. Thanks for your response and help! We stop training after 30,000 iterations and evaluate the model every 500 iterations on an independent testing dataset (n=26). The training dataset is randomly split using k-fold cross-validation (scikit-learn) into 3 training datasets and 3 independent testing datasets. The augmented subjects are generated three times, each set corresponding to the same training dataset (the Fold 1 augmented dataset is generated from the Fold 1 training data).

To our current knowledge, it is true that training on the augmented dataset didn't make much difference to model convergence in our case. However, the training speed (seconds per iteration) for the models using a large dataset (with the n=1000 augmented subjects) is much slower than for the models trained on a smaller dataset (n=50, with no augmented subjects). The models trained on the larger dataset took ~7.5 seconds per iteration, while the models trained on the smaller dataset took ~2.5 seconds per iteration.

We've used PersistentDataset to cache all the non-randomizable transforms and used ThreadDataLoader with num_workers=0 (we have also tried the normal DataLoader, and tried different num_workers, with and without pinned memory). Caching the dataset to GPU first using CacheDataset and the ToDeviced transform wasn't possible here because of GPU capacity. The only way for us, until now, to increase training speed per iteration is to reduce the amount of training data. But I don't yet have an intuitive understanding of why data loading is so much slower with larger datasets compared to smaller datasets (with the same model parameters, batch size, etc.). Thank you in advance! Kind regards,
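For context, here is a minimal sketch of this kind of setup; the transform choices, parameters and paths are illustrative only, not the exact pipeline used here:

```python
# Minimal sketch of a PersistentDataset + ThreadDataLoader setup as described above.
# Transform choices, parameters and paths are illustrative only.
from monai.data import PersistentDataset, ThreadDataLoader
from monai.transforms import (
    Compose,
    LoadImaged,
    EnsureChannelFirstd,
    ScaleIntensityd,
    RandFlipd,
    EnsureTyped,
)

# list of {"image": ..., "label": ...} dicts pointing at the NIfTI files (placeholder)
train_files = [{"image": "img_0001.nii.gz", "label": "seg_0001.nii.gz"}]

train_transforms = Compose([
    # deterministic transforms: their outputs are cached to disk by PersistentDataset
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityd(keys=["image"]),
    # random transforms: applied on the fly every iteration, never cached
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),
    EnsureTyped(keys=["image", "label"]),
])

train_ds = PersistentDataset(data=train_files, transform=train_transforms,
                             cache_dir="./persistent_cache")
train_loader = ThreadDataLoader(train_ds, batch_size=2, num_workers=0, shuffle=True)
```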
-
Do you have a breakdown of what's contributing to your iteration time? Within that time you're loading data into a batch, performing the forward pass, calculating the loss, backpropagating, then optimising. Any of these may vary in the amount of time taken, so it's important to inspect them to see which is the culprit.

Data loading could take longer because your augmented dataset is larger and less of it fits into memory at once; the OS is responsible for precaching some files into memory, and if there are more files being accessed at random there will be more cache misses. Be sure to check that your augmented images are definitely the same format as the originals, i.e. that the dimensions and the dtype are the same; I've made the mistake in the past of generating images that got stored as float64 instead of uint16. One other thing to verify is that the images are being stored in the same way as the originals, that is with the same axis ordering and the same byte ordering; changing either of these can affect how fast loading is.

Depending on your loss function, it could also take longer to calculate the loss on the augmented images because the problem is now essentially harder; the same goes for certain optimisers. Getting that breakdown of times is important in figuring out whether this is occurring, but it seems more likely that how you're loading data is the bigger contributor.
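One rough way to get that breakdown is to time each stage separately inside the training loop, something like the sketch below; `model`, `loss_fn`, `optimizer` and `train_loader` are placeholders for whatever you already have, and `torch.cuda.synchronize()` is needed for the GPU timings to be meaningful:

```python
# Rough per-stage timing inside the training loop; model, loss_fn, optimizer
# and train_loader are placeholders for the existing objects.
import time
import torch

data_t = fwd_t = bwd_t = 0.0
t0 = time.perf_counter()
for batch in train_loader:
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    data_t += t1 - t0                      # time spent waiting for the next batch

    images = batch["image"].cuda(non_blocking=True)
    labels = batch["label"].cuda(non_blocking=True)

    outputs = model(images)
    loss = loss_fn(outputs, labels)
    torch.cuda.synchronize()
    t2 = time.perf_counter()
    fwd_t += t2 - t1                       # forward pass + loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    bwd_t += t0 - t2                       # backward pass + optimiser step

print(f"data: {data_t:.1f}s  forward/loss: {fwd_t:.1f}s  backward/step: {bwd_t:.1f}s")
```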
-
Hi @ericspod, thank you for your quick response and help! Very helpful! I found that I had used the default parameters in SaveImageD, so the images were not being saved as int16. However, generating a new dataset with an int16 output dtype gave me the same problem. I've generated multiple datasets and compared them with my 'normal' dataset. From what I can see: generating a data augmentation dataset with the same sample size (50 augmented training subjects) gives me the same training speed per iteration as the normal training dataset (see output below). The total time spent within the for-loop (backward, loss, optimizing, etc.) is equal for the datasets with 1000 and 50 training subjects. So it appears that the problem is not in the 'augmented images' themselves, but in 'starting' the next iteration and epoch of the training loop.

Original dataset with int16 and n=50 training subjects
Data augmentation with int16 and n=50
Data augmentation with int16 and n=1000

Thank you in advance. Happy to share my code when needed. Regards,
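In case it helps anyone else hitting the same dtype issue, a minimal sketch of saving the augmented images with an explicit int16 output dtype via SaveImaged; the keys, output directory and postfix are illustrative only:

```python
# Saving augmented images with an explicit int16 output dtype instead of the
# float32 default; keys, output_dir and output_postfix are illustrative only.
import numpy as np
from monai.transforms import SaveImaged

save_augmented = SaveImaged(
    keys=["image"],
    output_dir="./augmented_dataset",
    output_postfix="aug",
    output_dtype=np.int16,   # default is np.float32, which inflates file size and load time
    resample=False,
)
```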
-
Dear contributors,
Hopefully you can help me out with the following situation:
We are training multiple models for segmentation of muscles from 3D MR images in MONAI. When training on the entire dataset (n=50), the total training time is around 8 hours for 30k iterations.
However, we also want to investigate the importance of data augmentation, and have generated a dataset with n=1000 augmented subjects, which we will use for training a CNN. It appears that the total training time for this dataset is 30 hours for 30k iterations, with the same parameters as the CNN trained on the dataset without data augmentation (n=50; 8 hours of total training time).
I've tried different set-ups that I came across (not shuffling during data loading, pin_memory false/true with non_blocking=true, different dataset classes (PersistentDataset, CacheNTransDataset, normal Dataset), and different num_workers in the DataLoader), but so far without satisfying results (a rough, illustrative sketch of these variations is shown below).
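```python
# Rough sketch of the dataset / DataLoader variations tried (values are
# illustrative; train_files and train_transforms are placeholders for the
# data dicts and transform pipeline).
from monai.data import CacheNTransDataset, DataLoader, Dataset, PersistentDataset

ds_plain = Dataset(data=train_files, transform=train_transforms)
ds_persistent = PersistentDataset(data=train_files, transform=train_transforms,
                                  cache_dir="./cache_dir")
ds_ntrans = CacheNTransDataset(data=train_files, transform=train_transforms,
                               cache_n_trans=3, cache_dir="./cache_ntrans")

loader = DataLoader(
    ds_persistent,          # also tried ds_plain and ds_ntrans
    batch_size=2,
    shuffle=False,          # also tried shuffle=True
    num_workers=4,          # also tried 0, 2, 8
    pin_memory=True,        # also tried False; .cuda(non_blocking=True) in the loop
)
```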
Can anyone explain why training is slower when the dataset is larger? And is there anything else that I can implement to increase training speed?
Thank you in advance.
Eddo Wesselink