Speed per iteration slower when dataset is larger #4142
-
Hi @Eddowesselink, may I know how you evaluate that a training run is "completed"? Is it by reaching the same target validation metrics? Thanks in advance.
-
Hi @Nic-Ma. Thanks for your response and help! We stop training after 30,000 iterations and evaluate the model every 500 iterations on an independent testing dataset (n=26). The training dataset is randomly split using k-fold cross-validation (scikit-learn) into 3 training datasets and 3 independent testing datasets. The augmented subjects are generated three times, each set corresponding to the same training dataset (the Fold 1 augmented dataset is generated from the Fold 1 training data).

To our current knowledge, it is true that training on the augmented dataset didn't make much difference to model convergence in our case. However, the training speed (seconds per iteration) for the models using a large dataset (with the n=1000 augmented subjects) is much slower than for the models trained on a smaller dataset (n=50, with no augmented subjects). The models trained on the larger dataset took ~7.5 seconds per iteration, while the models trained on the smaller dataset took ~2.5 seconds per iteration.

We've used PersistentDataset to cache all the non-randomizable transforms and used ThreadDataLoader with num_workers=0 (we have also tried the normal DataLoader, and tried different num_workers, with and without pinned memory). Caching the dataset to GPU first using CacheDataset and the ToDeviced transform wasn't possible here because of GPU capacity. The only way for us, until now, to increase training speed per iteration is to reduce the amount of training data. But I don't yet have an intuitive understanding of why data loading is so much slower with larger datasets compared to smaller datasets (with the same model parameters, batch size, etc.). Thank you in advance! Kind regards,
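For context, here is a minimal sketch of this kind of setup; the transform choices, parameters and paths are illustrative only, not the exact pipeline used here:

```python
# Minimal sketch of a PersistentDataset + ThreadDataLoader setup as described above.
# Transform choices, parameters and paths are illustrative only.
from monai.data import PersistentDataset, ThreadDataLoader
from monai.transforms import (
    Compose,
    LoadImaged,
    EnsureChannelFirstd,
    ScaleIntensityd,
    RandFlipd,
    EnsureTyped,
)

# list of {"image": ..., "label": ...} dicts pointing at the NIfTI files (placeholder)
train_files = [{"image": "img_0001.nii.gz", "label": "seg_0001.nii.gz"}]

train_transforms = Compose([
    # deterministic transforms: their outputs are cached to disk by PersistentDataset
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    ScaleIntensityd(keys=["image"]),
    # random transforms: applied on the fly every iteration, never cached
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),
    EnsureTyped(keys=["image", "label"]),
])

train_ds = PersistentDataset(data=train_files, transform=train_transforms,
                             cache_dir="./persistent_cache")
train_loader = ThreadDataLoader(train_ds, batch_size=2, num_workers=0, shuffle=True)
```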
-
Do you have a breakdown of what's contributing to your iteration time? Within that time you're loading data into a batch, performing the forward pass, calculating the loss, backpropagating, then optimising. Any of these may vary in the amount of time taken, so it's important to inspect them to see which is the culprit.

Data loading could take longer because your augmented dataset is larger and less of it fits into memory at once; the OS is responsible for precaching some files into memory, and if there are more files being accessed at random there will be more cache misses. Be sure to check that your augmented images are definitely the same format as the originals, i.e. that the dimensions and the dtype are the same; I've made the mistake in the past of generating images that got stored as float64 instead of uint16. One other thing to verify is that the images are being stored in the same way as the originals, that is with the same axis ordering and the same byte ordering; changing either of these can affect how fast loading is.

Depending on your loss function, it could also take longer to calculate the loss on the augmented images because the problem is now essentially harder; the same goes for certain optimisers. Getting that breakdown of times is important in figuring out whether this is occurring, but it seems more likely that how you're loading data is the bigger contributor.
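One rough way to get that breakdown is to time each stage separately inside the training loop, something like the sketch below; `model`, `loss_fn`, `optimizer` and `train_loader` are placeholders for whatever you already have, and `torch.cuda.synchronize()` is needed for the GPU timings to be meaningful:

```python
# Rough per-stage timing inside the training loop; model, loss_fn, optimizer
# and train_loader are placeholders for the existing objects.
import time
import torch

data_t = fwd_t = bwd_t = 0.0
t0 = time.perf_counter()
for batch in train_loader:
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    data_t += t1 - t0                      # time spent waiting for the next batch

    images = batch["image"].cuda(non_blocking=True)
    labels = batch["label"].cuda(non_blocking=True)

    outputs = model(images)
    loss = loss_fn(outputs, labels)
    torch.cuda.synchronize()
    t2 = time.perf_counter()
    fwd_t += t2 - t1                       # forward pass + loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    bwd_t += t0 - t2                       # backward pass + optimiser step

print(f"data: {data_t:.1f}s  forward/loss: {fwd_t:.1f}s  backward/step: {bwd_t:.1f}s")
```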
-
Hi @ericspod, thank you for your quick response and help! Very helpful! I found that I had used the default parameters in SaveImageD, so the images were not being saved as int16. However, generating a new dataset with an int16 output dtype gave me the same problem. I've generated multiple datasets and compared them with my 'normal' dataset. From what I can see: generating a data augmentation dataset with the same sample size (50 augmented training subjects) gives me the same training speed per iteration as the normal training dataset (see output below). The total time spent within the for-loop (backward, loss, optimizing, etc.) is equal for the datasets with 1000 and 50 training subjects. So it appears that the problem is not in the 'augmented images' themselves, but in 'starting' the next iteration and epoch of the training loop.

Original dataset with int16 and n=50 training subjects
Data augmentation with int16 and n=50
Data augmentation with int16 and n=1000

Thank you in advance. Happy to share my code when needed. Regards,
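In case it helps anyone else hitting the same dtype issue, a minimal sketch of saving the augmented images with an explicit int16 output dtype via SaveImaged; the keys, output directory and postfix are illustrative only:

```python
# Saving augmented images with an explicit int16 output dtype instead of the
# float32 default; keys, output_dir and output_postfix are illustrative only.
import numpy as np
from monai.transforms import SaveImaged

save_augmented = SaveImaged(
    keys=["image"],
    output_dir="./augmented_dataset",
    output_postfix="aug",
    output_dtype=np.int16,   # default is np.float32, which inflates file size and load time
    resample=False,
)
```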
-
Dear contributors,
Hopefully you can help me out with the following situation:
We are training multiple models for segmentation of muscles from 3D MR images in MONAI. When training on the entire dataset (n=50), the total training time is around 8 hours for 30k iterations.
However, we also want to investigate the importance of data augmentation, and have generated a dataset with n=1000 augmented subjects, which we will use for training a CNN. It appears that the total training time for this dataset is 30 hours for 30k iterations, with the same parameters as the CNN trained on the dataset without data augmentation (n=50; 8 hours of total training time).
I've tried different set-ups that I came across (not shuffling during data loading, pin_memory false/true with non_blocking=true, different dataset classes (PersistentDataset, CacheNTransDataset, normal Dataset), and different num_workers in the DataLoader), but so far without satisfying results (a rough, illustrative sketch of these variations is shown below).
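```python
# Rough sketch of the dataset / DataLoader variations tried (values are
# illustrative; train_files and train_transforms are placeholders for the
# data dicts and transform pipeline).
from monai.data import CacheNTransDataset, DataLoader, Dataset, PersistentDataset

ds_plain = Dataset(data=train_files, transform=train_transforms)
ds_persistent = PersistentDataset(data=train_files, transform=train_transforms,
                                  cache_dir="./cache_dir")
ds_ntrans = CacheNTransDataset(data=train_files, transform=train_transforms,
                               cache_n_trans=3, cache_dir="./cache_ntrans")

loader = DataLoader(
    ds_persistent,          # also tried ds_plain and ds_ntrans
    batch_size=2,
    shuffle=False,          # also tried shuffle=True
    num_workers=4,          # also tried 0, 2, 8
    pin_memory=True,        # also tried False; .cuda(non_blocking=True) in the loop
)
```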
Can anyone explain why training is slower when the dataset is larger? And is there anything else that I can implement to increase training speed?
Thank you in advance.
Eddo Wesselink