Comparison between different Datasets #2457
Jingnan-Jia
started this conversation in
General
Replies: 1 comment
Hi @Jingnan-Jia, Thanks for your interest here. Thanks.
-
First of all, thank you very much for developing MONAI. I love it!
I found that MONAI provides `CacheDataset`, `LMDBDataset`, `PersistentDataset`, and `SmartCache` to accelerate data loading and transforms.

My understanding of the 4 Datasets:
If I have 1000 3D CT scans for training:

- `CacheDataset` caches `cache_num` cases in memory, processed up to the random transforms. Data loading is therefore very fast for the first `cache_num` training samples and slower for the remaining `1000 - cache_num` samples in each epoch. The disadvantage is that it requires a lot of CPU memory if we want to cache all 1000 3D CT scans at once.
- `SmartCache` also caches `cache_num` cases in memory, processed up to the random transforms, but part of the `cache_num` cases is replaced before the next epoch according to `replace_rate`. The data loading speed is therefore stable for each batch of training data. This one seems better if we cannot cache all 1000 3D CT scans at once.
- `PersistentDataset` saves all cases to disk, processed up to the random transforms, and loads them again when the random transforms are applied. It still loads data from disk, but the loading time should be shorter because loading tensors seems faster than loading medical images.
- `LMDBDataset`: I cannot comment on this one because I have no experience with LMDB databases.

My question is:
Among the last 3 Datasets, do you have a recommendation on which one is the fastest: `SmartCache`, `PersistentDataset`, or `LMDBDataset`?
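
To make the `cache_num` / `replace_rate` idea above concrete, here is a minimal sketch in plain Python. It is only a toy model of a cache window that slides over the data between epochs, not MONAI's implementation; the function name `sliding_cache` and the rounding rule (`ceil(cache_num * replace_rate)`) are assumptions made for illustration.

```python
import math


def sliding_cache(items, cache_num, replace_rate, epochs):
    """Yield the cached subset for each epoch in a toy sliding-window model.

    Hypothetical helper for illustration only; it does not call MONAI.
    """
    # Assumption: the number of replaced items per epoch is rounded up.
    replace_num = math.ceil(cache_num * replace_rate)
    start = 0
    for _ in range(epochs):
        # The cache holds cache_num consecutive items, wrapping around the end.
        yield [items[(start + i) % len(items)] for i in range(cache_num)]
        # Before the next epoch, drop the oldest replace_num items and
        # pull in the next replace_num items from the remaining data.
        start = (start + replace_num) % len(items)


windows = list(sliding_cache([1, 2, 3, 4, 5], cache_num=4, replace_rate=0.25, epochs=3))
print(windows)  # [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 1]]
```

In this model every epoch trains on exactly `cache_num` in-memory items, which is why the per-batch loading time stays stable, while the full dataset is only visited gradually over several epochs.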