Closed
Description
With the latest RETURNN, I get the following error when I use DistributeFilesDataset:
File "/nas/models/asr/am/multilingual/16kHz/2024-11-08--jxu-best-rq-pretrain/work/i6_core/tools/git/CloneGitRepositoryJob.LD5f1wKK7LPo/output/returnn/returnn/datasets/basic.py", line 227, in Dataset._create_from_reduce
line: ds = cls(**kwargs)
locals:
ds = <not found>
cls = <local> <class 'returnn.datasets.distrib_files.DistributeFilesDataset'>
kwargs = <local> {'files': ['/ssd/jxu/nas/data/speech/FR_FR/16kHz/EPPS/corpus/batch.1.v1/hdf-raw_wav.16kHz.split-25/EPPS-batch.1.v1.hdf.15', '/ssd/jxu/nas/data/speech/EN_US/16kHz/NEWS.HQ/corpus/batch.2.NPR.v3/hdf-raw_wav.16kHz.split-261/NEWS.HQ-batch.2.NPR.v3.hdf.7', '/ssd/jxu/nas/data/speech/IT_IT/16kHz/IT.parli..., len = 25
File "/nas/models/asr/am/multilingual/16kHz/2024-11-08--jxu-best-rq-pretrain/work/i6_core/tools/git/CloneGitRepositoryJob.LD5f1wKK7LPo/output/returnn/returnn/datasets/distrib_files.py", line 171, in DistributeFilesDataset.__init__
line: assert self._num_shards == 1 and self._shard_index == 0, ( # ensure defaults are set
f"{self}: Cannot use both dataset-sharding via properties _num_shards and _shard index "
f"and {self.__class__.__name__}'s own sharding implementation based on the trainings rank and size."
)
locals:
self = <local> <DistributeFilesDataset 'train' epoch=None>
self._num_shards = <local> 8
self._shard_index = <local> 6
DistributeFilesDataset inherits from CachedDataset2, which in turn inherits from Dataset, so _num_shards should be initialized to 1 in Dataset's __init__. I am not sure how self._num_shards gets changed to the number of GPUs (8 here) in my case.
(cc @NeoLegends, @michelwi)
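To make the question concrete: the traceback shows the dataset being rebuilt through Dataset._create_from_reduce with `ds = cls(**kwargs)`, so the default of 1 only holds if those kwargs do not carry shard settings. Below is a minimal sketch of that mechanism using hypothetical, simplified classes (not RETURNN's actual code). The assumption, marked in the comments, is that shard values set after construction are forwarded as constructor kwargs when the object is rebuilt, which would make the subclass's own check fail exactly as in the traceback.

```python
class Dataset:
    """Hypothetical, simplified stand-in for a dataset base class (not RETURNN's code)."""

    def __init__(self, name="train", _num_shards=1, _shard_index=0):
        self.name = name
        # Defaults mean "no sharding", but any caller can override them via kwargs.
        self._num_shards = _num_shards
        self._shard_index = _shard_index

    def __reduce__(self):
        # Rebuild the object from constructor kwargs. ASSUMPTION: the *current*
        # shard settings are forwarded, so sharding applied after construction
        # ends up being passed back into __init__.
        kwargs = {"name": self.name, "_num_shards": self._num_shards, "_shard_index": self._shard_index}
        return _create_from_reduce, (type(self), kwargs)


def _create_from_reduce(cls, kwargs):
    # Mirrors the `ds = cls(**kwargs)` line seen in the traceback.
    return cls(**kwargs)


class FileShardingDataset(Dataset):
    """Hypothetical subclass that, like DistributeFilesDataset, shards files by itself."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Same kind of check as in the traceback: property-based sharding must be off.
        assert self._num_shards == 1 and self._shard_index == 0, (
            f"cannot combine _num_shards={self._num_shards} with own sharding"
        )


ds = FileShardingDataset(name="train")  # defaults: 1 shard, index 0, so the check passes
# ASSUMPTION: something (e.g. a multi-GPU setup) applies sharding afterwards.
ds._num_shards, ds._shard_index = 8, 6

rebuild_error = None
fn, args = ds.__reduce__()  # what pickle/copy would call to rebuild the object
try:
    fn(*args)
except AssertionError as exc:
    rebuild_error = str(exc)
print("rebuild failed:", rebuild_error)
```

In this sketch the defaults are correct, yet the rebuilt object still sees 8 shards, because the values come in through kwargs rather than from the defaults.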