Description
Have you read the Contributing Guidelines?
- I have read the Contributing Guidelines.
Issue Type
Missing files or data
Dataset Label
naturalistic
Dataset Split
dev and test
Affected Files/Data and Issue Description
After downloading the whole dataset using the official script, I found that some archives required by assets/filelist.cvs are missing, specifically batch 0 in the dev and test splits of naturalistic. The details can be seen in the picture. These archives are indeed not on Hugging Face (I've checked manually).
I also have some related questions. (1) In many batches, the last archive of the batch on Hugging Face is typically smaller than the others. Why are these archives not downloaded? (2) The improvised/dev/0000 batch has some extra archives on Hugging Face. Why are they not downloaded and used? (3) Finally, the extra folder in some splits is also not used. I think a thorough explanation would help people use this dataset. Thanks!
Steps to Reproduce
Download the whole dataset using the official script, specifically download_whole_dataset() in scripts/download_hf.py.
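The comparison between the file list and what is actually hosted could also be scripted instead of checked by hand. Below is a minimal sketch, not the project's own tooling: the column name `path` and the example archive names are assumptions made up for illustration, and the real header in the file list may differ. The list of hosted files would have to come from elsewhere, for example `list_repo_files` in `huggingface_hub` with `repo_type="dataset"`.

```python
import csv
import io


def missing_archives(filelist_csv_text, available_files, path_column="path"):
    """Return archive paths listed in the CSV text but absent from available_files.

    path_column is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(filelist_csv_text))
    expected = {row[path_column].strip() for row in reader}
    # Set difference: expected by the file list, but not hosted.
    return sorted(expected - set(available_files))


# Made-up example data, not the real file list or repository contents.
example_csv = "path\nnaturalistic/dev/0000/a.tar\nnaturalistic/dev/0000/b.tar\n"
example_available = ["naturalistic/dev/0000/a.tar"]
print(missing_archives(example_csv, example_available))
```

Any path printed by this check is expected by the file list but not present among the hosted files, which is the situation described above for batch 0.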
Additional Context
No response
Self-service
- I'd be willing to help investigate this data issue.