Dataset Missing For Some Naturalistic Dev/Test #25

@WJohnnyW

Description

Have you read the Contributing Guidelines?

Issue Type

Missing files or data

Dataset Label

naturalistic

Dataset Split

dev and test

Affected Files/Data and Issue Description

After downloading the whole dataset using the official script, I found that some archives listed in assets/filelist.csv were missing, specifically batch 0 in the dev and test splits of naturalistic. The details are shown in the picture. These archives are indeed not on Hugging Face (I've checked manually).

Image

I also have some related questions. (1) In many batches, the last archive of the batch on Hugging Face is typically smaller than the others. Why are these archives not downloaded? (2) The improvised/dev/0000 batch has some extra archives on Hugging Face. Why are they not downloaded and used? (3) Finally, the extra folder in some splits is also not used. I think a thorough explanation would help people use this dataset. Thanks!

Steps to Reproduce

Download the whole dataset using the official script, specifically download_whole_dataset() in scripts/download_hf.py.
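To make the mismatch easier to check after the download, here is a minimal sketch of the comparison I did by hand. It assumes you already have two lists of archive paths: the ones referenced by assets/filelist.csv and the ones actually hosted on Hugging Face (for example, obtained with `huggingface_hub.list_repo_files(..., repo_type="dataset")`). The helper names `compare_archives` and `group_by_batch`, the `label/split/batch/archive` path layout, and the sample paths are all hypothetical and only for illustration; I am not assuming anything about the real column layout of the file list.

```python
from collections import defaultdict


def compare_archives(expected, available):
    """Compare expected archive paths against the hosted ones.

    Returns (missing, extra): paths that are expected but not hosted,
    and paths that are hosted but not expected, both sorted.
    """
    expected_set, available_set = set(expected), set(available)
    missing = sorted(expected_set - available_set)
    extra = sorted(available_set - expected_set)
    return missing, extra


def group_by_batch(paths):
    """Group paths by parent directory (assumed to be label/split/batch)."""
    groups = defaultdict(list)
    for path in paths:
        batch, _, name = path.rpartition("/")
        groups[batch].append(name)
    return dict(groups)


# Hypothetical paths, for illustration only (not the real repository listing).
expected = [
    "naturalistic/dev/0000/0000.tar",
    "naturalistic/dev/0000/0001.tar",
    "improvised/dev/0000/0000.tar",
]
available = [
    "naturalistic/dev/0000/0001.tar",
    "improvised/dev/0000/0000.tar",
    "improvised/dev/0000/0023.tar",
]

missing, extra = compare_archives(expected, available)
print("missing:", group_by_batch(missing))
print("extra:", group_by_batch(extra))
```

The `missing` result corresponds to the archives reported in this issue, and `extra` corresponds to the hosted archives that the script never downloads, as asked about in the questions above.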

Additional Context

No response

Self-service

  • I'd be willing to help investigate this data issue.
