huggingface dataset load error #4

@heyongxin233

I get the following error when loading the dataset with the Hugging Face datasets library:

datasets.exceptions.DatasetGenerationCastError: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 2 new columns ({'split', 'index'})

This happened while the json dataset builder was generating data using

hf://datasets/ZachW/MGTDetect_CoCo/gpt3.5-davinci3/gpt3.5-Mixed-davinci3/gpt3.5_mixed_1000_train.jsonl (at revision aa49f92a8667f5a704ff576c728765c236940c6c)

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)

My code:

from datasets import load_dataset
dataset = load_dataset("ZachW/MGTDetect_CoCo")
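The error message suggests that the repository's JSONL files do not all share one schema. The sketch below is a possible workaround, not a confirmed fix. The helper names `group_jsonl_by_columns` and `load_one_file` are hypothetical. The first assumes the files have been downloaded locally and that the first record of each file is representative of its columns. The second assumes that loading one file at a time through the `data_files` argument of `load_dataset` avoids the column mismatch.

```python
import json
from collections import defaultdict


def group_jsonl_by_columns(paths):
    """Group local JSONL files by the set of keys in their first record.

    Assumption: the first record of each file is representative of the
    whole file. Empty files are grouped under an empty column set.
    """
    groups = defaultdict(list)
    for path in paths:
        with open(path, encoding="utf-8") as f:
            first_line = f.readline()
        if first_line.strip():
            columns = frozenset(json.loads(first_line))
        else:
            columns = frozenset()
        groups[columns].append(str(path))
    return dict(groups)


def load_one_file(repo_id, data_file):
    """Load a single data file from a Hub dataset repo.

    Restricting the load to one file means the builder never has to
    reconcile files with different columns.
    """
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, data_files=data_file, split="train")


# Example call (needs network access; the path is the one from the traceback):
# dataset = load_one_file(
#     "ZachW/MGTDetect_CoCo",
#     "gpt3.5-davinci3/gpt3.5-Mixed-davinci3/gpt3.5_mixed_1000_train.jsonl",
# )
```

Files that end up in the same group could then be loaded together by passing a list of paths as `data_files`, or be declared as separate configurations as described in the documentation linked in the error message.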
