@negativenagesh

Suggestions are welcome.

This PR proposes a solution to issue #7080. It changes load_dataset to do the following:

  1. Checking whether the dataset has splits or subdatasets.
  2. Printing the available splits/subdatasets.
  3. Asking the user to choose which one to load.
  4. Loading only the selected dataset based on the user's input.

Key Changes:

  1. Available Splits/Subdatasets: The code checks for available splits/subdatasets using builder_instance.info.splits.keys().
  2. User Prompt: If splits are found, it prints them out and prompts the user to select one.
  3. Loading Based on User Input: The dataset is loaded based on the user's choice.

This way, the dataset loading function will interactively prompt the user to select which subdataset or split they want to load instead of automatically loading all of them.
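The prompting step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: choose_split and its input_fn/output_fn parameters are hypothetical names introduced here so the logic can run without downloading a dataset. In the real library the split names would presumably come from builder_instance.info.splits.keys(), as mentioned under Key Changes.

```python
from typing import Callable, Iterable, List


def choose_split(
    available: Iterable[str],
    input_fn: Callable[[str], str] = input,
    output_fn: Callable[[str], None] = print,
) -> str:
    """Print the available split names and return the one the user picks.

    Hypothetical helper: `available` stands in for the keys of
    `builder_instance.info.splits` mentioned in the PR description.
    """
    names: List[str] = list(available)
    if not names:
        raise ValueError("no splits available to choose from")
    if len(names) == 1:
        # Nothing to choose between, so do not prompt at all.
        return names[0]

    output_fn("Available splits/subdatasets: " + ", ".join(names))
    while True:
        answer = input_fn("Which one do you want to load? ").strip()
        if answer in names:
            return answer
        # Keep asking until the answer matches one of the listed names.
        output_fn(f"'{answer}' is not one of: {', '.join(names)}")
```

The returned name could then be passed on as the split to load, for example through the existing split argument of load_dataset.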

@negativenagesh changed the title from "Solution to issue: https://github.com/huggingface/datasets/issues/7080 :Modified load_dataset function, so that it prompts the user to select a dataset when subdatasets or splits (train, test) are available" to "Solution to issue: #7080 Modified load_dataset function, so that it prompts the user to select a dataset when subdatasets or splits (train, test) are available" on Oct 2, 2024
@lhoestq
Member

lhoestq commented Oct 8, 2024

I think the approach presented in #6832 is the one we'll be taking.

Asking for user input is not a good idea, since load_dataset is often used on servers where no one is present to select a split.
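To illustrate the concern, here is a small sketch of what happens to a prompt when nothing is available on standard input, which is the usual situation for a job running on a server. The function name prompt_fails_without_stdin is made up for this example, and the empty io.StringIO only simulates an unattended process; it is not taken from the datasets code base.

```python
import io
import sys


def prompt_fails_without_stdin() -> bool:
    """Return True if input() raises EOFError when stdin has no data."""
    original_stdin = sys.stdin
    # Simulate an unattended process: stdin exists but is already at EOF.
    sys.stdin = io.StringIO("")
    try:
        input("Which split do you want to load? ")
    except EOFError:
        # No one answered, so the prompt cannot be satisfied.
        return True
    finally:
        # Always restore the real stdin, whatever happened above.
        sys.stdin = original_stdin
    return False
```

In such a setting a prompt inside load_dataset would abort the job instead of loading data, whereas a caller that names the split explicitly through the existing split argument never needs to be asked.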

@negativenagesh closed this by deleting the head repository on Nov 10, 2024