Open
Labels: enhancement (new feature or request), good second issue (issues a bit more difficult than "Good First" issues)
Description
The idea is to allow something like
ds = load_dataset("c4", "en", as_iterable=True)
so that the resulting dataset can be used to train models. This call would load an IterableDataset from the cached Arrow files instead of a regular Dataset.
Cc @stas00
Edit: following the discussion, we may instead load from the cache when streaming=True.
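To make the request concrete, here is a minimal, standard-library-only sketch of the behavior being asked for: iterating lazily over examples stored in cached shard files rather than loading them all up front. It is not how `datasets` implements anything. JSON Lines files stand in for the cached Arrow files, and `CachedShardIterable`, `write_demo_shards`, and `demo` are hypothetical names used only for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List


class CachedShardIterable:
    """Hypothetical stand-in for an IterableDataset backed by cached files."""

    def __init__(self, shard_paths: List[Path]) -> None:
        self.shard_paths = shard_paths

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # Open one shard at a time and yield examples lazily,
        # so the full dataset is never materialized in memory.
        for path in self.shard_paths:
            with path.open("r", encoding="utf-8") as f:
                for line in f:
                    yield json.loads(line)


def write_demo_shards(
    cache_dir: Path, num_shards: int = 2, rows_per_shard: int = 3
) -> List[Path]:
    """Write small JSON Lines shards that play the role of cached Arrow files."""
    paths = []
    for shard in range(num_shards):
        path = cache_dir / f"shard-{shard:05d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for row in range(rows_per_shard):
                example = {
                    "id": shard * rows_per_shard + row,
                    "text": f"example {shard}-{row}",
                }
                f.write(json.dumps(example) + "\n")
        paths.append(path)
    return paths


def demo() -> List[int]:
    """Build a throwaway cache directory and stream the example ids from it."""
    with tempfile.TemporaryDirectory() as tmp:
        ds = CachedShardIterable(write_demo_shards(Path(tmp)))
        return [example["id"] for example in ds]
```

Because `__iter__` starts from the first shard each time it is called, the object can be iterated once per epoch in a training loop, which is the usage pattern this issue is about.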