Dataset load_from_disk is too slow

@lhoestq 
## Describe the bug
It is not normal to wait 7-8 hours for a dataset to load from disk when there are no preprocessing steps involved, just a call to `load_from_disk`. I have 96 CPUs, but only one is used, and its usage sits at about 1%. This happens in the context of language model training, so I waste $100 every time I have to load the dataset from disk again (for example, when AWS stops the spot instance and I need to relaunch it).

## Steps to reproduce the bug
Get the Spanish portion of OSCAR (around 150 GB), save the processed dataset to disk, and then try to load it with `load_from_disk`. The problem does not depend on the task you are doing, only on the size of the text dataset.
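The steps above could be timed with something like the following sketch. The config name `unshuffled_deduplicated_es`, the `save_dir` path, and the helper names `timed` and `reproduce` are assumptions for illustration; the report does not say which OSCAR config or which processing was used.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def reproduce(save_dir="oscar_es_on_disk"):
    """Save Spanish OSCAR to disk, then time how long load_from_disk takes."""
    # Imported lazily so the timing helper works without `datasets` installed.
    from datasets import load_dataset, load_from_disk

    # Assumption: the deduplicated Spanish config; the report does not name one.
    dataset = load_dataset("oscar", "unshuffled_deduplicated_es", split="train")
    dataset.save_to_disk(save_dir)

    # Only the load step is timed, since that is the step being reported.
    _, seconds = timed(load_from_disk, save_dir)
    print(f"load_from_disk took {seconds:.1f} s")
```

Calling `reproduce()` downloads and writes the full dataset, so it needs several hundred GB of free disk space.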

## Expected results
I expect the dataset to load in a reasonable time by using the whole machine. If the dataset is stored in multiple `.arrow` files, they could be loaded with multiprocessing instead of one at a time, which would avoid wasting so much time.
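As a rough illustration of the idea, and not something the library does today as far as I know, the dataset could be written as several independent shards and the shards loaded concurrently. The shard naming scheme and the helpers `shard_dirs`, `save_sharded` and `load_sharded` are hypothetical; a thread pool is used here instead of multiprocessing to avoid pickling the loaded objects, and whether this actually shortens the load has not been measured.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_dirs(base_dir, num_shards):
    """Directory names for the shards; the naming scheme is hypothetical."""
    return [f"{base_dir}/shard_{i:05d}" for i in range(num_shards)]


def save_sharded(dataset, base_dir, num_shards):
    """Write the dataset as num_shards independent save_to_disk directories."""
    paths = shard_dirs(base_dir, num_shards)
    for index, path in enumerate(paths):
        # contiguous=True keeps row order intact when shards are concatenated.
        shard = dataset.shard(num_shards=num_shards, index=index, contiguous=True)
        shard.save_to_disk(path)
    return paths


def load_sharded(paths, load_fn, max_workers=8):
    """Load every shard concurrently, returning results in the order of paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_fn, paths))
```

With `datasets` installed, `load_sharded(paths, load_from_disk)` followed by `concatenate_datasets(...)` on the returned list would rebuild a single dataset from the shards.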


## Environment info

- `datasets` version: 1.8.0
- Platform: Ubuntu 18
- Python version: 3.8


I've seen that you are planning to add a streaming mode to `load_dataset`, but that only saves downloading and processing time, which is not a problem for me. It does not reduce the time spent purely loading from disk, so it is not a solution for my use case or for anyone who wants to use your library to train a language model.

Dataset load_from_disk is too slow #2547
