The child process retrieves the dataset directly from the main process instead of executing `memory_mapped_arrow_table_from_file`.

### Feature request

Child processes should retrieve the dataset directly from the main process instead of each executing `memory_mapped_arrow_table_from_file`.

### Motivation

My local disk is too small to hold the dataset, so I store it on a remote Ceph server and process it with `datasets`.
I use the [data-juicer](https://github.com/datajuicer/data-juicer) framework as an outer layer on top of `datasets`, but it doesn't support streaming datasets. This led to a problem: for every load, map, and filter operation, I have to wait for a large number of child processes to each execute `memory_mapped_arrow_table_from_file`. Because the underlying file lives on the remote Ceph server, this step is limited by network I/O.
I don't know whether this is a problem with my usage or simply how `datasets` is currently designed. However, I think that if the instance returned by `datasets.load_dataset` were passed directly to the child processes, instead of each of them re-executing `memory_mapped_arrow_table_from_file`, it might solve my problem. Or does `datasets` already support this and I just don't know about it?

### Your contribution

...

The child process retrieves the dataset directly from the main process instead of executing `memory_mapped_arrow_table_from_file`. #7902
