Extraction protocol for arrow files is not defined

### Describe the bug

Passing files with `.arrow` extension into data_files argument, at least when `streaming=True` is very slow.

### Steps to reproduce the bug

 Basically it goes through the `_get_extraction_protocol` method located [here](https://github.com/huggingface/datasets/blob/main/src/datasets/utils/file_utils.py#L820)
The method then looks at some base known extensions where `arrow` is not defined so it proceeds to determine the compression with the magic number method which is slow when dealing with a lot of files which are stored in s3 and by looking at this predefined list, I don't see `arrow` in there either so in the end it return None:

```
MAGIC_NUMBER_TO_COMPRESSION_PROTOCOL = {
    bytes.fromhex("504B0304"): "zip",
    bytes.fromhex("504B0506"): "zip",  # empty archive
    bytes.fromhex("504B0708"): "zip",  # spanned archive
    bytes.fromhex("425A68"): "bz2",
    bytes.fromhex("1F8B"): "gzip",
    bytes.fromhex("FD377A585A00"): "xz",
    bytes.fromhex("04224D18"): "lz4",
    bytes.fromhex("28B52FFD"): "zstd",
}
```

### Expected behavior

My expectation is that `arrow` would be in the known lists so it would return None without going through the magic number method.

### Environment info

datasets 2.19.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extraction protocol for arrow files is not defined #6905

Describe the bug

Steps to reproduce the bug

Expected behavior

Environment info

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Extraction protocol for arrow files is not defined #6905

Description

Describe the bug

Steps to reproduce the bug

Expected behavior

Environment info

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions