* passes all but 1 test case
* Migrated Audio feature to use torchcodec as a backend. Fixed how formatter handles torchcodec objects. Fixed test scripts to work with new Audio backend
* fixed audio and video features so they now pass the test_dataset_with_audio_feature_map_is_decoded test case. Implemented casting for VideoDecoder and AudioDecoder types
* added load dataset test case to test_video.py
* Modified documentation to document the new torchcodec implementation of the Video and Audio features. Fixed the rest of the test files to be compatible with the new Audio and Video features.
* code formatting for torchcodec changes
* Update src/datasets/features/audio.py
Co-authored-by: Quentin Lhoest <[email protected]>
* added backwards compatibility support and _hf_encoded for Audio feature.
* move AudioDecoder to its own file
* naming
* docs
* style
* update tests
* no torchcodec for windows
* further cleaning
* fix
* install ffmpeg in ci
* fix ffmpeg installation
* fix mono backward compatibility
* fix ffmpeg
* again
* fix mono backward compat
* fix tests
* fix tests
* again
---------
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
docs/source/about_dataset_features.mdx: 8 additions & 11 deletions
@@ -53,7 +53,7 @@ See the [flatten](./process#flatten) section to learn how you can extract the ne
 </Tip>

-The array feature type is useful for creating arrays of various sizes. You can create arrays with two dimensions using [`Array2D`], and even arrays with five dimensions using [`Array5D`].
+The array feature type is useful for creating arrays of various sizes. You can create arrays with two dimensions using [`Array2D`], and even arrays with five dimensions using [`Array5D`].

 ```py
 >>> features = Features({'a': Array2D(shape=(1, 3), dtype='int32')})
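# --- illustrative sketch, not part of the original hunk: a hypothetical tiny
# --- dataset built with the feature defined above, assuming the standard Dataset API
>>> from datasets import Dataset
>>> ds = Dataset.from_dict({'a': [[[1, 2, 3]]]}, features=features)
>>> ds[0]['a']
[[1, 2, 3]]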
@@ -69,9 +69,9 @@ The array type also allows the first dimension of the array to be dynamic. This
 Audio datasets have a column with type [`Audio`], which contains three important fields:

-* `array`: the decoded audio data represented as a 1-dimensional array.
-* `path`: the path to the downloaded audio file.
-* `sampling_rate`: the sampling rate of the audio data.
+- `array`: the decoded audio data represented as a 1-dimensional array.
+- `path`: the path to the downloaded audio file.
+- `sampling_rate`: the sampling rate of the audio data.

 When you load an audio dataset and call the audio column, the [`Audio`] feature automatically decodes and resamples the audio file:
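As a rough sketch of what that decoding step looks like after this change (the file path and the `get_all_samples` call are illustrative assumptions based on torchcodec's `AudioDecoder` API, not lines from the diff):

```py
from datasets import Dataset, Audio

# Hypothetical file path: any local audio file would do.
ds = Dataset.from_dict({"audio": ["path/to/recording.wav"]})
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

decoder = ds[0]["audio"]             # a torchcodec-backed AudioDecoder after this PR
samples = decoder.get_all_samples()  # assumed torchcodec API: decoded samples + metadata
print(samples.data.shape, samples.sample_rate)
```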
@@ -80,10 +80,7 @@ When you load an audio dataset and call the audio column, the [`Audio`] feature
+<datasets.features._torchcodec.AudioDecoder object at 0x11642b6a0>
 ```

 <Tip warning={true}>
@@ -92,7 +89,7 @@ Index into an audio dataset using the row index first and then the `audio` colum
 </Tip>

-With `decode=False`, the [`Audio`] type simply gives you the path or the bytes of the audio file, without decoding it into an `array`,
+With `decode=False`, the [`Audio`] type simply gives you the path or the bytes of the audio file, without decoding it into a torchcodec `AudioDecoder` object,
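A minimal sketch of the `decode=False` behaviour described in that line (the file path is a placeholder):

```py
from datasets import Dataset, Audio

ds = Dataset.from_dict({"audio": ["path/to/recording.wav"]})
ds = ds.cast_column("audio", Audio(decode=False))

# No decoding happens: you get the raw reference to the file instead of an AudioDecoder.
print(ds[0]["audio"])  # e.g. {'bytes': None, 'path': 'path/to/recording.wav'}
```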
@@ -146,4 +143,4 @@ You can also define a dataset of images from numpy arrays:
 And in this case the numpy arrays are encoded into PNG (or TIFF if the pixel value precision is important).

 For multi-channel arrays like RGB or RGBA, only uint8 is supported. If you use a larger precision, you get a warning and the array is downcasted to uint8.
-For gray-scale images you can use the integer or float precision you want as long as it is compatible with `Pillow`. A warning is shown if your image integer or float precision is too high, and in this case the array is downcasted: an int64 array is downcasted to int32, and a float64 array is downcasted to float32.
+For gray-scale images you can use the integer or float precision you want as long as it is compatible with `Pillow`. A warning is shown if your image integer or float precision is too high, and in this case the array is downcasted: an int64 array is downcasted to int32, and a float64 array is downcasted to float32.
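A small sketch of the gray-scale downcasting rule above (the shape and values are arbitrary, and the warning text is not reproduced here):

```py
import numpy as np
from datasets import Dataset, Features, Image

# A float64 gray-scale image: per the rule above, it should be downcasted to float32
# with a warning when encoded into the Image feature.
gray = np.random.rand(16, 16).astype("float64")
ds = Dataset.from_dict({"image": [gray]}, features=Features({"image": Image()}))
print(ds[0]["image"])  # decoded back as a PIL image
```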
 There are several methods for creating and sharing an audio dataset:

-* Create an audio dataset from local files in python with [`Dataset.push_to_hub`]. This is an easy way that requires only a few steps in python.
-
-* Create an audio dataset repository with the `AudioFolder` builder. This is a no-code solution for quickly creating an audio dataset with several thousand audio files.
+- Create an audio dataset from local files in python with [`Dataset.push_to_hub`]. This is an easy way that requires only a few steps in python.
+
+- Create an audio dataset repository with the `AudioFolder` builder. This is a no-code solution for quickly creating an audio dataset with several thousand audio files.
 <Tip>
@@ -28,10 +27,7 @@ You can load your own dataset using the paths to your audio files. Use the [`~Da
+<datasets.features._torchcodec.AudioDecoder object at 0x11642b6a0>
 ```

 Then upload the dataset to the Hugging Face Hub using [`Dataset.push_to_hub`]:
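The code block that follows this sentence is collapsed in the hunk; as a sketch of the step it refers to (the file paths and repository name are placeholders):

```py
from datasets import Dataset, Audio

audio_dataset = Dataset.from_dict({"audio": ["data/one.wav", "data/two.wav"]})
audio_dataset = audio_dataset.cast_column("audio", Audio())

# Requires being logged in to the Hugging Face Hub (e.g. via `huggingface-cli login`).
audio_dataset.push_to_hub("username/my-audio-dataset")
```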
@@ -51,7 +47,6 @@ my_dataset/
 ## AudioFolder
-
 The `AudioFolder` is a dataset builder designed to quickly load an audio dataset with several thousand audio files without requiring you to write any code.

 <Tip>
@@ -101,7 +96,6 @@ If all audio files are contained in a single directory or if they are not on the
 </Tip>
-
 If there is additional information you'd like to include about your dataset, like text captions or bounding boxes, add it as a `metadata.csv` file in your folder. This lets you quickly create datasets for different computer vision tasks like text captioning or object detection. You can also use a JSONL file `metadata.jsonl` or a Parquet file `metadata.parquet`.
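A sketch of how such a folder is typically loaded (the directory path and the metadata columns are hypothetical):

```py
from datasets import load_dataset

# Folder layout (hypothetical):
# my_dataset/
# ├── metadata.csv          # columns: file_name, transcription
# ├── first_audio_file.mp3
# └── second_audio_file.mp3
ds = load_dataset("audiofolder", data_dir="/path/to/my_dataset")
print(ds["train"][0])  # audio column plus the extra columns from metadata.csv
```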
docs/source/audio_load.mdx: 2 additions & 6 deletions
@@ -8,18 +8,14 @@ Audio decoding is based on the [`soundfile`](https://github.com/bastibe/python-s
 To work with audio datasets, you need to have the `audio` dependencies installed.
 Check out the [installation](./installation#audio) guide to learn how to install it.
-
 ## Local files

 You can load your own dataset using the paths to your audio files. Use the [`~Dataset.cast_column`] function to take a column of audio file paths, and cast it to the [`Audio`] feature:
+<datasets.features._torchcodec.AudioDecoder object at 0x11642b6a0>
 ```

 ## AudioFolder
@@ -99,7 +95,7 @@ For a guide on how to load any type of dataset, take a look at the <a class="und
 ## Audio decoding

-By default, audio files are decoded sequentially as NumPy arrays when you iterate on a dataset.
+By default, audio files are decoded sequentially as torchcodec [`AudioDecoder`](https://docs.pytorch.org/torchcodec/stable/generated/torchcodec.decoders.AudioDecoder.html#torchcodec.decoders.AudioDecoder) objects when you iterate on a dataset.
 However it is possible to speed up the dataset significantly using multithreaded decoding:
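A sketch of the multithreaded-decoding pattern this sentence introduces; the `decode(num_threads=...)` call on a streamed dataset is assumed from the updated docs rather than shown in this hunk:

```py
from datasets import load_dataset

# Hypothetical dataset name: any audio dataset streamed as an IterableDataset works.
ds = load_dataset("username/my-audio-dataset", split="train", streaming=True)
ds = ds.decode(num_threads=8)  # assumed API: decode audio with multiple threads

for example in ds:
    decoder = example["audio"]  # a torchcodec AudioDecoder object after this PR
    ...
```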
docs/source/audio_process.mdx: 30 additions & 21 deletions
@@ -7,7 +7,6 @@ This guide shows specific methods for processing audio datasets. Learn how to:
 For a guide on how to process any type of dataset, take a look at the <a class="underline decoration-sky-400 decoration-2 font-semibold" href="./process">general process guide</a>.
-
 ## Cast

 The [`~Dataset.cast_column`] function is used to cast a column to another feature to be decoded. When you use this function with the [`Audio`] feature, you can resample the sampling rate:
@@ -22,16 +21,26 @@ The [`~Dataset.cast_column`] function is used to cast a column to another featur
 Audio files are decoded and resampled on-the-fly, so the next time you access an example, the audio file is resampled to 16kHz:
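For reference, a sketch of the cast the surrounding text describes (the dataset name is just an example commonly used in these docs):

```py
from datasets import load_dataset, Audio

ds = load_dataset("PolyAI/minds14", "en-US", split="train")

# Resample to 16 kHz; decoding happens lazily when an example is accessed.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
ds[0]["audio"]  # decoded and resampled to 16kHz on-the-fly
```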
docs/source/create_dataset.mdx: 6 additions & 6 deletions
@@ -4,8 +4,8 @@ Sometimes, you may need to create a dataset if you're working with your own data
 In this tutorial, you'll learn how to use 🤗 Datasets low-code methods for creating all types of datasets:

-* Folder-based builders for quickly creating an image or audio dataset
-* `from_` methods for creating datasets from local files
+- Folder-based builders for quickly creating an image or audio dataset
+- `from_` methods for creating datasets from local files

 ## File-based builders
@@ -24,10 +24,10 @@ To get the list of supported formats and code examples, follow this guide [here]
 There are two folder-based builders, [`ImageFolder`] and [`AudioFolder`]. These are low-code methods for quickly creating an image or speech and audio dataset with several thousand examples. They are great for rapidly prototyping computer vision and speech models before scaling to a larger dataset. Folder-based builders take your data and automatically generate the dataset's features, splits, and labels. Under the hood:

-* [`ImageFolder`] uses the [`~datasets.Image`] feature to decode an image file. Many image extension formats are supported, such as jpg and png, but other formats are also supported. You can check the complete [list](https://github.com/huggingface/datasets/blob/b5672a956d5de864e6f5550e493527d962d6ae55/src/datasets/packaged_modules/imagefolder/imagefolder.py#L39) of supported image extensions.
-* [`AudioFolder`] uses the [`~datasets.Audio`] feature to decode an audio file. Audio extensions such as wav and mp3 are supported, and you can check the complete [list](https://github.com/huggingface/datasets/blob/b5672a956d5de864e6f5550e493527d962d6ae55/src/datasets/packaged_modules/audiofolder/audiofolder.py#L39) of supported audio extensions.
+- [`ImageFolder`] uses the [`~datasets.Image`] feature to decode an image file. Many image extension formats are supported, such as jpg and png, but other formats are also supported. You can check the complete [list](https://github.com/huggingface/datasets/blob/b5672a956d5de864e6f5550e493527d962d6ae55/src/datasets/packaged_modules/imagefolder/imagefolder.py#L39) of supported image extensions.
+- [`AudioFolder`] uses the [`~datasets.Audio`] feature to decode an audio file. Extensions such as wav, mp3, and even mp4 are supported, and you can check the complete [list](https://ffmpeg.org/ffmpeg-formats.html) of supported audio extensions. Decoding is done via ffmpeg.

-The dataset splits are generated from the repository structure, and the label names are automatically inferred from the directory name.
+The dataset splits are generated from the repository structure, and the label names are automatically inferred from the directory name.

 For example, if your image dataset (it is the same for an audio dataset) is stored like this:
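The example directory tree is cut off by the hunk boundary; as a rough sketch of how a folder like that is then loaded with the builders listed above (paths are placeholders):

```py
from datasets import load_dataset

# ImageFolder: splits come from the folder structure, labels from the directory names.
image_ds = load_dataset("imagefolder", data_dir="/path/to/image_folder")

# AudioFolder works the same way for audio files (decoded via ffmpeg after this PR).
audio_ds = load_dataset("audiofolder", data_dir="/path/to/audio_folder")
```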
docs/source/installation.md: 1 addition & 13 deletions
@@ -30,7 +30,7 @@ You should install 🤗 Datasets in a [virtual environment](https://docs.python.
 ```bash
 # Activate the virtual environment
 source .env/bin/activate
-
+
 # Deactivate the virtual environment
 source .env/bin/deactivate
 ```
@@ -65,18 +65,6 @@ To work with audio datasets, you need to install the [`Audio`] feature as an ext
 pip install datasets[audio]
 ```

-<Tip warning={true}>
-
-To decode mp3 files, you need to have at least version 1.1.0 of the `libsndfile` system library. Usually, it's bundled with the python [`soundfile`](https://github.com/bastibe/python-soundfile) package, which is installed as an extra audio dependency for 🤗 Datasets.
-For Linux, the required version of `libsndfile` is bundled with `soundfile` starting from version 0.12.0. You can run the following command to determine which version of `libsndfile` is being used by `soundfile`:
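The command itself is cut off by the hunk; the check the removed tip refers to is usually something along these lines (a sketch, assuming the `soundfile` package exposes its bundled library version):

```py
import soundfile

# Version of the libsndfile library bundled with (or found by) python-soundfile.
print(soundfile.__libsndfile_version__)
```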