File: AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/README.md
For both training and inference, you can run the sample and scripts in Jupyter Notebook.
## Prepare the Environment
### Download the CommonVoice Dataset
>**Note**: You can skip downloading the dataset if you already have a pretrained model and only want to run inference on custom data samples that you provide.
```
tar -xf cv-corpus-11.0-2022-09-21-sv-SE.tar.gz
mv cv-corpus-11.0-2022-09-21 swedish
```
### Create and Set Up Environment
1. Create your conda environment by following the instructions on the Intel [AI Tools Selector](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-tools-selector.html). You can follow these settings:
   * AI Tools
   * Preset: Inference Optimization
   * Distribution Type: conda*
   * Python Versions: Python* 3.9 or 3.10

   Then activate your environment:
   ```bash
   conda activate <your-env-name>
   ```
2. Set the environment variable `COMMON_VOICE_PATH`:
   ```bash
   export COMMON_VOICE_PATH=/data/commonVoice
   ```
3. Install packages needed for MP3 to WAV conversion.

4. Change to the sample directory.
   ```
   cd oneAPI-samples/AI-and-Analytics/End-to-end-Workloads/LanguageIdentification
   ```
5. Run the bash script to install additional necessary libraries, including SpeechBrain.
   ```bash
   source initialize.sh
   ```
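For reference, the MP3-to-WAV conversion in step 3 is typically done with ffmpeg. The sketch below only builds and runs the ffmpeg command line; it assumes ffmpeg is on your `PATH` and that 16 kHz mono WAV is the target format (check the sample's own conversion scripts for the exact parameters it uses):

```python
import subprocess
from pathlib import Path

def mp3_to_wav_cmd(mp3_path, out_dir="."):
    """Build the ffmpeg command that converts one MP3 clip to 16 kHz mono WAV."""
    wav_name = Path(mp3_path).stem + ".wav"
    return [
        "ffmpeg", "-y",      # -y: overwrite the output file if it already exists
        "-i", mp3_path,      # input MP3 clip
        "-ar", "16000",      # resample to 16 kHz
        "-ac", "1",          # downmix to a single (mono) channel
        str(Path(out_dir) / wav_name),
    ]

def convert_mp3_to_wav(mp3_path, out_dir="."):
    """Run the conversion; requires ffmpeg to be installed."""
    subprocess.run(mp3_to_wav_cmd(mp3_path, out_dir), check=True)
```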
## Train the Model with Languages
This section explains how to train a model for language identification using the CommonVoice dataset. It includes steps to preprocess the data, train the model, and prepare the output files for inference.
First, change to the `Training` directory.
```
cd /Training
```
### Run in Jupyter Notebook
If you cannot or do not want to use Jupyter Notebook, use these procedures to run the sample locally.
2. From the `Training` directory, apply patches to modify these files to work with the CommonVoice dataset.
   ```bash
   patch < create_wds_shards.patch
   patch < train_ecapa.patch
   ```
The `prepareAllCommonVoice.py` script performs the following data preprocessing.
1. If you want to add additional languages, then modify the `LANGUAGE_PATHS` list in the file to reflect the languages to be included in the model.
2. Run the script with options. The samples will be divided as follows: 80% training, 10% validation, 10% testing.
3. Note the shard with the largest number as `LARGEST_SHARD_NUMBER` in the output above or by navigating to `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train`.
4. Open the `train_ecapa.yaml` file and modify the `train_shards` variable to make the range reflect: `000000..LARGEST_SHARD_NUMBER`.
5. Repeat Steps 3 and 4 for `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev`.
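Instead of inspecting the shard directory by hand, you can find the largest shard index with a short script. This is a sketch that assumes shard files carry a zero-padded numeric index such as `shard-000000.tar` (verify against the filenames `create_wds_shards.py` actually produced):

```python
import re
from pathlib import Path

def largest_shard_number(shard_dir):
    """Return the zero-padded index of the highest-numbered .tar shard."""
    indices = []
    for path in Path(shard_dir).glob("*.tar"):
        match = re.search(r"(\d+)", path.stem)
        if match:
            indices.append(match.group(1))
    if not indices:
        raise FileNotFoundError(f"no .tar shards found in {shard_dir}")
    # Zero padding keeps the lexicographic max equal to the numeric max.
    return max(indices)
```

Pass it the `train` (or `dev`) shard directory to get the value for the `train_shards` range in `train_ecapa.yaml`.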
#### Run the Training Script
The YAML file `train_ecapa.yaml` with the training configurations is passed as an argument to the `train.py` script to train the model.
1. If necessary, edit the `train_ecapa.yaml` file to meet your needs.
   | Parameters | Description
   |:--- |:---
   | `seed` | The seed value, which should be set to a different value for subsequent runs. Defaults to 1987.
   | `out_n_neurons` | Must be equal to the number of languages of interest.
   | `number_of_epochs` | Default is **10**. Adjust as needed.
   | `batch_size` | In the `trainloader_options`, decrease this value if your CPU or GPU runs out of memory while running the training script.
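For instance, a minimal edit of the fields above might look like the following fragment (the values are illustrative only, and the surrounding structure of `train_ecapa.yaml` is omitted):

```yaml
seed: 1987                # change this value for a clean rerun
out_n_neurons: 2          # e.g., training on two languages
number_of_epochs: 10

trainloader_options:
  batch_size: 4           # decrease if training runs out of memory
```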
#### Move Model to Inference Folder
After training, the output should be inside the `results/epaca/1987` folder. By default the `seed` is set to 1987 in `train_ecapa.yaml`. You can change the value as needed.
1. Copy all files from `results/epaca/1987` into a new folder called `lang_id_commonvoice_model` in the **Inference** folder.

   The name of the folder MUST match the `pretrained_path` variable defined in `train_ecapa.yaml`. By default, it is `lang_id_commonvoice_model`.
2. Change directory to `/Inference/lang_id_commonvoice_model/save`.
   ```bash
   cd ../Inference/lang_id_commonvoice_model/save
   ```
3. Copy the `label_encoder.txt` file up one level.
   ```bash
   cp label_encoder.txt ../.
   ```
4. Change to the latest `CKPT` folder, and copy the `classifier.ckpt` and `embedding_model.ckpt` files into the `/Inference/lang_id_commonvoice_model/` folder, which is two directories up.
   ```bash
   # Navigate into the CKPT folder
   cd CKPT<DATE_OF_RUN>
   cp classifier.ckpt ../../.
   cp embedding_model.ckpt ../../
   cd ../..
   ```

   You may need to modify the permissions of these files to be executable (e.g., with `sudo chmod 755`) before you run the inference scripts that consume them.
>**Note**: If `train.py` is rerun with the same seed, it will resume from the epoch number where it last ran. For a clean rerun, delete the `results` folder or change the seed.
You can now load the model for inference. In the `Inference` folder, the `inference_commonVoice.py` script uses the trained model on the testing dataset, whereas `inference_custom.py` uses the trained model on a user-specified dataset and can utilize Voice Activity Detection.
>**Note**: If the folder name containing the model is changed from `lang_id_commonvoice_model`, you will need to modify the `pretrained_path` in `train_ecapa.yaml`, and the `source_model_path` variable in both the `inference_commonVoice.py` and `inference_custom.py` files in the `speechbrain_inference` class.
## Run Inference for Language Identification
>**Stop**: If you have not already done so, you must run the scripts in the `Training` folder to generate the trained model before proceeding.
To run inference, you must have already run all of the training scripts, generated the trained model, and moved files to the appropriate locations. You must place the model output in a folder name matching the name specified as the `pretrained_path` variable defined in `train_ecapa.yaml`.
>**Note**: If you plan to run inference on **custom data**, you will need to create a folder for the **.wav** files to be used for prediction. For example, `data_custom`. Move the **.wav** files to your custom folder. (For quick results, you may select a few audio files from each language downloaded from CommonVoice.)
1. Change to the `Inference` directory.
   ```
   cd /Inference
   ```
2. Patch SpeechBrain's `interfaces.py`. This patch is required for PyTorch* TorchScript to work because the output of the model must contain only tensors.
The script should create a `test_data_accuracy.csv` file that summarizes the results.
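If you want to post-process the results file yourself, a short script can compute the overall accuracy. This is a sketch only: the column names `predicted` and `actual` are assumptions, so check the header of the generated `test_data_accuracy.csv` first and adjust.

```python
import csv

def accuracy_from_csv(path):
    """Compute overall accuracy from a results CSV with (assumed)
    'predicted' and 'actual' columns."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{path} contains no data rows")
    correct = sum(row["predicted"] == row["actual"] for row in rows)
    return correct / len(rows)
```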
#### On Custom Data
To run inference on custom data, you must specify a folder with **.wav** files and pass the path in as an argument. You can do so by creating a folder named `data_custom` and then copying 1 or 2 **.wav** files from your test dataset into it. **.mp3** files will NOT work.
Run the `inference_custom.py` script.
```
python inference_custom.py -p <path_to_folder>
```
The following examples describe how to use the scripts to produce specific outcomes.
```
prediction = self.model_int8(signal)
```
**(Optional) Comparing Predictions with Ground Truth**
You can choose to modify `audio_ground_truth_labels.csv` to include the name of the audio file and expected audio label (e.g., `en` for English), then run `inference_custom.py` with the `--ground_truth_compare` option. By default, this is disabled.
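The layout of `audio_ground_truth_labels.csv` might look like the following fragment (the header names and filenames here are hypothetical — keep whatever format the file shipped with the sample uses):

```
audio_file,label
sample_clip_1.wav,en
sample_clip_2.wav,sv
```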
### Troubleshooting
If the model appears to be giving the same output regardless of input, try running `clean.sh` to remove the `RIR_NOISES` and `speechbrain` folders. Redownload that data after cleaning by running `initialize.sh` and either `inference_commonVoice.py` or `inference_custom.py`.