Commit 987406c
Modernize LangID sample, fixes for training
1 parent 5be6adc commit 987406c

File tree

13 files changed: +190 additions, −168 deletions
Lines changed: 2 additions & 4 deletions

@@ -1,7 +1,5 @@
 #!/bin/bash
 
-rm -R RIRS_NOISES
-rm -R tmp
-rm -R speechbrain
-rm -f rirs_noises.zip noise.csv reverb.csv vad_file.txt
+echo "Deleting .wav files, tmp"
 rm -f ./*.wav
+rm -R tmp

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/inference_commonVoice.py

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@ def __init__(self, dirpath, filename):
         self.sampleRate = 0
         self.waveData = ''
         self.wavesize = 0
-        self.waveduriation = 0
+        self.waveduration = 0
         if filename.endswith(".wav") or filename.endswith(".wmv"):
             self.wavefile = filename
             self.wavepath = dirpath + os.sep + filename

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/inference_custom.py

Lines changed: 3 additions & 1 deletion

@@ -30,7 +30,7 @@ def __init__(self, dirpath, filename):
         self.sampleRate = 0
         self.waveData = ''
         self.wavesize = 0
-        self.waveduriation = 0
+        self.waveduration = 0
         if filename.endswith(".wav") or filename.endswith(".wmv"):
             self.wavefile = filename
             self.wavepath = dirpath + os.sep + filename
@@ -357,6 +357,8 @@ def main(argv):
     else:
         print("It is a special file (socket, FIFO, device file)" , path)
 
+    print("Done.\n")
+
 if __name__ == "__main__":
     import sys
     sys.exit(main(sys.argv))
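The renamed `waveduration` attribute presumably holds the clip length in seconds. A minimal standalone sketch of that calculation using Python's standard `wave` module (not the sample's actual code; `demo.wav` is a file generated here purely for illustration):

```python
import struct
import wave

def wav_duration(path):
    """Duration of a .wav file in seconds: frame count / sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Generate one second of 16 kHz mono silence to demonstrate.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(16000)
    w.writeframes(struct.pack("<16000h", *([0] * 16000)))

print(wav_duration("demo.wav"))  # → 1.0
```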

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/initialize.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/README.md

Lines changed: 87 additions & 54 deletions
@@ -39,7 +39,7 @@ For both training and inference, you can run the sample and scripts in Jupyter N
 
 ## Prepare the Environment
 
-### Downloading the CommonVoice Dataset
+### Download the CommonVoice Dataset
 
 >**Note**: You can skip downloading the dataset if you already have a pretrained model and only want to run inference on custom data samples that you provide.
 
@@ -79,33 +79,49 @@ tar -xf cv-corpus-11.0-2022-09-21-sv-SE.tar.gz
 mv cv-corpus-11.0-2022-09-21 swedish
 ```
 
-### Configuring the Container
+### Create and Set Up Environment
 
-1. Pull the `oneapi-aikit` docker image.
-2. Set up the Docker environment.
-   ```
-   docker pull intel/oneapi-aikit
-   ./launch_docker.sh
-   ```
-   >**Note**: By default, the `Inference` and `Training` directories will be mounted and the environment variable `COMMON_VOICE_PATH` will be set to `/data/commonVoice` and mounted to `/data`. `COMMON_VOICE_PATH` is the location of where the CommonVoice dataset is downloaded.
+1. Create your conda environment by following the instructions on the Intel [AI Tools Selector](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-tools-selector.html). You can follow these settings:
+
+   * AI Tools
+   * Preset: Inference Optimization
+   * Distribution Type: conda*
+   * Python Versions: Python* 3.9 or 3.10
+
+   Then activate your environment:
+   ```bash
+   conda activate <your-env-name>
+   ```
 
+2. Set the environment variable `COMMON_VOICE_PATH`
+   ```bash
+   export COMMON_VOICE_PATH=/data/commonVoice
+   ```
+
+3. Install packages needed for MP3 to WAV conversion
+   ```bash
+   sudo apt-get update && apt-get install ffmpeg libgl1
+   ```
+
+4. Navigate to your working directory, clone the `oneapi-src` repository, and navigate to this code sample.
+   ```bash
+   git clone https://github.com/oneapi-src/oneAPI-samples.git
+   cd oneAPI-samples/AI-and-Analytics/End-to-end-Workloads/LanguageIdentification
+   ```
 
+5. Run the bash script to install additional necessary libraries, including SpeechBrain.
+   ```bash
+   source initialize.sh
+   ```
 
 ## Train the Model with Languages
 
 This section explains how to train a model for language identification using the CommonVoice dataset, so it includes steps on how to preprocess the data, train the model, and prepare the output files for inference.
 
-### Configure the Training Environment
-
-1. Change to the `Training` directory.
-   ```
-   cd /Training
-   ```
-2. Source the bash script to install the necessary components.
-   ```
-   source initialize.sh
-   ```
-   This installs PyTorch*, the Intel® Extension for PyTorch (IPEX), and other components.
+First, change to the `Training` directory.
+```
+cd /Training
+```
 
 ### Run in Jupyter Notebook
 
@@ -139,7 +155,7 @@ If you cannot or do not want to use Jupyter Notebook, use these procedures to ru
   ```
 
 2. From the `Training` directory, apply patches to modify these files to work with the CommonVoice dataset.
-   ```
+   ```bash
   patch < create_wds_shards.patch
   patch < train_ecapa.patch
   ```
@@ -154,8 +170,8 @@ The `prepareAllCommonVoice.py` script performs the following data preprocessing
 1. If you want to add additional languages, then modify the `LANGUAGE_PATHS` list in the file to reflect the languages to be included in the model.
 
 2. Run the script with options. The samples will be divided as follows: 80% training, 10% validation, 10% testing.
-   ```
-   python prepareAllCommonVoice.py -path /data -max_samples 2000 --createCsv --train --dev --test
+   ```bash
+   python prepareAllCommonVoice.py -path $COMMON_VOICE_PATH -max_samples 2000 --createCsv --train --dev --test
   ```
 | Parameters | Description
 |:--- |:---
@@ -166,24 +182,25 @@ The `prepareAllCommonVoice.py` script performs the following data preprocessing
 
 #### Create Shards for Training and Validation
 
-1. If the `/data/commonVoice_shards` folder exists, delete the folder and the contents before proceeding.
+1. If the `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards` folder exists, delete the folder and the contents before proceeding.
 2. Enter the following commands.
+   ```bash
+   python create_wds_shards.py ${COMMON_VOICE_PATH}/processed_data/train ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train
+   python create_wds_shards.py ${COMMON_VOICE_PATH}/processed_data/dev ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev
   ```
-   python create_wds_shards.py /data/commonVoice/train/ /data/commonVoice_shards/train
-   python create_wds_shards.py /data/commonVoice/dev/ /data/commonVoice_shards/dev
-   ```
-3. Note the shard with the largest number as `LARGEST_SHARD_NUMBER` in the output above or by navigating to `/data/commonVoice_shards/train`.
+3. Note the shard with the largest number as `LARGEST_SHARD_NUMBER` in the output above or by navigating to `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train`.
 4. Open the `train_ecapa.yaml` file and modify the `train_shards` variable to make the range reflect: `000000..LARGEST_SHARD_NUMBER`.
-5. Repeat the process for `/data/commonVoice_shards/dev`.
+5. Repeat Steps 3 and 4 for `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev`.
 
 #### Run the Training Script
 
-The YAML file `train_ecapa.yaml` with the training configurations should already be patched from the Prerequisite section.
+The YAML file `train_ecapa.yaml` with the training configurations is passed as an argument to the `train.py` script to train the model.
 
 1. If necessary, edit the `train_ecapa.yaml` file to meet your needs.
 
 | Parameters | Description
 |:--- |:---
+| `seed` | The seed value, which should be set to a different value for subsequent runs. Defaults to 1987.
 | `out_n_neurons` | Must be equal to the number of languages of interest.
 | `number_of_epochs` | Default is **10**. Adjust as needed.
 | `batch_size` | In the trainloader_options, decrease this value if your CPU or GPU runs out of memory while running the training script.
@@ -195,30 +212,48 @@ The YAML file `train_ecapa.yaml` with the training configurations should already
 
 #### Move Model to Inference Folder
 
-After training, the output should be inside `results/epaca/SEED_VALUE` folder. By default SEED_VALUE is set to 1987 in the YAML file. You can change the value as needed.
-
-1. Copy all files with *cp -R* from `results/epaca/SEED_VALUE` into a new folder called `lang_id_commonvoice_model` in the **Inference** folder.
+After training, the output should be inside the `results/epaca/1987` folder. By default the `seed` is set to 1987 in `train_ecapa.yaml`. You can change the value as needed.
 
-   The name of the folder MUST match with the pretrained_path variable defined in the YAML file. By default, it is `lang_id_commonvoice_model`.
+1. Copy all files from `results/epaca/1987` into a new folder called `lang_id_commonvoice_model` in the **Inference** folder.
+   ```bash
+   cp -R results/epaca/1987 ../Inference/lang_id_commonvoice_model
+   ```
+   The name of the folder MUST match with the pretrained_path variable defined in `train_ecapa.yaml`. By default, it is `lang_id_commonvoice_model`.
 
 2. Change directory to `/Inference/lang_id_commonvoice_model/save`.
+   ```bash
+   cd ../Inference/lang_id_commonvoice_model/save
+   ```
+
 3. Copy the `label_encoder.txt` file up one level.
-4. Change to the latest `CKPT` folder, and copy the classifier.ckpt and embedding_model.ckpt files into the `/Inference/lang_id_commonvoice_model/` folder.
+   ```bash
+   cp label_encoder.txt ../.
+   ```
+
+4. Change to the latest `CKPT` folder, and copy the classifier.ckpt and embedding_model.ckpt files into the `/Inference/lang_id_commonvoice_model/` folder which is two directories up.
+   ```bash
+   # Navigate into the CKPT folder
+   cd CKPT<DATE_OF_RUN>
 
-You may need to modify the permissions of these files to be executable before you run the inference scripts to consume them.
+   cp classifier.ckpt ../../.
+   cp embedding_model.ckpt ../../
+   cd ../..
+   ```
+
+You may need to modify the permissions of these files to be executable i.e. `sudo chmod 755` before you run the inference scripts to consume them.
 
 >**Note**: If `train.py` is rerun with the same seed, it will resume from the epoch number it last run. For a clean rerun, delete the `results` folder or change the seed.
 
 You can now load the model for inference. In the `Inference` folder, the `inference_commonVoice.py` script uses the trained model on the testing dataset, whereas `inference_custom.py` uses the trained model on a user-specified dataset and can utilize Voice Activity Detection.
 
->**Note**: If the folder name containing the model is changed from `lang_id_commonvoice_model`, you will need to modify the `source_model_path` variable in `inference_commonVoice.py` and `inference_custom.py` files in the `speechbrain_inference` class.
+>**Note**: If the folder name containing the model is changed from `lang_id_commonvoice_model`, you will need to modify the `pretrained_path` in `train_ecapa.yaml`, and the `source_model_path` variable in both the `inference_commonVoice.py` and `inference_custom.py` files in the `speechbrain_inference` class.
 
 
 ## Run Inference for Language Identification
 
 >**Stop**: If you have not already done so, you must run the scripts in the `Training` folder to generate the trained model before proceeding.
 
-To run inference, you must have already run all of the training scripts, generated the trained model, and moved files to the appropriate locations. You must place the model output in a folder name matching the name specified as the `pretrained_path` variable defined in the YAML file.
+To run inference, you must have already run all of the training scripts, generated the trained model, and moved files to the appropriate locations. You must place the model output in a folder name matching the name specified as the `pretrained_path` variable defined in `train_ecapa.yaml`.
 
 >**Note**: If you plan to run inference on **custom data**, you will need to create a folder for the **.wav** files to be used for prediction. For example, `data_custom`. Move the **.wav** files to your custom folder. (For quick results, you may select a few audio files from each language downloaded from CommonVoice.)
 
@@ -228,13 +263,9 @@ To run inference, you must have already run all of the training scripts, generat
   ```
   cd /Inference
   ```
-2. Source the bash script to install or update the necessary components.
-   ```
-   source initialize.sh
+2. Patch SpeechBrain's `interfaces.py`. This patch is required for PyTorch* TorchScript to work because the output of the model must contain only tensors.
   ```
-3. Patch the Intel® Extension for PyTorch (IPEX) to use SpeechBrain models. (This patch is required for PyTorch* TorchScript to work because the output of the model must contain only tensors.)
-   ```
-   patch ./speechbrain/speechbrain/pretrained/interfaces.py < interfaces.patch
+   patch ../speechbrain/speechbrain/pretrained/interfaces.py < interfaces.patch
   ```
 
 ### Run in Jupyter Notebook
@@ -245,7 +276,7 @@ To run inference, you must have already run all of the training scripts, generat
   ```
 2. Launch Jupyter Notebook.
   ```
-   jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
+   jupyter notebook --ip 0.0.0.0 --port 8889 --allow-root
   ```
 3. Follow the instructions to open the URL with the token in your browser.
 4. Locate and select the inference Notebook.
@@ -287,21 +318,19 @@ Both scripts support input options; however, some options can be use on `inferen
 #### On the CommonVoice Dataset
 
 1. Run the inference_commonvoice.py script.
-   ```
-   python inference_commonVoice.py -p /data/commonVoice/test
+   ```bash
+   python inference_commonVoice.py -p ${COMMON_VOICE_PATH}/processed_data/test
   ```
   The script should create a `test_data_accuracy.csv` file that summarizes the results.
 
 #### On Custom Data
 
-1. Modify the `audio_ground_truth_labels.csv` file to include the name of the audio file and expected audio label (like, `en` for English).
+To run inference on custom data, you must specify a folder with **.wav** files and pass the path in as an argument. You can do so by creating a folder named `data_custom` and then copy 1 or 2 **.wav** files from your test dataset into it. **.mp3** files will NOT work.
 
-   By default, this is disabled. If required, use the `--ground_truth_compare` input option. To run inference on custom data, you must specify a folder with **.wav** files and pass the path in as an argument.
-
-2. Run the inference_ script.
-   ```
-   python inference_custom.py -p <data path>
-   ```
+Run the inference_ script.
+```
+python inference_custom.py -p <path_to_folder>
+```
 
 The following examples describe how to use the scripts to produce specific outcomes.
 
@@ -345,6 +374,10 @@ The following examples describe how to use the scripts to produce specific outco
   prediction = self.model_int8(signal)
   ```
 
+**(Optional) Comparing Predictions with Ground Truth**
+
+You can choose to modify `audio_ground_truth_labels.csv` to include the name of the audio file and expected audio label (like, `en` for English), then run `inference_custom.py` with the `--ground_truth_compare` option. By default, this is disabled.
+
 ### Troubleshooting
 
 If the model appears to be giving the same output regardless of input, try running `clean.sh` to remove the `RIR_NOISES` and `speechbrain` folders. Redownload that data after cleaning by running `initialize.sh` and either `inference_commonVoice.py` or `inference_custom.py`.
Lines changed: 2 additions & 3 deletions

@@ -1,5 +1,4 @@
 #!/bin/bash
 
-rm -R RIRS_NOISES
-rm -R speechbrain
-rm -f rirs_noises.zip noise.csv reverb.csv
+echo "Deleting rir, noise, speechbrain"
+rm -R rir noise speechbrain

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Training/create_wds_shards.patch

Lines changed: 6 additions & 4 deletions

@@ -1,5 +1,5 @@
---- create_wds_shards.py	2022-09-20 14:55:48.732386718 -0700
-+++ create_wds_shards_commonvoice.py	2022-09-20 14:53:56.554637629 -0700
+--- create_wds_shards.py	2024-11-13 18:08:07.440000000 -0800
++++ create_wds_shards_modified.py	2024-11-14 14:09:36.225000000 -0800
 @@ -27,7 +27,10 @@
      t, sr = torchaudio.load(audio_file_path)
 
@@ -12,7 +12,7 @@
 
     return t
 
-@@ -61,27 +64,20 @@
+@@ -66,27 +69,22 @@
      sample_keys_per_language = defaultdict(list)
 
      for f in audio_files:
@@ -23,7 +23,9 @@
 -            f.as_posix(),
 -        )
 +        # Common Voice format
-+        # commonVoice_folder_path/common_voice_<LANG_ID>_00000000.wav'
++        # commonVoice_folder_path/processed_data/<DATASET_TYPE>/common_voice_<LANG_ID>_00000000.wav'
++        # DATASET_TYPE: dev, test, train
++        # LANG_ID: the label for the language
 +        m = re.match(r"((.*)(common_voice_)(.+)(_)(\d+).wav)", f.as_posix())
 +
          if m:
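The match expression added by this patch can be exercised directly. A small sketch, assuming the documented file layout; the path below is illustrative only, not a real dataset file:

```python
import re

# Regex from the patched create_wds_shards.py: captures the language label
# and the sample number from a Common Voice .wav file name.
pattern = r"((.*)(common_voice_)(.+)(_)(\d+).wav)"

# Illustrative path following the documented layout:
# commonVoice_folder_path/processed_data/<DATASET_TYPE>/common_voice_<LANG_ID>_00000000.wav
path = "commonVoice/processed_data/train/common_voice_sv-SE_00000123.wav"

m = re.match(pattern, path)
if m:
    print(m.group(4))  # → sv-SE (the language id)
    print(m.group(6))  # → 00000123 (the sample number)
```

Group 4 is the greedy `(.+)`, so it stops at the last underscore that precedes the digit run, which is what makes language ids containing hyphens or underscores-free labels both parse correctly.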

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Training/initialize.sh

Lines changed: 0 additions & 26 deletions
This file was deleted.
