Commit 987406c
Modernize LangID sample, fixes for training
1 parent 5be6adc commit 987406c

File tree

13 files changed: +190 additions, −168 deletions
Lines changed: 2 additions & 4 deletions

@@ -1,7 +1,5 @@
 #!/bin/bash
 
-rm -R RIRS_NOISES
-rm -R tmp
-rm -R speechbrain
-rm -f rirs_noises.zip noise.csv reverb.csv vad_file.txt
+echo "Deleting .wav files, tmp"
 rm -f ./*.wav
+rm -R tmp

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/inference_commonVoice.py

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@ def __init__(self, dirpath, filename):
         self.sampleRate = 0
         self.waveData = ''
         self.wavesize = 0
-        self.waveduriation = 0
+        self.waveduration = 0
         if filename.endswith(".wav") or filename.endswith(".wmv"):
             self.wavefile = filename
             self.wavepath = dirpath + os.sep + filename

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/inference_custom.py

Lines changed: 3 additions & 1 deletion

@@ -30,7 +30,7 @@ def __init__(self, dirpath, filename):
         self.sampleRate = 0
         self.waveData = ''
         self.wavesize = 0
-        self.waveduriation = 0
+        self.waveduration = 0
         if filename.endswith(".wav") or filename.endswith(".wmv"):
             self.wavefile = filename
             self.wavepath = dirpath + os.sep + filename
@@ -357,6 +357,8 @@ def main(argv):
     else:
         print("It is a special file (socket, FIFO, device file)" , path)
 
+    print("Done.\n")
+
 if __name__ == "__main__":
     import sys
     sys.exit(main(sys.argv))
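The renamed `waveduration` attribute presumably holds the clip length in seconds. A minimal standalone sketch of that calculation using Python's standard `wave` module (not the sample's actual code; `demo.wav` is a file generated here purely for illustration):

```python
import struct
import wave

def wav_duration(path):
    """Duration of a .wav file in seconds: frame count / sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Generate one second of 16 kHz mono silence to demonstrate.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(16000)
    w.writeframes(struct.pack("<16000h", *([0] * 16000)))

print(wav_duration("demo.wav"))  # → 1.0
```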

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/initialize.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/README.md

Lines changed: 87 additions & 54 deletions
@@ -39,7 +39,7 @@ For both training and inference, you can run the sample and scripts in Jupyter N
 
 ## Prepare the Environment
 
-### Downloading the CommonVoice Dataset
+### Download the CommonVoice Dataset
 
 >**Note**: You can skip downloading the dataset if you already have a pretrained model and only want to run inference on custom data samples that you provide.
 
@@ -79,33 +79,49 @@ tar -xf cv-corpus-11.0-2022-09-21-sv-SE.tar.gz
 mv cv-corpus-11.0-2022-09-21 swedish
 ```
 
-### Configuring the Container
+### Create and Set Up Environment
 
-1. Pull the `oneapi-aikit` docker image.
-2. Set up the Docker environment.
-   ```
-   docker pull intel/oneapi-aikit
-   ./launch_docker.sh
-   ```
-   >**Note**: By default, the `Inference` and `Training` directories will be mounted and the environment variable `COMMON_VOICE_PATH` will be set to `/data/commonVoice` and mounted to `/data`. `COMMON_VOICE_PATH` is the location of where the CommonVoice dataset is downloaded.
+1. Create your conda environment by following the instructions on the Intel [AI Tools Selector](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-tools-selector.html). You can follow these settings:
+
+   * AI Tools
+   * Preset: Inference Optimization
+   * Distribution Type: conda*
+   * Python Versions: Python* 3.9 or 3.10
+
+   Then activate your environment:
+   ```bash
+   conda activate <your-env-name>
+   ```
 
+2. Set the environment variable `COMMON_VOICE_PATH`
+   ```bash
+   export COMMON_VOICE_PATH=/data/commonVoice
+   ```
+
+3. Install packages needed for MP3 to WAV conversion
+   ```bash
+   sudo apt-get update && apt-get install ffmpeg libgl1
+   ```
+
+4. Navigate to your working directory, clone the `oneapi-src` repository, and navigate to this code sample.
+   ```bash
+   git clone https://github.com/oneapi-src/oneAPI-samples.git
+   cd oneAPI-samples/AI-and-Analytics/End-to-end-Workloads/LanguageIdentification
+   ```
 
+5. Run the bash script to install additional necessary libraries, including SpeechBrain.
+   ```bash
+   source initialize.sh
+   ```
 
 ## Train the Model with Languages
 
 This section explains how to train a model for language identification using the CommonVoice dataset, so it includes steps on how to preprocess the data, train the model, and prepare the output files for inference.
 
-### Configure the Training Environment
-
-1. Change to the `Training` directory.
-   ```
-   cd /Training
-   ```
-2. Source the bash script to install the necessary components.
-   ```
-   source initialize.sh
-   ```
-   This installs PyTorch*, the Intel® Extension for PyTorch (IPEX), and other components.
+First, change to the `Training` directory.
+```
+cd /Training
+```
 
 ### Run in Jupyter Notebook
 
@@ -139,7 +155,7 @@ If you cannot or do not want to use Jupyter Notebook, use these procedures to ru
   ```
 
 2. From the `Training` directory, apply patches to modify these files to work with the CommonVoice dataset.
-   ```
+   ```bash
   patch < create_wds_shards.patch
   patch < train_ecapa.patch
   ```
@@ -154,8 +170,8 @@ The `prepareAllCommonVoice.py` script performs the following data preprocessing
 1. If you want to add additional languages, then modify the `LANGUAGE_PATHS` list in the file to reflect the languages to be included in the model.
 
 2. Run the script with options. The samples will be divided as follows: 80% training, 10% validation, 10% testing.
-   ```
-   python prepareAllCommonVoice.py -path /data -max_samples 2000 --createCsv --train --dev --test
+   ```bash
+   python prepareAllCommonVoice.py -path $COMMON_VOICE_PATH -max_samples 2000 --createCsv --train --dev --test
   ```
 | Parameters | Description
 |:--- |:---
@@ -166,24 +182,25 @@ The `prepareAllCommonVoice.py` script performs the following data preprocessing
 
 #### Create Shards for Training and Validation
 
-1. If the `/data/commonVoice_shards` folder exists, delete the folder and the contents before proceeding.
+1. If the `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards` folder exists, delete the folder and the contents before proceeding.
 2. Enter the following commands.
+   ```bash
+   python create_wds_shards.py ${COMMON_VOICE_PATH}/processed_data/train ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train
+   python create_wds_shards.py ${COMMON_VOICE_PATH}/processed_data/dev ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev
   ```
-   python create_wds_shards.py /data/commonVoice/train/ /data/commonVoice_shards/train
-   python create_wds_shards.py /data/commonVoice/dev/ /data/commonVoice_shards/dev
-   ```
-3. Note the shard with the largest number as `LARGEST_SHARD_NUMBER` in the output above or by navigating to `/data/commonVoice_shards/train`.
+3. Note the shard with the largest number as `LARGEST_SHARD_NUMBER` in the output above or by navigating to `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train`.
 4. Open the `train_ecapa.yaml` file and modify the `train_shards` variable to make the range reflect: `000000..LARGEST_SHARD_NUMBER`.
-5. Repeat the process for `/data/commonVoice_shards/dev`.
+5. Repeat Steps 3 and 4 for `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev`.
 
 #### Run the Training Script
 
-The YAML file `train_ecapa.yaml` with the training configurations should already be patched from the Prerequisite section.
+The YAML file `train_ecapa.yaml` with the training configurations is passed as an argument to the `train.py` script to train the model.
 
 1. If necessary, edit the `train_ecapa.yaml` file to meet your needs.
 
 | Parameters | Description
 |:--- |:---
+| `seed` | The seed value, which should be set to a different value for subsequent runs. Defaults to 1987.
 | `out_n_neurons` | Must be equal to the number of languages of interest.
 | `number_of_epochs` | Default is **10**. Adjust as needed.
 | `batch_size` | In the trainloader_options, decrease this value if your CPU or GPU runs out of memory while running the training script.
@@ -195,30 +212,48 @@ The YAML file `train_ecapa.yaml` with the training configurations should already
 
 #### Move Model to Inference Folder
 
-After training, the output should be inside `results/epaca/SEED_VALUE` folder. By default SEED_VALUE is set to 1987 in the YAML file. You can change the value as needed.
-
-1. Copy all files with *cp -R* from `results/epaca/SEED_VALUE` into a new folder called `lang_id_commonvoice_model` in the **Inference** folder.
+After training, the output should be inside the `results/epaca/1987` folder. By default the `seed` is set to 1987 in `train_ecapa.yaml`. You can change the value as needed.
 
-   The name of the folder MUST match with the pretrained_path variable defined in the YAML file. By default, it is `lang_id_commonvoice_model`.
+1. Copy all files from `results/epaca/1987` into a new folder called `lang_id_commonvoice_model` in the **Inference** folder.
+   ```bash
+   cp -R results/epaca/1987 ../Inference/lang_id_commonvoice_model
+   ```
+   The name of the folder MUST match with the pretrained_path variable defined in `train_ecapa.yaml`. By default, it is `lang_id_commonvoice_model`.
 
 2. Change directory to `/Inference/lang_id_commonvoice_model/save`.
+   ```bash
+   cd ../Inference/lang_id_commonvoice_model/save
+   ```
+
 3. Copy the `label_encoder.txt` file up one level.
-4. Change to the latest `CKPT` folder, and copy the classifier.ckpt and embedding_model.ckpt files into the `/Inference/lang_id_commonvoice_model/` folder.
+   ```bash
+   cp label_encoder.txt ../.
+   ```
+
+4. Change to the latest `CKPT` folder, and copy the classifier.ckpt and embedding_model.ckpt files into the `/Inference/lang_id_commonvoice_model/` folder which is two directories up.
+   ```bash
+   # Navigate into the CKPT folder
+   cd CKPT<DATE_OF_RUN>
 
-You may need to modify the permissions of these files to be executable before you run the inference scripts to consume them.
+   cp classifier.ckpt ../../.
+   cp embedding_model.ckpt ../../
+   cd ../..
+   ```
+
+You may need to modify the permissions of these files to be executable i.e. `sudo chmod 755` before you run the inference scripts to consume them.
 
 >**Note**: If `train.py` is rerun with the same seed, it will resume from the epoch number it last run. For a clean rerun, delete the `results` folder or change the seed.
 
 You can now load the model for inference. In the `Inference` folder, the `inference_commonVoice.py` script uses the trained model on the testing dataset, whereas `inference_custom.py` uses the trained model on a user-specified dataset and can utilize Voice Activity Detection.
 
->**Note**: If the folder name containing the model is changed from `lang_id_commonvoice_model`, you will need to modify the `source_model_path` variable in `inference_commonVoice.py` and `inference_custom.py` files in the `speechbrain_inference` class.
+>**Note**: If the folder name containing the model is changed from `lang_id_commonvoice_model`, you will need to modify the `pretrained_path` in `train_ecapa.yaml`, and the `source_model_path` variable in both the `inference_commonVoice.py` and `inference_custom.py` files in the `speechbrain_inference` class.
 
 
 ## Run Inference for Language Identification
 
 >**Stop**: If you have not already done so, you must run the scripts in the `Training` folder to generate the trained model before proceeding.
 
-To run inference, you must have already run all of the training scripts, generated the trained model, and moved files to the appropriate locations. You must place the model output in a folder name matching the name specified as the `pretrained_path` variable defined in the YAML file.
+To run inference, you must have already run all of the training scripts, generated the trained model, and moved files to the appropriate locations. You must place the model output in a folder name matching the name specified as the `pretrained_path` variable defined in `train_ecapa.yaml`.
 
 >**Note**: If you plan to run inference on **custom data**, you will need to create a folder for the **.wav** files to be used for prediction. For example, `data_custom`. Move the **.wav** files to your custom folder. (For quick results, you may select a few audio files from each language downloaded from CommonVoice.)
 
@@ -228,13 +263,9 @@ To run inference, you must have already run all of the training scripts, generat
   ```
   cd /Inference
   ```
-2. Source the bash script to install or update the necessary components.
-   ```
-   source initialize.sh
+2. Patch SpeechBrain's `interfaces.py`. This patch is required for PyTorch* TorchScript to work because the output of the model must contain only tensors.
   ```
-3. Patch the Intel® Extension for PyTorch (IPEX) to use SpeechBrain models. (This patch is required for PyTorch* TorchScript to work because the output of the model must contain only tensors.)
-   ```
-   patch ./speechbrain/speechbrain/pretrained/interfaces.py < interfaces.patch
+   patch ../speechbrain/speechbrain/pretrained/interfaces.py < interfaces.patch
   ```
 
 ### Run in Jupyter Notebook
@@ -245,7 +276,7 @@ To run inference, you must have already run all of the training scripts, generat
   ```
 2. Launch Jupyter Notebook.
   ```
-   jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
+   jupyter notebook --ip 0.0.0.0 --port 8889 --allow-root
   ```
 3. Follow the instructions to open the URL with the token in your browser.
 4. Locate and select the inference Notebook.
@@ -287,21 +318,19 @@ Both scripts support input options; however, some options can be use on `inferen
 #### On the CommonVoice Dataset
 
 1. Run the inference_commonvoice.py script.
-   ```
-   python inference_commonVoice.py -p /data/commonVoice/test
+   ```bash
+   python inference_commonVoice.py -p ${COMMON_VOICE_PATH}/processed_data/test
   ```
   The script should create a `test_data_accuracy.csv` file that summarizes the results.
 
 #### On Custom Data
 
-1. Modify the `audio_ground_truth_labels.csv` file to include the name of the audio file and expected audio label (like, `en` for English).
+To run inference on custom data, you must specify a folder with **.wav** files and pass the path in as an argument. You can do so by creating a folder named `data_custom` and then copy 1 or 2 **.wav** files from your test dataset into it. **.mp3** files will NOT work.
 
-   By default, this is disabled. If required, use the `--ground_truth_compare` input option. To run inference on custom data, you must specify a folder with **.wav** files and pass the path in as an argument.
-
-2. Run the inference_ script.
-   ```
-   python inference_custom.py -p <data path>
-   ```
+Run the inference_ script.
+```
+python inference_custom.py -p <path_to_folder>
+```
 
 The following examples describe how to use the scripts to produce specific outcomes.
 
@@ -345,6 +374,10 @@ The following examples describe how to use the scripts to produce specific outco
   prediction = self.model_int8(signal)
   ```
 
+**(Optional) Comparing Predictions with Ground Truth**
+
+You can choose to modify `audio_ground_truth_labels.csv` to include the name of the audio file and expected audio label (like, `en` for English), then run `inference_custom.py` with the `--ground_truth_compare` option. By default, this is disabled.
+
 ### Troubleshooting
 
 If the model appears to be giving the same output regardless of input, try running `clean.sh` to remove the `RIR_NOISES` and `speechbrain` folders. Redownload that data after cleaning by running `initialize.sh` and either `inference_commonVoice.py` or `inference_custom.py`.
Lines changed: 2 additions & 3 deletions

@@ -1,5 +1,4 @@
 #!/bin/bash
 
-rm -R RIRS_NOISES
-rm -R speechbrain
-rm -f rirs_noises.zip noise.csv reverb.csv
+echo "Deleting rir, noise, speechbrain"
+rm -R rir noise speechbrain

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Training/create_wds_shards.patch

Lines changed: 6 additions & 4 deletions

@@ -1,5 +1,5 @@
---- create_wds_shards.py	2022-09-20 14:55:48.732386718 -0700
-+++ create_wds_shards_commonvoice.py	2022-09-20 14:53:56.554637629 -0700
+--- create_wds_shards.py	2024-11-13 18:08:07.440000000 -0800
++++ create_wds_shards_modified.py	2024-11-14 14:09:36.225000000 -0800
 @@ -27,7 +27,10 @@
      t, sr = torchaudio.load(audio_file_path)
 
@@ -12,7 +12,7 @@
 
     return t
 
-@@ -61,27 +64,20 @@
+@@ -66,27 +69,22 @@
      sample_keys_per_language = defaultdict(list)
 
      for f in audio_files:
@@ -23,7 +23,9 @@
 -            f.as_posix(),
 -        )
 +        # Common Voice format
-+        # commonVoice_folder_path/common_voice_<LANG_ID>_00000000.wav'
++        # commonVoice_folder_path/processed_data/<DATASET_TYPE>/common_voice_<LANG_ID>_00000000.wav'
++        # DATASET_TYPE: dev, test, train
++        # LANG_ID: the label for the language
 +        m = re.match(r"((.*)(common_voice_)(.+)(_)(\d+).wav)", f.as_posix())
 +
          if m:
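The match expression added by this patch can be exercised directly. A small sketch, assuming the documented file layout; the path below is illustrative only, not a real dataset file:

```python
import re

# Regex from the patched create_wds_shards.py: captures the language label
# and the sample number from a Common Voice .wav file name.
pattern = r"((.*)(common_voice_)(.+)(_)(\d+).wav)"

# Illustrative path following the documented layout:
# commonVoice_folder_path/processed_data/<DATASET_TYPE>/common_voice_<LANG_ID>_00000000.wav
path = "commonVoice/processed_data/train/common_voice_sv-SE_00000123.wav"

m = re.match(pattern, path)
if m:
    print(m.group(4))  # → sv-SE (the language id)
    print(m.group(6))  # → 00000123 (the sample number)
```

Group 4 is the greedy `(.+)`, so it stops at the last underscore that precedes the digit run, which is what makes language ids containing hyphens or underscores-free labels both parse correctly.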

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Training/initialize.sh

Lines changed: 0 additions & 26 deletions
This file was deleted.
