
Commit 921f534
update notebooks and sample.json
1 parent 2ec3444

File tree: 4 files changed, +79 / -17 lines


AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/lang_id_inference.ipynb

Lines changed: 39 additions & 5 deletions
@@ -47,15 +47,15 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!python inference_commonVoice.py -p /data/commonVoice/test"
+"!python inference_commonVoice.py -p ${COMMON_VOICE_PATH}/processed_data/test"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## inference_custom.py for Custom Data \n",
-"To generate an overall results output summary, the audio_ground_truth_labels.csv file needs to be modified with the name of the audio file and expected audio label (i.e. en for English). By default, this is disabled but if desired, the *--ground_truth_compare* can be used. To run inference on custom data, you must specify a folder with WAV files and pass the path in as an argument. "
+"To run inference on custom data, you must specify a folder with .wav files and pass the path in as an argument. You can do so by creating a folder named `data_custom` and then copy 1 or 2 .wav files from your test dataset into it. .mp3 files will NOT work. "
 ]
 },
 {
@@ -65,7 +65,7 @@
 "### Randomly select audio clips from audio files for prediction\n",
 "python inference_custom.py -p DATAPATH -d DURATION -s SIZE\n",
 "\n",
-"An output file output_summary.csv will give the summary of the results."
+"An output file `output_summary.csv` will give the summary of the results."
 ]
 },
 {
@@ -104,6 +104,8 @@
 "### Optimizations with Intel® Extension for PyTorch (IPEX) \n",
 "python inference_custom.py -p data_custom -d 3 -s 50 --vad --ipex --verbose \n",
 "\n",
+"This will apply ipex.optimize to the model(s) and TorchScript. You can also add the --bf16 option along with --ipex to run in the BF16 data type, supported on 4th Gen Intel® Xeon® Scalable processors and newer.\n",
+"\n",
 "Note that the *--verbose* option is required to view the latency measurements. "
 ]
 },
@@ -121,7 +123,7 @@
 "metadata": {},
 "source": [
 "## Quantization with Intel® Neural Compressor (INC)\n",
-"To improve inference latency, Intel® Neural Compressor (INC) can be used to quantize the trained model from FP32 to INT8 by running quantize_model.py. The *-datapath* argument can be used to specify a custom evaluation dataset but by default it is set to */data/commonVoice/dev* which was generated from the data preprocessing scripts in the *Training* folder. "
+"To improve inference latency, Intel® Neural Compressor (INC) can be used to quantize the trained model from FP32 to INT8 by running quantize_model.py. The *-datapath* argument can be used to specify a custom evaluation dataset but by default it is set to `$COMMON_VOICE_PATH/processed_data/dev` which was generated from the data preprocessing scripts in the `Training` folder. "
 ]
 },
 {
@@ -137,7 +139,39 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"After quantization, the model will be stored in *lang_id_commonvoice_model_INT8* and *neural_compressor.utils.pytorch.load* will have to be used to load the quantized model for inference. "
+"After quantization, the model will be stored in lang_id_commonvoice_model_INT8 and neural_compressor.utils.pytorch.load will have to be used to load the quantized model for inference. If self.language_id is the original model and data_path is the path to the audio file:\n",
+"\n",
+"```\n",
+"from neural_compressor.utils.pytorch import load\n",
+"model_int8 = load(\"./lang_id_commonvoice_model_INT8\", self.language_id)\n",
+"signal = self.language_id.load_audio(data_path)\n",
+"prediction = self.model_int8(signal)\n",
+"```"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The code above is integrated into inference_custom.py. You can now run inference on your data using this INT8 model:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"!python inference_custom.py -p data_custom -d 3 -s 50 --vad --int8_model --verbose"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### (Optional) Comparing Predictions with Ground Truth\n",
+"\n",
+"You can choose to modify audio_ground_truth_labels.csv to include the name of the audio file and expected audio label (like, en for English), then run inference_custom.py with the --ground_truth_compare option. By default, this is disabled."
 ]
 },
 {
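The optional --ground_truth_compare flow added above pairs each audio file name with its expected language label via audio_ground_truth_labels.csv. A minimal sketch of building such a file and scoring predictions against it; the two-column layout, file names, and prediction values here are assumptions for illustration, not the exact format inference_custom.py expects:

```python
import csv

# Hypothetical ground-truth rows: (audio file name, expected label).
rows = [
    ("common_voice_en_1.wav", "en"),
    ("common_voice_de_2.wav", "de"),
]

with open("audio_ground_truth_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for name, label in rows:
        writer.writerow([name, label])

# Read the file back into a lookup table.
with open("audio_ground_truth_labels.csv", newline="") as f:
    ground_truth = {name: label for name, label in csv.reader(f)}

# Hypothetical model predictions to compare against the ground truth.
predictions = {"common_voice_en_1.wav": "en", "common_voice_de_2.wav": "fr"}
accuracy = sum(ground_truth[n] == p for n, p in predictions.items()) / len(predictions)
print(f"accuracy: {accuracy}")
```

With one of the two sample predictions wrong, this prints an accuracy of 0.5.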

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/README.md

Lines changed: 4 additions & 4 deletions
@@ -123,7 +123,7 @@ First, change to the `Training` directory.
 cd /Training
 ```
 
-### Run in Jupyter Notebook
+### Option 1: Run in Jupyter Notebook
 
 1. Install Jupyter Notebook.
 ```
@@ -141,7 +141,7 @@ cd /Training
 5. Follow the instructions in the Notebook.
 
 
-### Run in a Console
+### Option 2: Run in a Console
 
 If you cannot or do not want to use Jupyter Notebook, use these procedures to run the sample and scripts locally.
 
@@ -264,7 +264,7 @@ To run inference, you must have already run all of the training scripts, generat
 cd /Inference
 ```
 
-### Run in Jupyter Notebook
+### Option 1: Run in Jupyter Notebook
 
 1. If you have not already done so, install Jupyter Notebook.
 ```
@@ -281,7 +281,7 @@ To run inference, you must have already run all of the training scripts, generat
 ```
 5. Follow the instructions in the Notebook.
 
-### Run in a Console
+### Option 2: Run in a Console
 
 If you cannot or do not want to use Jupyter Notebook, use these procedures to run the sample and scripts locally.
 

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Training/lang_id_training.ipynb

Lines changed: 8 additions & 8 deletions
@@ -110,7 +110,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Note down the shard with the largest number as LARGEST_SHARD_NUMBER in the output above or by navigating to *${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train*. In *train_ecapa.yaml*, modify the *train_shards* variable to go from 000000..LARGEST_SHARD_NUMBER. Repeat the process for *${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev*. "
+"Note down the shard with the largest number as LARGEST_SHARD_NUMBER in the output above or by navigating to `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train`. In `train_ecapa.yaml`, modify the `train_shards` variable to go from 000000..LARGEST_SHARD_NUMBER. Repeat the process for `${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev`. "
 ]
 },
 {
@@ -167,20 +167,20 @@
 "outputs": [],
 "source": [
 "# 1)\n",
-"cp -R results/epaca/1987 ../Inference/lang_id_commonvoice_model\n",
+"!cp -R results/epaca/1987 ../Inference/lang_id_commonvoice_model\n",
 "\n",
 "# 2)\n",
-"cd ../Inference/lang_id_commonvoice_model/save\n",
+"!cd ../Inference/lang_id_commonvoice_model/save\n",
 "\n",
 "# 3)\n",
-"cp label_encoder.txt ../.\n",
+"!cp label_encoder.txt ../.\n",
 "\n",
 "# 4)\n",
 "# Navigate into the CKPT folder\n",
-"cd CKPT<DATE_OF_RUN> #@TODO: set this to your CKPT folder\n",
-"cp classifier.ckpt ../../.\n",
-"cp embedding_model.ckpt ../../\n",
-"cd ../.."
+"!cd CKPT<DATE_OF_RUN> #@TODO: set this to your CKPT folder\n",
+"!cp classifier.ckpt ../../.\n",
+"!cp embedding_model.ckpt ../../\n",
+"!cd ../.."
 ]
 },
 {
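The train_shards edit in the first hunk above requires reading off the largest shard number by hand. A small helper can find it programmatically; this sketch assumes the webdataset-style shard-NNNNNN.tar naming, and substitutes a throwaway demo directory for ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train:

```python
import re
from pathlib import Path

# Demo stand-in for the real shard directory; shard numbers chosen arbitrarily.
shard_dir = Path("commonVoice_shards_train_demo")
shard_dir.mkdir(exist_ok=True)
for i in (0, 1, 2, 12):
    (shard_dir / f"shard-{i:06d}.tar").touch()

# Collect the six-digit shard indices and take the maximum: that value is
# LARGEST_SHARD_NUMBER for the train_shards range in train_ecapa.yaml.
numbers = [
    int(m.group(1))
    for p in shard_dir.iterdir()
    if (m := re.match(r"shard-(\d{6})\.tar$", p.name))
]
largest = max(numbers)
print(f"train_shards should span 000000..{largest:06d}")
```

The same pass over the dev shard directory gives the value for the dev range.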

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/sample.json

Lines changed: 28 additions & 0 deletions
@@ -14,6 +14,34 @@
 "env": [
 ],
 "steps": [
+"export COMMON_VOICE_PATH=/data/commonVoice",
+"sudo apt-get update && apt-get install ffmpeg libgl1",
+"git clone https://github.com/oneapi-src/oneAPI-samples.git",
+"cd oneAPI-samples/AI-and-Analytics/End-to-end-Workloads/LanguageIdentification",
+"source initialize.sh",
+"cd /Training",
+"cp speechbrain/recipes/VoxLingua107/lang_id/create_wds_shards.py create_wds_shards.py",
+"cp speechbrain/recipes/VoxLingua107/lang_id/train.py train.py",
+"cp speechbrain/recipes/VoxLingua107/lang_id/hparams/train_ecapa.yaml train_ecapa.yaml",
+"patch < create_wds_shards.patch",
+"patch < train_ecapa.patch",
+"python prepareAllCommonVoice.py -path $COMMON_VOICE_PATH -max_samples 2000 --createCsv --train --dev --test",
+"python create_wds_shards.py ${COMMON_VOICE_PATH}/processed_data/train ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/train",
+"python create_wds_shards.py ${COMMON_VOICE_PATH}/processed_data/dev ${COMMON_VOICE_PATH}/processed_data/commonVoice_shards/dev",
+"python train.py train_ecapa.yaml --device cpu",
+"cp -R results/epaca/1987 ../Inference/lang_id_commonvoice_model",
+"cd ../Inference/lang_id_commonvoice_model/save",
+"cp label_encoder.txt ../.",
+"cd CKPT<DATE_OF_RUN>",
+"cp classifier.ckpt ../../.",
+"cp embedding_model.ckpt ../../",
+"cd ../..",
+"cd /Inference",
+"python inference_commonVoice.py -p ${COMMON_VOICE_PATH}/processed_data/test",
+"python inference_custom.py -p data_custom -d 3 -s 50 --vad",
+"python inference_custom.py -p data_custom -d 3 -s 50 --vad --ipex --verbose",
+"python quantize_model.py -p ./lang_id_commonvoice_model -datapath $COMMON_VOICE_PATH/processed_data/dev",
+"python inference_custom.py -p data_custom -d 3 -s 50 --vad --int8_model --verbose"
 ]
 }
 ]
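The steps array added to sample.json above is an ordered list of shell commands, so a test harness would run them sequentially in one shell session (exports and cd must persist between steps). A minimal sketch of reading that array; the JSON literal here is a trimmed stand-in for the real file, which carries more metadata around "env" and "steps":

```python
import json

# Trimmed, hypothetical stand-in for the sample.json entry shown in the diff.
sample = json.loads("""
{
  "env": [],
  "steps": [
    "export COMMON_VOICE_PATH=/data/commonVoice",
    "cd /Inference",
    "python inference_custom.py -p data_custom -d 3 -s 50 --vad"
  ]
}
""")

# Print the commands in order; a real harness would execute them instead.
for i, step in enumerate(sample["steps"], 1):
    print(f"step {i}: {step}")
```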

0 commit comments
