
Commit f3cd285: speech section
1 parent 39a470a

2 files changed: +49 -27 lines changed

content/hardware/04.pro/shields/portenta-vision-shield/tutorials/user-manual/content.md

Lines changed: 49 additions & 27 deletions
You can easily implement sound and voice recognition applications using Machine Learning on the edge, which means that the Portenta H7 plus the Vision Shield can run these algorithms locally.

Use the following script to run the example. It can also be found under **File > Examples > Audio > micro_speech.py** in the OpenMV IDE.

```python
import time
from ml.apps import MicroSpeech


def callback(label, scores):
    print(f'\nHeard: "{label}" @{time.ticks_ms()}ms Scores: {scores}')


# By default, the MicroSpeech object uses the built-in audio preprocessor (float) and the
# micro speech module for audio preprocessing and speech recognition, respectively. The
# user can override both by passing two models:
# MicroSpeech(preprocessor=ml.Model(...), micro_speech=ml.Model(...), labels=["label",...])
speech = MicroSpeech()

# Starts the audio streaming and processes incoming audio to recognize speech commands.
# If a callback is passed, listen() will loop forever and call the callback when a keyword
# is detected. Alternatively, `listen()` can be called with a timeout (in ms), and it
# returns if the timeout expires before detecting a keyword.
speech.listen(callback=callback, threshold=0.70)
```

In the example above, you can notice that there is no model defined explicitly; this is because it uses the default built-in model, pre-trained to recognize the **yes** and **no** keywords.

You can run the script and say the keywords; if one is recognized, the *Serial Terminal* will print the heard word and the inference scores.

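The comments in the example also mention that `listen()` can be given a timeout in milliseconds instead of a callback, in which case it returns once a keyword is detected or the timeout expires. Below is a minimal sketch of that variant; the way the return value is handled is an assumption based on those comments, so verify it against the OpenMV documentation for your firmware version.

```python
from ml.apps import MicroSpeech

speech = MicroSpeech()

# Assumption: with no callback, listen() blocks until a keyword is detected or
# the timeout (in ms) expires, then returns instead of looping forever.
result = speech.listen(timeout=5000, threshold=0.70)
print("listen() returned:", result)
```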

#### Custom Speech Recognition Model

You can also easily run custom speech recognition models. To show you how, we are going to replicate the **yes** and **no** example, but this time loading the `.tflite` model file explicitly.

First, download the `.tflite` [model](https://raw.githubusercontent.com/iabdalkader/microspeech-yesno-model/main/model.tflite) and copy it to the H7 local storage.

![Speech recognition model directory](assets/model-speech.png)

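As an optional check before running the example, you can confirm the file was copied correctly by listing the board's storage from a script. This small sketch is not part of the original tutorial; it uses MicroPython's standard `os` module and assumes the model sits in the current working directory (the root of the local storage), matching the relative `'model.tflite'` path used below.

```python
import os

# The custom model should appear in this listing if the copy succeeded.
files = os.listdir()
print("Files on local storage:", files)
print("model.tflite found:", "model.tflite" in files)
```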

Copy and paste the following script, which is based on the original example:

```python
import time
import ml
from ml.apps import MicroSpeech

labels = ["Silence", "Unknown", "Yes", "No"]

def callback(label, scores):
    print(f'\nHeard: "{label}" @{time.ticks_ms()}ms Scores: {scores}')

# Load the custom model from the local storage (load_to_fb places it in the
# frame buffer memory instead of the heap) and map its outputs to the labels above.
speech = MicroSpeech(micro_speech=ml.Model('model.tflite', load_to_fb=True), labels=labels)

speech.listen(callback=callback, threshold=0.70)
```

As you can see, there are some differences from the original example, of which we can highlight the following:

- The `ml` module is imported.
- A labels list was created, containing the model's labels in a specific order.
- The `MicroSpeech()` constructor is populated with the custom model and the labels list as arguments.

Now, just say `yes` or `no` and you will see the inference result in the OpenMV Serial Terminal, just as with the original example.

![Speech recognition example](assets/ml-inference.png)
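
For reference, the comments in the default example also show that the audio preprocessor can be overridden in the same call, not only the keyword-spotting model. The sketch below mirrors that constructor form under stated assumptions: `"audio_preprocessor.tflite"` is a hypothetical file name, and both model files would need to be copied to the board first.

```python
import ml
from ml.apps import MicroSpeech

labels = ["Silence", "Unknown", "Yes", "No"]

# Mirrors the constructor form quoted in the default example's comments:
# MicroSpeech(preprocessor=ml.Model(...), micro_speech=ml.Model(...), labels=["label",...])
# "audio_preprocessor.tflite" is a placeholder name for a custom preprocessor model.
speech = MicroSpeech(
    preprocessor=ml.Model("audio_preprocessor.tflite", load_to_fb=True),
    micro_speech=ml.Model("model.tflite", load_to_fb=True),
    labels=labels,
)
```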


***If you want to create a custom `.tflite` model file with your own keywords or sounds, you can do it using [Edge Impulse](https://docs.edgeimpulse.com/docs/edge-ai-hardware/mcu/arduino-portenta-h7).***

## Machine Learning Tool

The main features of the Portenta Vision Shield are its audio and video capabilities, which makes it a great fit for a huge range of machine learning applications.
