This application performs speech-to-text transcription using OpenAI's Whisper-tiny and Whisper-base models on the Hailo-8/8L/10H AI accelerators.
Ensure your system matches the following requirements before proceeding:
- Platforms tested: x86, Raspberry Pi 5
- OS: Ubuntu 22 (x86) or Raspberry Pi OS.
- HailoRT 4.20 or 4.21 and the corresponding PCIe driver must be installed. You can download them from the Hailo Developer Zone.
- ffmpeg and libportaudio2 installed for audio processing.
sudo apt update
sudo apt install ffmpeg
sudo apt install libportaudio2
- Python 3.10 or 3.11 installed.
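The requirements above can be checked before installation with a short script. The following is a minimal sketch, not part of the repository: the helper names `python_supported`, `missing_tools`, and `portaudio_available` are hypothetical, and the script verifies only the Python version, the presence of `ffmpeg` on the PATH, and whether a PortAudio shared library can be located. It does not check HailoRT or the PCIe driver.

```python
import ctypes.util
import shutil
import sys

# Python versions listed in the requirements above.
SUPPORTED_PYTHON = {(3, 10), (3, 11)}


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version is 3.10 or 3.11."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def missing_tools(tools=("ffmpeg",), which=shutil.which):
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def portaudio_available():
    """Return True if a PortAudio shared library (libportaudio2) can be located."""
    return ctypes.util.find_library("portaudio") is not None


print("Python version supported:", python_supported())
print("Missing tools:", missing_tools() or "none")
print("PortAudio library found:", portaudio_available())
```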
Follow these steps to set up the environment and install dependencies for inference:
- Clone this repository:
git clone https://github.com/hailo-ai/Hailo-Application-Code-Examples.git
cd Hailo-Application-Code-Examples/runtime/hailo-8/python/speech_recognition
If you have any authentication issues, add your SSH key or download the zip.
- Run the setup script to install dependencies:
python3 setup.py
- Activate the virtual environment from the repository root folder:
source whisper_env/bin/activate
- Install PyHailoRT inside the virtual environment (must be downloaded from the Hailo Developer Zone), for example:
pip install hailort-4.20.0-cp310-cp310-linux_x86_64.whl
The PyHailoRT version must match the installed HailoRT version. NOTE: This step is not necessary for Raspberry Pi 5 users who installed the hailo-all package, since the venv will inherit the system package.
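Because the wheel filename encodes both the HailoRT release and the target Python version, a mismatch can be caught before running `pip install`. The sketch below is an illustration only: `parse_wheel_name` and `wheel_matches` are hypothetical helpers, the installed HailoRT version is assumed to be supplied by the user as a string, and the parser handles only the five-field wheel layout shown in the example above (no optional build tag).

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into (name, version, python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Wheels with an optional build tag have six fields; not handled here.
        raise ValueError(f"unexpected wheel name layout: {filename}")
    return tuple(parts)


def wheel_matches(filename, hailort_version, python_version=None):
    """Return True if the wheel targets the given HailoRT release and Python version.

    `hailort_version` is the installed HailoRT version as a string, e.g. "4.20.0".
    `python_version` is a (major, minor) tuple; defaults to the running interpreter.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    _, version, python_tag, _, _ = parse_wheel_name(filename)
    expected_tag = f"cp{python_version[0]}{python_version[1]}"
    return version == hailort_version and python_tag == expected_tag


wheel = "hailort-4.20.0-cp310-cp310-linux_x86_64.whl"
print(wheel_matches(wheel, "4.20.0", (3, 10)))  # True
print(wheel_matches(wheel, "4.21.0", (3, 10)))  # False: HailoRT release differs
```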
- Make sure you have a microphone connected to your system. If you have multiple microphones connected, please make sure the proper one is selected in the system configuration, and that the input volume is set to a medium/high level. A good quality microphone (or a USB camera) is suggested to acquire the audio.
- The application allows the user to acquire and process an audio sample up to 5 seconds long. The duration can be modified in the application code.
- The current pipeline supports English only.
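To make the 5-second limit concrete, the sketch below computes how many samples such a recording contains and trims a longer buffer to that length. It is not the application's code: the 16 kHz sample rate is an assumption (Whisper models are generally fed 16 kHz mono audio, but the rate this app records at may differ), and `max_samples` and `clip_recording` are hypothetical names.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: typical Whisper input rate, not confirmed for this app
MAX_DURATION_S = 5       # limit stated in the notes above


def max_samples(duration_s=MAX_DURATION_S, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the number of mono samples in a recording of the given duration."""
    return int(duration_s * sample_rate_hz)


def clip_recording(samples, duration_s=MAX_DURATION_S, sample_rate_hz=SAMPLE_RATE_HZ):
    """Drop any samples beyond the maximum duration; shorter input is returned unchanged."""
    return samples[: max_samples(duration_s, sample_rate_hz)]


recording = [0.0] * 100_000            # about 6.25 s of silence at 16 kHz
print(max_samples())                   # 80000
print(len(clip_recording(recording)))  # 80000
```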
- Activate the virtual environment from the repository root folder:
source whisper_env/bin/activate
- Run the command line app (from the root folder):
python3 -m app.app_hailo_whisper
The app uses Hailo-8 models by default. If you have a Hailo-8L device, run the following command instead:
python3 -m app.app_hailo_whisper --hw-arch hailo8l
If you want to select a specific Whisper model, use the --variant argument:
python3 -m app.app_hailo_whisper --variant base
python3 -m app.app_hailo_whisper --variant tiny
Use the python3 -m app.app_hailo_whisper --help command to print the help message.
The following command line options are available:
- --reuse-audio: Reloads the audio from the previous run.
- --hw-arch: Selects the Whisper models compiled for the target architecture (hailo8 / hailo8l / hailo10h). If not specified, the hailo8 architecture is selected.
- --variant: Variant of the Whisper model to use (tiny / base). If not specified, the base model is used.
- --multi-process-service: Enables the multi-process service, which allows other models to run on the same chip in addition to Whisper.
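The options listed above, with their documented defaults, can be summarized as an `argparse` definition. This is a sketch that mirrors the documentation and is not the application's own parser: treating `--reuse-audio` and `--multi-process-service` as boolean switches is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented command line options"
    )
    # Assumed to be a boolean switch.
    parser.add_argument("--reuse-audio", action="store_true",
                        help="Reload the audio from the previous run")
    parser.add_argument("--hw-arch", choices=["hailo8", "hailo8l", "hailo10h"],
                        default="hailo8",
                        help="Target architecture of the compiled Whisper models")
    parser.add_argument("--variant", choices=["tiny", "base"], default="base",
                        help="Whisper model variant")
    # Assumed to be a boolean switch.
    parser.add_argument("--multi-process-service", action="store_true",
                        help="Enable the multi-process service")
    return parser


args = build_parser().parse_args(["--hw-arch", "hailo8l", "--variant", "tiny"])
print(args.hw_arch, args.variant, args.reuse_audio)  # hailo8l tiny False
```

With no arguments, the sketch falls back to the documented defaults (`hailo8` and `base`).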
This version includes several performance optimizations for real-time speech-to-text processing:
- Reduced Chunk Length: Optimized audio chunk processing for faster response times
- Zero-Copy Memory Management: Minimized memory allocations and copies for better performance
- Multi-Process Support: Enabled running multiple models concurrently on Hailo devices
- Fast Mode Option: Reduced accuracy for speed with the --fast-mode parameter
- Streaming Output: Character-by-character output streaming with the --stream-output parameter
- Timing Analysis: Performance profiling with the --timing parameter
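To illustrate what character-by-character streaming and timing analysis mean in general, the sketch below writes text one character at a time with an explicit flush and measures elapsed time with a monotonic clock. It is a generic illustration and says nothing about how `--stream-output` or `--timing` are implemented in the application; `stream_text` and `timed` are hypothetical helpers.

```python
import io
import sys
import time


def stream_text(text, out=sys.stdout, delay_s=0.0):
    """Write `text` one character at a time, flushing after each character."""
    for ch in text:
        out.write(ch)
        out.flush()  # make each character visible immediately
        if delay_s:
            time.sleep(delay_s)
    return len(text)


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


buffer = io.StringIO()
count, elapsed = timed(stream_text, "hello world", out=buffer)
print(f"streamed {count} characters in {elapsed:.6f} s")
```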