---
title: MaixCAM MaixPy Deploy online speech recognition
update:
  - date: 2024-12-23
    author: lxowalle
    version: 1.0.0
    content: Initial document
---

## Introduction

Deploying online speech recognition locally is a solution for real-time processing of speech input. By running a speech recognition model on a local server that interacts with `MaixCAM`, audio data can be processed and results returned immediately, without relying on external cloud services. This approach improves response speed and protects user privacy, making it well suited to applications that demand high data security and real-time performance, such as smart hardware, industrial control, and real-time subtitle generation.

This document uses the open-source framework [`sherpa-onnx`](https://github.com/k2-fsa/sherpa-onnx) for deployment. `sherpa-onnx` is a subproject of `sherpa` that supports tasks such as streaming and non-streaming speech recognition, text-to-speech, speaker identification, speaker verification, and spoken language identification. Below, we focus on streaming speech recognition with `MaixCAM` and `sherpa-onnx`.

> Note: Streaming speech recognition is highly real-time: results are produced while the user is still speaking, which makes it a common choice for real-time translation and voice assistants. Non-streaming recognition processes a complete utterance at a time and generally achieves higher accuracy.

## Deploying the Speech Recognition Server

`sherpa-onnx` supports deployment in multiple languages, including `C/C++`, `Python`, `Java`, and more. For simplicity, we will use `Python` for deployment. If you encounter any issues during the process, you can refer to the `sherpa` [documentation](https://k2-fsa.github.io/sherpa/intro.html). Let's get started!

#### Download the `sherpa-onnx` Repository

```shell
git clone https://github.com/k2-fsa/sherpa-onnx.git
```

#### Install Dependencies

```shell
pip install numpy
pip install websockets
```

#### Install the `sherpa-onnx` Package

```shell
pip install sherpa-onnx
```

If GPU support is required, install the CUDA-enabled package:

```shell
pip install sherpa-onnx==1.10.16+cuda -f https://k2-fsa.github.io/sherpa/onnx/cuda.html

# For users in China
# pip install sherpa-onnx==1.10.16+cuda -f https://k2-fsa.github.io/sherpa/onnx/cuda-cn.html
```

If the package is unavailable or installation fails, build and install it from source:

```shell
cd sherpa-onnx
export SHERPA_ONNX_CMAKE_ARGS="-DSHERPA_ONNX_ENABLE_GPU=ON"
python3 setup.py install
```

If a GPU is available but `CUDA` is not installed, refer to the installation guide [here](https://k2-fsa.github.io/k2/installation/cuda-cudnn.html).

#### Verify the Installation of `sherpa-onnx`

```shell
python3 -c "import sherpa_onnx; print(sherpa_onnx.__version__)"

# Expected output: a version string such as
# 1.10.16 or 1.10.16+cuda
```

#### Download the Model

[`Zipformer` Bilingual Model for Mandarin and English: `sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-mobile`](https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-mobile.tar.bz2)

[`Paraformer` Trilingual Model for Mandarin, Cantonese, and English: `sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en`](https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.tar.bz2)

> Note:
> For Chinese recognition, it is recommended to use the `sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-mobile` model.
>
> For English recognition, it is recommended to use the `sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en` model.
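
If you prefer to fetch and unpack a model from Python instead of a browser or command-line tools, the minimal sketch below uses only the standard library. The URL is the paraformer link above; substitute whichever model you actually chose:

```python
import tarfile
import urllib.request

# Model archive from the links above; swap in the model you chose
url = ("https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/"
       "sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.tar.bz2")
archive = url.rsplit("/", 1)[-1]

urllib.request.urlretrieve(url, archive)  # download the compressed archive
with tarfile.open(archive, "r:bz2") as tar:
    tar.extractall(".")  # unpacks into a directory named after the model
print("Extracted", archive)
```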

#### Run the Server

`sherpa-onnx` provides a ready-made server example, so there is no need to write any server code. Follow the steps below to start the server.

##### Run the `zipformer` Model

```shell
cd sherpa-onnx
export MODEL_PATH="sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20"
python3 ./python-api-examples/streaming_server.py \
    --encoder ./${MODEL_PATH}/encoder-epoch-99-avg-1.onnx \
    --decoder ./${MODEL_PATH}/decoder-epoch-99-avg-1.onnx \
    --joiner ./${MODEL_PATH}/joiner-epoch-99-avg-1.onnx \
    --tokens ./${MODEL_PATH}/tokens.txt \
    --provider "cuda"    # pass "cpu" instead if no GPU is available
```
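
Optionally, before starting the server, you can check that the downloaded model files load correctly using the `sherpa_onnx` Python API directly. This is a minimal sketch, assuming a recent `sherpa-onnx` release that provides `OnlineRecognizer.from_transducer`; the file names are those of the zipformer archive used above:

```python
import sherpa_onnx

MODEL_PATH = "sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20"

# Loading raises an error if any model file is missing or corrupt
recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    tokens=f"{MODEL_PATH}/tokens.txt",
    encoder=f"{MODEL_PATH}/encoder-epoch-99-avg-1.onnx",
    decoder=f"{MODEL_PATH}/decoder-epoch-99-avg-1.onnx",
    joiner=f"{MODEL_PATH}/joiner-epoch-99-avg-1.onnx",
)
print("Model loaded OK")
```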

##### Run the `paraformer` Model

```shell
cd sherpa-onnx
export MODEL_PATH="sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en"
python3 ./python-api-examples/streaming_server.py \
    --paraformer-encoder ./${MODEL_PATH}/encoder.onnx \
    --paraformer-decoder ./${MODEL_PATH}/decoder.onnx \
    --tokens ./${MODEL_PATH}/tokens.txt \
    --provider "cuda"
```

##### Example Log Output

```shell
2024-12-23 09:25:17,557 INFO [streaming_server.py:667] No certificate provided
2024-12-23 09:25:17,561 INFO [server.py:715] server listening on [::]:6006
2024-12-23 09:25:17,561 INFO [server.py:715] server listening on 0.0.0.0:6006
2024-12-23 09:25:17,561 INFO [streaming_server.py:693] Please visit one of the following addresses:

  http://localhost:6006

Since you are not providing a certificate, you cannot use your microphone from within the browser using public IP addresses. Only localhost can be used.You also cannot use 0.0.0.0 or 127.0.0.1
```

At this point, the ASR model server is up and running.
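
Before moving on to the device side, you can quickly confirm from another machine that the server accepts WebSocket connections. Here is a minimal sketch using the `websockets` package installed earlier; replace the address with your server's LAN IP when testing remotely:

```python
import asyncio
import websockets

SERVER_ADDR = "127.0.0.1"  # replace with the server's LAN IP if testing remotely
SERVER_PORT = 6006

async def main():
    # If connect() succeeds, the streaming server is up and reachable
    async with websockets.connect(f"ws://{SERVER_ADDR}:{SERVER_PORT}"):
        print("Server is reachable")

asyncio.run(main())
```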

#### Communication Between `MaixCAM` and the Server

For brevity, example client code is provided via the links below. Note that in most cases the audio data must have a 16000 Hz sampling rate and a single channel:

[`MaixCAM` Streaming Recognition](https://github.com/sipeed/MaixPy/blob/main/examples/audio/asr/asr_streaming_websockt_client)

[`MaixCAM` Non-Streaming Recognition](https://github.com/sipeed/MaixPy/blob/main/examples/audio/asr/asr_non_streaming_websockt_client)

```python
# In the example client script, update these to your server's address and port
SERVER_ADDR = "127.0.0.1"
SERVER_PORT = 6006
```

After updating the server address and port, run the client with MaixVision. If you are using the streaming recognition script, try talking to the `MaixCAM`.

> Note: This document does not elaborate on the communication protocol because it is straightforward: essentially raw audio data exchanged over WebSocket. It is recommended to try the setup first and then read the client code for the details.
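
To see that protocol in action before writing your own client, the sketch below streams a 16 kHz, mono, 16-bit WAV file from a PC to the server and prints whatever the server returns. It assumes the behavior of the bundled `streaming_server.py` at the time of writing: binary WebSocket messages carry raw float32 samples, the text message `Done` marks the end of the stream, and results come back as text messages (the exact response format may differ between versions):

```python
import asyncio
import wave

import numpy as np
import websockets

SERVER_ADDR = "127.0.0.1"  # your server's address
SERVER_PORT = 6006

async def recognize(wav_path: str) -> None:
    # Load a 16 kHz, mono, 16-bit PCM WAV file and scale it to float32 in [-1, 1]
    with wave.open(wav_path) as f:
        assert f.getframerate() == 16000 and f.getnchannels() == 1 and f.getsampwidth() == 2
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
    samples = samples.astype(np.float32) / 32768.0

    async with websockets.connect(f"ws://{SERVER_ADDR}:{SERVER_PORT}") as ws:
        chunk = 3200  # 0.2 s of audio per message
        for i in range(0, len(samples), chunk):
            await ws.send(samples[i:i + chunk].tobytes())  # raw float32 bytes
            await asyncio.sleep(0.2)  # pace the stream roughly in real time
        await ws.send("Done")  # signal the end of the stream
        async for message in ws:  # print results until the server closes
            print(message)

asyncio.run(recognize("test.wav"))
```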

The deployment process is now complete.