File: `content/learning-paths/servers-and-cloud-computing/funASR/2_modelscope.md`
---
title: ModelScope - Open-Source Pre-trained AI models hub
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Before you begin

To follow the instructions for this Learning Path, you will need an Arm-based server running Ubuntu 22.04 LTS or a later version with at least 8 cores, 16GB of RAM, and 30GB of disk storage.

## Introducing ModelScope

[ModelScope](https://github.com/modelscope/modelscope/) is an open-source platform that makes it easy to use AI models in your applications.
Arm provides optimized software and tools, such as Kleidi, to accelerate AI inference.

You can learn more about [Faster PyTorch Inference using Kleidi on Arm Neoverse](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/faster-pytorch-inference-kleidi-arm-neoverse) on the Arm community website.

## Install ModelScope and PyTorch

First, ensure your system is up-to-date and install the required tools and libraries:
This learning path will execute models on Arm Neoverse, so we only need to install the PyTorch CPU package.
65
+
In this learning path you will execute models on the Arm Neoverse CPU, so you will only need to install the PyTorch CPU package.
60
66
{{% /notice %}}
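The exact install commands are not reproduced here; as a sketch, assuming a fresh Ubuntu 22.04 system, the setup might look like this (the CPU-only wheel index URL is the standard PyTorch one):

```shell
# Refresh the package index and install Python tooling (assumed prerequisites)
sudo apt-get update
sudo apt-get install -y python3 python3-pip

# Install ModelScope and the CPU-only PyTorch build
pip3 install modelscope
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
```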
61
67
62
68
## Create a sample example
63
69
64
-
After completing the installation, we will use an example related to Chinese semantic understanding to illustrate how to use ModelScope.
70
+
You can now run an example to understand how to use ModelScope for understanding Chinese semantics.
65
71
66
72
There is a fundamental difference between Chinese and English writing.
67
73
The relationship between Chinese characters and their meanings is somewhat analogous to the difference between words and phrases in English.
Some Chinese characters, like English words, have clear meanings on their own.
However, more often, Chinese characters need to be combined with other characters to express more complete meanings, just like phrases in English.
For example, “祝福” (blessing) can be broken down into “祝” (wish) and “福” (good fortune); “分享” (share) can be broken down into “分” (divide) and “享” (enjoy); “生成” (generate) is composed of “生” (produce) and “成” (become).

For computers to understand Chinese sentences, you need to understand the rules of Chinese characters, vocabulary, and grammar to accurately capture and express meaning.

In this simple example, you will use a general-domain Chinese [word segmentation model](https://www.modelscope.cn/models/iic/nlp_structbert_word-segmentation_chinese-base) to break down Chinese sentences into individual words, facilitating analysis and understanding by computers.

Using a file editor of your choice, copy the code shown below into a file named `segmentation.py`:

```python
from modelscope.pipelines import pipeline

# Load the general-domain Chinese word segmentation model from ModelScope
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

# A sample Chinese sentence to segment
text = '今天天气不错，适合出去游玩'
result = word_segmentation(text)
print(result)
```
Run the model inference on the sample text:

```bash
python3 segmentation.py
```

The output should look like this:

```output
2025-01-28 00:30:29,692 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
Downloading Model to directory: /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
```
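The exact structure of `result` depends on the ModelScope version; as a sketch, assuming the word-segmentation pipeline returns a dictionary whose `output` key holds the list of segmented words, you could post-process it like this:

```python
# Stand-in for the dictionary returned by word_segmentation(text).
# The 'output' key is an assumption about the pipeline's result schema.
result = {"output": ["今天", "天气", "不错"]}

words = result["output"]
print("Words:", words)               # the segmented words
print("Word count:", len(words))     # how many words were found
print("Rejoined:", "".join(words))   # reassemble the original sentence
```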
File: `content/learning-paths/servers-and-cloud-computing/funASR/3_funasr.md`
## Installing FunASR

Install FunASR using pip:

```bash
pip3 install funasr==1.2.3
```

{{% notice Note %}}
The Learning Path examples use FunASR version 1.2.3. You may notice minor differences in results with other versions.
{{% /notice %}}
## Speech Recognition

FunASR offers a simple interface for performing speech recognition tasks. You can easily transcribe audio files or implement real-time speech recognition using FunASR's functionalities. In this Learning Path, you will learn how to leverage FunASR to implement a speech recognition application.

Let's use an English speech sample as an example to run audio transcription on. Copy the code shown below into a file named `funasr_test1.py`:

```python
from funasr import AutoModel

# Load the Paraformer ASR model from the ModelScope hub and run it on the CPU
model = AutoModel(model="paraformer", device="cpu", hub="ms")

# Transcribe an English sample audio file hosted on ModelScope
res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_en.wav")
print(f"\nResult: \n{res[0]['text']}")
```
Before you run this script, let's look at what the Python code is doing:

The imported `AutoModel()` class provides an interface to load different AI models.

**model="paraformer":**

Specifies the model you would like to load.
In this example you will load the Paraformer model, which is an end-to-end automatic speech recognition (ASR) model designed for real-time transcription.

**device="cpu":**

Specifies that the model runs on the CPU. It does not require a GPU.

**hub="ms":**

Indicates that the model is sourced from the "ms" (ModelScope) hub.

The `model.generate()` function processes an audio file and generates a transcribed text output.

**input="...":**

The input is an audio file URL, which is a .wav file containing an English audio sample.

The result contains a lot of information. To keep the example simple, you will only print the transcribed text contained in `res[0]['text']`.

In this initial test, a two-second English audio clip from the internet is used as input to the Paraformer model for inference.
Run this Python script on your Arm-based server:

```bash
python funasr_test1.py
```

The output will look like:

```output
funasr version: 1.2.3.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.2.3

Result:
he tried to think how it could be
```
The transcribed text shows "he tried to think how it could be". This is the expected result for the audio sample.

Now let's try an example that uses a Chinese speech recognition model. Copy the code shown below into a file named `funasr_test2.py`:
```python
import os

from funasr import AutoModel

# Load the Chinese Paraformer ASR model from ModelScope, running on the CPU
model = AutoModel(model="paraformer-zh", device="cpu", hub="ms")

# The sample Chinese audio file that ships with the model download
wav_file = os.path.join(model.model_path, "example/asr_example.wav")
res = model.generate(input=wav_file)

# Remove the spaces FunASR inserts between recognized Chinese characters
text_content = res[0]['text'].replace(" ", "")
print(f"Result: \n{text_content}")

print(res)
```
You can see that the loaded model has been replaced with a Chinese speech recognition model that has a `-zh` suffix.

FunASR will process each sound in the audio with appropriate character recognition.

You have also modified the output format from the previous example. In addition to recognising the Chinese characters, you will add timestamps indicating the start and end times of each character. This is useful for applications like subtitle generation and sentiment analysis.
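As an illustration of the subtitle use case, here is a sketch that converts per-character timestamps into SRT subtitle entries. The input shapes are assumptions: FunASR is assumed to return the recognized text in `res[0]['text']` and a list of `[start_ms, end_ms]` pairs in `res[0]['timestamp']`; the sample data below is hypothetical.

```python
def ms_to_srt(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def chars_to_srt(chars, timestamps, group=4):
    """Group characters into subtitle entries of `group` characters each."""
    entries = []
    for i in range(0, len(chars), group):
        chunk = chars[i:i + group]
        start = timestamps[i][0]
        end = timestamps[min(i + group, len(chars)) - 1][1]
        idx = len(entries) + 1
        entries.append(f"{idx}\n{ms_to_srt(start)} --> {ms_to_srt(end)}\n{''.join(chunk)}\n")
    return "\n".join(entries)

# Hypothetical data standing in for res[0]['text'] and res[0]['timestamp']
chars = list("欢迎大家来到达摩社区进行体验")
timestamps = [[i * 300, i * 300 + 280] for i in range(len(chars))]
print(chars_to_srt(chars, timestamps))
```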
Run the Python script:

```bash
python3 funasr_test2.py
```

The output should look like:

```output
Downloading Model to directory: /home/ubuntu/.cache/modelscope/hub/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
Result:
欢迎大家来到达摩社区进行体验
```
The output shows "欢迎大家来到达摩社区进行体验" as expected.

You can also observe that the spacing between the third and sixth characters is very short. This is because they are combined with other characters, as discussed in the previous section.

You can now build a speech processing pipeline. The output of the speech recognition module serves as the input for the semantic segmentation model, enabling you to validate the accuracy of the recognized results. Copy the code shown below into a file named `funasr_test3.py`:
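A minimal sketch of such a pipeline, assuming the same models used earlier in this Learning Path (the ASR stage and the word segmentation stage chained together); treat the model IDs and file layout as assumptions:

```python
import os

from funasr import AutoModel
from modelscope.pipelines import pipeline

# Stage 1: speech recognition with the Chinese Paraformer model
asr_model = AutoModel(model="paraformer-zh", device="cpu", hub="ms")
wav_file = os.path.join(asr_model.model_path, "example/asr_example.wav")
res = asr_model.generate(input=wav_file)
text = res[0]["text"].replace(" ", "")
print(f"Recognized: {text}")

# Stage 2: feed the transcription into the word segmentation model
word_segmentation = pipeline(
    "word-segmentation",
    model="damo/nlp_structbert_word-segmentation_chinese-base",
)
print(word_segmentation(text))
```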
Good, the result is exactly what you are looking for.

## Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Let's now look at a more advanced speech recognition model, [Paraformer](https://aclanthology.org/2020.wnut-1.18/).

Paraformer is a novel architecture for automatic speech recognition (ASR) that offers both enhanced speed and accuracy compared to traditional models. Its key innovation lies in its parallel transformer design, enabling simultaneous processing of multiple parts of the input speech. This parallel processing capability leads to significantly faster inference, making Paraformer well-suited for real-time ASR applications where responsiveness is crucial.

Furthermore, Paraformer has demonstrated state-of-the-art accuracy on several benchmark datasets, showcasing its effectiveness in accurately transcribing speech. This combination of speed and accuracy makes Paraformer a promising advancement in the field of ASR, opening up new possibilities for high-performance speech recognition systems.

Paraformer has been fully integrated into FunASR. Copy the sample program shown below into a file named `paraformer.py`.

This example uses the PyTorch-optimized Paraformer model from ModelScope. The program will first check whether the test audio file has already been downloaded.
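A minimal sketch of `paraformer.py` under stated assumptions: the model ID and revision are inferred from the log output shown below, the audio URL is a placeholder, and the download check uses only the standard library:

```python
import os
import urllib.request

from funasr import AutoModel

# Placeholder: replace with the URL of the audio clip you want to transcribe
AUDIO_URL = "https://example.com/sample_zh.wav"
AUDIO_FILE = "sample_zh.wav"

# Download the test audio only if it is not already present
if not os.path.exists(AUDIO_FILE):
    print(f"Downloading {AUDIO_URL} ...")
    urllib.request.urlretrieve(AUDIO_URL, AUDIO_FILE)

# PyTorch-optimized Paraformer model from ModelScope (assumed model ID)
model = AutoModel(
    model="iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v2.0.4",
    device="cpu",
    hub="ms",
)

res = model.generate(input=AUDIO_FILE)
print(f"Result: \n{res[0]['text']}")
```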
The output should look like:

```output
2025-01-28 00:03:24,373 - modelscope - INFO - Use user-specified model revision: v2.0.4
```

The output shows "飞机穿过云层眼下一片云海有时透过稀薄的云…" as expected.
## Punctuation Restoration

In the previous example, the speech of each word was correctly recognized, but it lacked punctuation. The lack of punctuation hinders our understanding of the speaker's intended expression.

You can add a [Punctuation Restoration model](https://aclanthology.org/2020.wnut-1.18/) responsible for punctuation as the next step in processing your audio workload.

In addition to using the Paraformer model, you will add two more ModelScope models:

- VAD ([Voice Activity Detection](https://modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary)), and
- a punctuation model,

by adding the `vad_model` and `punc_model` parameters in the later stages.

This way, you can obtain punctuation that matches the semantics of the speech recognition. Copy the updated code shown below into a file named `paraformer-2.py`:
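A minimal sketch of `paraformer-2.py`, assuming FunASR's documented model aliases `fsmn-vad` (VAD) and `ct-punc` (punctuation); the audio file name is a placeholder:

```python
from funasr import AutoModel

# Paraformer ASR plus VAD and punctuation models, selected via the
# vad_model and punc_model parameters
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    device="cpu",
    hub="ms",
)

# Placeholder: point this at the audio file you transcribed earlier
res = model.generate(input="sample_zh.wav")
print(f"Result: \n{res[0]['text']}")
```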
Let's translate this recognized result, and you can easily see that the four sentences represent different meanings.

"飞机穿过云层" means: The airplane passed through the clouds.
## Sentiment Analysis

FunASR also supports sentiment analysis of speech, allowing you to determine the emotional tone of the spoken language.

This can be valuable for applications like customer service and social media monitoring.

You can use a mature speech emotion recognition model, [emotion2vec+](https://modelscope.cn/models/iic/emotion2vec_plus_large), from ModelScope as an example.

The model will identify which of the following emotions is the closest match for the emotion expressed in the speech:
- Neutral
- Angry
- Unknown

Copy the code shown below into a file named `sentiment.py`:
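A minimal sketch of `sentiment.py`, assuming the emotion2vec+ large model ID from the link above and the utterance-level usage shown in the emotion2vec examples; the audio file name is a placeholder:

```python
from funasr import AutoModel

# Load the emotion2vec+ large speech emotion recognition model
model = AutoModel(model="iic/emotion2vec_plus_large", hub="ms")

# Placeholder audio file; utterance-level granularity returns one prediction
res = model.generate(
    "sample_zh.wav",
    granularity="utterance",
    extract_embedding=False,
)

# Each candidate label comes with a confidence score; print the best match
labels, scores = res[0]["labels"], res[0]["scores"]
best = max(zip(labels, scores), key=lambda p: p[1])
print(f"Predicted emotion: {best[0]} (score {best[1]:.3f})")
```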
## Best Price-Performance for ASR on Arm Neoverse N2

Arm CPUs, with their high performance and low power consumption, provide an ideal platform for running ModelScope's AI models, especially in edge computing scenarios. Arm's comprehensive software ecosystem supports the development and deployment of ModelScope models, enabling developers to create innovative and efficient applications.

You can learn more about [Kleidi Technology Delivers Best Price-Performance for ASR on Arm Neoverse N2](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/neoverse-n2-delivers-leading-price-performance-on-asr) on the Arm community blog.

## Conclusion

ModelScope and FunASR empower developers to build robust Chinese ASR applications. By leveraging the strengths of Arm CPUs and the optimized software ecosystem, developers can create innovative and efficient solutions for various use cases. Explore the capabilities of ModelScope and FunASR, and unlock the potential of Arm technology for your next Chinese ASR project.