Commit 71eea39

Merge pull request #1674 from madeline-underwood/FunASR
Fun ASR_Andy to review
2 parents 93fd610 + f10a3d1 commit 71eea39

4 files changed: 37 additions & 34 deletions

content/learning-paths/servers-and-cloud-computing/funASR/1_asr.md

Lines changed: 15 additions & 10 deletions
@@ -1,5 +1,5 @@
 ---
-title: Introduction to Automatic Speech Recognition (ASR)
+title: Introduction to Automatic Speech Recognition
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -19,19 +19,24 @@ At its core, ASR transforms spoken language into written text. Despite seeming s
 ASR is used in myriad applications across various domains:
 
 * Virtual Assistants and Chatbots - ASR enables natural language interactions in chatbots and virtual assistants, enhancing customer support, information retrieval, and even entertainment.
-Use case: a developer can use ASR to create a voice-controlled chatbot that helps users troubleshoot technical issues or navigate complex software applications.
 
-* Data Analytics and Business Intelligence - Leveraging ASR to analyze customer interactions, such as calls and surveys, to uncover trends and enhance services.
-Use case: an app that transcribes customer service calls and applies sentiment analysis to pinpoint areas for improving customer satisfaction.
+{{% notice Use case %}}A developer can use ASR to create a voice-controlled chatbot that helps users troubleshoot technical issues or navigate complex software applications.{{% /notice %}}
 
-* ASR-powered Accessibility Tools - Developing ASR-powered tools to enhance accessibility for users with disabilities. Applications include voice-controlled interfaces, real-time captioning, and text-to-speech conversion.
-Use case: a developer can create a tool that uses ASR to generate real-time captions for online meetings or presentations, making them accessible to deaf and hard-of-hearing individuals.
+* Data Analytics and Business Intelligence - leveraging ASR to analyze customer interactions, such as calls and surveys, to uncover trends and enhance services.
 
-* Integrating ASR into Software Development Tools - Enhancing efficiency and productivity by incorporating ASR into development environments. This can include voice commands for code editing, debugging, or version control.
-Use case: a developer can build an IDE plugin that enables voice-controlled coding, file navigation, and test execution, streamlining the workflow and reducing reliance on manual input.
+{{% notice Use case %}}An app that transcribes customer service calls and applies sentiment analysis to pinpoint areas for improving customer satisfaction.{{% /notice %}}
 
-* Smart Homes and IoT with ASR - Enhancing convenience with voice control of smart home devices and appliances.
-Use case: a voice-activated home automation system that lets users control lighting, temperature, and entertainment systems with natural language commands.
+* ASR-powered Accessibility Tools - developing ASR-powered tools to enhance accessibility for users with disabilities. Applications include voice-controlled interfaces, real-time captioning, and text-to-speech conversion.
+
+{{% notice Use case %}}A developer can create a tool that uses ASR to generate real-time captions for online meetings or presentations, making them accessible to deaf and hard-of-hearing individuals.{{% /notice %}}
+
+* Integrating ASR into Software Development Tools - enhancing efficiency and productivity by incorporating ASR into development environments. This can include voice commands for code editing, debugging, or version control.
+
+{{% notice Use case %}}A developer can build an IDE plugin that enables voice-controlled coding, file navigation, and test execution, streamlining the workflow and reducing reliance on manual input.{{% /notice %}}
+
+* Smart Homes and IoT with ASR - enhancing convenience with voice control of smart home devices and appliances.
+
+{{% notice Use case %}}A voice-activated home automation system that lets users control lighting, temperature, and entertainment systems with natural language commands.{{% /notice %}}
 
 ### Challenges in ASR
 
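For Chinese ASR, accuracy is usually reported as character error rate (CER), because written Chinese has no spaces between words. As a hedged illustration that is not part of this commit, FunASR, or ModelScope, the sketch below computes CER as the Levenshtein edit distance between a reference transcript and a recognized transcript, divided by the reference length. The names `edit_distance` and `character_error_rate` are hypothetical, and the sample sentence is the FunASR output quoted in the 3_funasr.md diff.

```python
# Hypothetical helpers for illustration; not part of FunASR or ModelScope.


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) in characters."""
    # prev[j] is the distance between the first i-1 chars of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free when chars match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters (whitespace ignored)."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "欢迎大家来到达摩社区进行体验"
    print(character_error_rate(reference, reference))  # 0.0
    # Hypothetical mis-recognition that drops two characters
    print(character_error_rate(reference, "欢迎大家来到达摩社区体验"))
```

A CER of 0.0 means an exact match. Evaluation tools often also normalize punctuation and full-width characters before scoring; this sketch only strips whitespace.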
content/learning-paths/servers-and-cloud-computing/funASR/2_modelscope.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 ---
-title: ModelScope - Open-Source Pre-trained AI models hub
+title: ModelScope - an Open Source Pre-trained AI Models Hub
 weight: 3
 
 ### FIXED, DO NOT MODIFY
@@ -8,18 +8,18 @@ layout: learningpathall
 
 ## Before you begin
 
-To follow the instructions for this Learning Path, you will need an Arm-based server running Ubuntu 22.04 LTS or later, with at least 8 cores, 16GB of RAM, and 30GB of disk storage.
+Before you begin following the instructions for this Learning Path, make sure you have an Arm-based server running Ubuntu 22.04 LTS or later, with at least 8 cores, 16GB of RAM, and 30GB of disk storage.
 
 ## What is ModelScope?
 [ModelScope](https://github.com/modelscope/modelscope/) is an open-source platform designed to simplify the integration of AI models into applications. It offers a wide variety of pre-trained models for tasks such as image recognition, natural language processing, and audio analysis. With ModelScope, you can seamlessly integrate these models into your projects using just a few lines of code.
 
 Key benefits of ModelScope:
 
-* Model Diversity - Access a wide range of models for various tasks, including Automatic Speech Recognition (ASR), natural language processing (NLP), and computer vision.
+* Model Diversity - access a wide range of models for various tasks, including Automatic Speech Recognition (ASR), natural language processing (NLP), and computer vision.
 
 * Ease of Use - ModelScope provides a user-friendly interface and APIs that enable seamless model integration.
 
-* Community Support - Benefit from a vibrant community of developers and researchers who actively contribute to and support ModelScope.
+* Community Support - benefit from a vibrant community of developers and researchers who actively contribute to and support ModelScope.
 
 
 ## Arm CPU Acceleration
@@ -94,7 +94,7 @@ result = word_segmentation(text)
 print(result)
 ```
 
-This piece of code specifies a model and provides a Chinese sentence for the model to segment.
+This piece of code specifies a model and provides a Chinese sentence for the model to segment:
 "A New Year’s greeting message to share with everyone."
 
 Run the model inference on the sample text:
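The "Before you begin" requirement in 2_modelscope.md (an Arm-based server with at least 8 cores, 16GB of RAM, and 30GB of disk storage) can be checked before installing anything. The following is a hypothetical preflight sketch, not part of the Learning Path: it reads the core count with `os.cpu_count()`, total memory from `/proc/meminfo` (Linux only), and free space in the home directory with `shutil.disk_usage()`. It assumes GB means GiB, allows 10% slack on the memory check because `MemTotal` usually reads slightly below the nominal size, and does not check the Ubuntu version. The function names are illustrative.

```python
# Hypothetical preflight check; thresholds mirror 2_modelscope.md.
import os
import platform
import shutil

MIN_CORES = 8
MIN_RAM_GB = 16
MIN_DISK_GB = 30
GIB = 1024 ** 3
# Assumption: MemTotal is a little below the nominal RAM size because the
# kernel reserves memory, so the RAM check allows 10% slack.
RAM_SLACK = 0.9


def total_ram_bytes(meminfo_path: str = "/proc/meminfo") -> int:
    """Return MemTotal from /proc/meminfo (Linux only); the file reports kB."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError(f"MemTotal not found in {meminfo_path}")


def check_requirements(machine: str, cores: int, ram_bytes: int,
                       free_disk_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if machine not in ("aarch64", "arm64"):
        problems.append(f"not an Arm machine (platform reports {machine!r})")
    if cores < MIN_CORES:
        problems.append(f"{cores} cores found, at least {MIN_CORES} required")
    if ram_bytes < MIN_RAM_GB * GIB * RAM_SLACK:
        problems.append(f"{ram_bytes / GIB:.1f} GiB RAM found, {MIN_RAM_GB}GB required")
    if free_disk_bytes < MIN_DISK_GB * GIB:
        problems.append(f"{free_disk_bytes / GIB:.1f} GiB free disk, {MIN_DISK_GB}GB required")
    return problems


if __name__ == "__main__":
    try:
        ram = total_ram_bytes()
    except OSError:
        ram = 0  # /proc/meminfo does not exist outside Linux
    issues = check_requirements(
        machine=platform.machine(),
        cores=os.cpu_count() or 0,
        ram_bytes=ram,
        free_disk_bytes=shutil.disk_usage(os.path.expanduser("~")).free,
    )
    print("All checks passed" if not issues else "\n".join(issues))
```

Keeping the checks in a pure function that takes the measured values as arguments makes the thresholds easy to test without touching the real machine.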

content/learning-paths/servers-and-cloud-computing/funASR/3_funasr.md

Lines changed: 12 additions & 14 deletions
@@ -147,7 +147,7 @@ Result:
 
 The output shows "欢迎大家来到达摩社区进行体验" which means "Welcome everyone to the Dharma community to explore and experience!" as expected.
 
-You can also observe that the spacing between the third and sixth characters is very short. This is because they are combined with other characters, as discussed in the previous section.
+You can also see that the spacing between the third and sixth characters is short. This is because they are combined with other characters, as discussed in the previous section.
 
 You can now build a speech processing pipeline. The output of the speech recognition module serves as the input for the semantic segmentation model, enabling you to validate the accuracy of the recognized results. Copy the code shown below in a file named `funasr_test3.py`:
 
@@ -223,9 +223,7 @@ Good, the result is exactly what you are looking for.
 
 ## Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
 
-Now you can look at a more advanced speech recognition model, [Paraformer](https://aclanthology.org/2020.wnut-1.18/).
-
-Paraformer is a novel architecture for automatic speech recognition (ASR) designed for both speed and accuracy. Unlike traditional models, it leverages a parallel transformer architecture, enabling simultaneous processing of multiple parts of the input speech. This parallel processing capability leads to significantly faster inference, making Paraformer well-suited for real-time ASR applications where responsiveness is crucial.
+[Paraformer](https://aclanthology.org/2020.wnut-1.18/) is a novel architecture for automatic speech recognition (ASR) designed for both speed and accuracy. Unlike traditional models, it leverages a parallel transformer architecture, enabling simultaneous processing of multiple parts of the input speech. This parallel processing capability leads to significantly faster inference, making Paraformer well-suited for real-time ASR applications where responsiveness is crucial.
 
 Furthermore, Paraformer has demonstrated state-of-the-art accuracy on several benchmark datasets, showcasing its effectiveness in accurately transcribing speech. This combination of speed and accuracy makes Paraformer a promising advancement in the field of ASR, opening up new possibilities for high-performance speech recognition systems.
 
@@ -504,10 +502,10 @@ python --version
 ```
 
 {{% notice Note %}}
-The update-alternatives command is a Debian-based Linux utility (used in Ubuntu, Debian, etc.) for managing symbolic links to different versions of software alternatives. It allows you to easily switch between multiple installed versions of the same program.
+The `update-alternatives` command is a utility in Debian-based Linux distributions, such as Ubuntu and Debian, that manages symbolic links for different software versions. It simplifies the process of switching between multiple installed versions of the same program.
 {{% /notice %}}
 
-The Python version should be 3.10 now.
+The Python version should now be 3.10.
 
 ```output
 Python 3.10.16
@@ -541,19 +539,19 @@ pip uninstall torchao && cd ao/ && rm -rf build && python setup.py install
 ```
 
 {{% notice Note %}}
-Please reference this [link](https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/) to learn more the detail description of those instructions.
+See this Learning Path [Run a Large Language Model Chatbot on Arm servers](https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/) for further information.
 {{% /notice %}}
 
-Once you have installed the optimized PyTorch, we can enabled bfloat16 fast math kernels by setting DNNL_DEFAULT_FPMATH_MODE.
+Once you have installed the optimized PyTorch, enable bfloat16 fast math kernels by setting DNNL_DEFAULT_FPMATH_MODE.
 
-On AWS Graviton3 as example, this enables GEMM kernels that use bfloat16 MMLA instructions available in the hardware.
+On AWS Graviton3, for example, this enables GEMM kernels that use the bfloat16 MMLA instructions available in the hardware.
 
 
 ### Update the FunASR application with benchmark function
 
-Now we can test FunASR model again.
+Now you can test the FunASR model again.
 
-By re-use the previously paraformer-2.py and add benchmark function, please copy the updated code shown below in a file named `paraformer-3.py`:
+Reuse the previous `paraformer-2.py` script and add a benchmark function by copying the updated code shown below into a file named `paraformer-3.py`:
 
 ```python
 import os
@@ -650,7 +648,7 @@ The model took 0.816 seconds to complete execution.
 
 ### Enable bfloat16 Fast Math Kernels
 
-Now we enable bfloat16 and run Python script again:
+Now enable bfloat16 and run the Python script again:
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
@@ -665,7 +663,7 @@ rtf_avg: 0.010: 100%|
 Execution Time: 738.04 ms (Mean), 747.57 ms (P99)
 ```
 
-You can notice that the execution time is now 0.7 seconds, reflecting an improvement compared to earlier results.
+The mean execution time is now about 0.74 seconds, an improvement over the 0.816 seconds measured before enabling bfloat16.
 
 ## Conclusion
-Arm CPUs and an optimized software ecosystem enable developers to build innovative, efficient ASR solutions. Explore the capabilities of ModelScope and FunASR, and unlock the potential of Arm technology for your next Chinese ASR project.
+Arm CPUs paired with an optimized software ecosystem enable developers to build innovative, efficient ASR solutions. Discover the potential of ModelScope and FunASR, and harness Arm technology for your next Chinese ASR project.
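The 3_funasr.md changes reference a benchmark function in `paraformer-3.py` that reports mean and P99 execution time, but the diff does not show its body. The following is a minimal sketch of such a helper under stated assumptions, not the Learning Path's actual code: it times calls with `time.perf_counter()`, discards a few warm-up runs, and computes P99 with the nearest-rank method. `benchmark` and `percentile` are hypothetical names, and the lambda is a placeholder workload to replace with your own model call.

```python
# Hypothetical benchmark helper; not the actual paraformer-3.py code.
import math
import statistics
import time
from typing import Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def benchmark(fn: Callable[[], object], runs: int = 10,
              warmup: int = 2) -> dict[str, float]:
    """Time fn() and return mean and P99 latency in milliseconds."""
    # Warm-up runs let one-time costs (lazy initialization, caches) settle
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "p99_ms": percentile(timings_ms, 99),
    }


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with your own inference call
    stats = benchmark(lambda: sum(i * i for i in range(100_000)), runs=20)
    print(f"Execution Time: {stats['mean_ms']:.2f} ms (Mean), "
          f"{stats['p99_ms']:.2f} ms (P99)")
```

With fewer than 100 timed runs, nearest-rank P99 is simply the slowest run, so increase `runs` if you want P99 to reflect more than a single outlier. Because `DNNL_DEFAULT_FPMATH_MODE` is an environment variable, export it in the shell before launching Python, as the diff does, so that it is in place before PyTorch initializes oneDNN.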

content/learning-paths/servers-and-cloud-computing/funASR/_index.md

Lines changed: 5 additions & 5 deletions
@@ -1,20 +1,20 @@
 ---
-title: Deploy ModelScope FunASR Chinese Speech Recognition Model on Arm Servers
+title: Deploy ModelScope FunASR Model on Arm Servers
 draft: true
 cascade:
     draft: true
 
 minutes_to_complete: 60
 
-who_is_this_for: This is an introductory topic for software developers and AI engineers interested in learning how to run Chinese Automatic Speech Recognition (ASR) applications on Arm servers.
+who_is_this_for: This is an introductory topic for developers interested in learning how to deploy the ModelScope FunASR Chinese Automatic Speech Recognition (ASR) model on Arm-based servers.
 
 learning_objectives:
     - Leverage open-source large language models and tools to build Chinese ASR applications.
-    - Deploy real-time Chinese speech recognition, punctuation restoration, and sentiment analysis with FunASR.
-    - Describe how to accelerate ModelScope models on Arm-based servers for performance and efficiency.
+    - Deploy real-time Chinese speech recognition, punctuation restoration, and sentiment analysis using FunASR.
+    - Describe how to accelerate ModelScope models on Arm-based servers for enhanced performance and efficiency.
 
 prerequisites:
-    - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider, or a local Arm Linux computer with at least 8 CPUs and 16GB RAM.
+    - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider, or a local Arm Linux computer with at least 8 CPUs and 16GB of RAM.
 
 author: Odin Shen
 
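The updated learning objectives in _index.md mention real-time Chinese speech recognition. A common way to quantify "real time" is the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean faster than real time. The FunASR output in the 3_funasr.md diff reports an `rtf_avg` value, which appears to be an averaged RTF. The sketch below is not FunASR's implementation; it computes RTF for a PCM WAV file using the standard-library `wave` module, and the function names and the 0.05-second timing are hypothetical.

```python
# Hypothetical RTF helpers; not FunASR's own rtf_avg computation.
import io
import wave


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)   # 16 kHz is a common ASR input rate
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    audio = make_silent_wav(5.0)
    duration = wav_duration_seconds(audio)
    # Hypothetical timing: 0.05 s of processing for 5 s of audio
    print(f"duration={duration:.2f}s rtf={real_time_factor(0.05, duration):.3f}")
```

In practice you would measure `processing_seconds` around the model call, for example with the `time.perf_counter()` pattern used for the execution-time benchmark.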