Commit 668537d

Merge pull request #1584 from odincodeshen/lp_wop_cloud
LP on FunASR
2 parents 1c17e70 + d5d4889 commit 668537d

5 files changed

+694
-0
lines changed

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
---
2+
title: Introduction to Automatic Speech Recognition (ASR)
3+
weight: 2
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## What is ASR?
10+
11+
Automatic Speech Recognition ([ASR](https://en.wikipedia.org/wiki/Speech_recognition)), also known as speech-to-text, is a rapidly evolving field that enables computers to understand and transcribe human speech.
12+
13+
This technology has become deeply integrated into our daily lives, powering a wide range of applications and services we often take for granted, many of which are optimized for and run on Arm CPUs.
14+
15+
At its core, ASR involves converting spoken audio into written text. While seemingly simple, this process is quite complex, requiring sophisticated algorithms and models to accurately interpret the nuances of human speech, including variations in pronunciation, accents, and background noise.
16+
17+
### Key Applications of ASR
18+
19+
ASR is now used in a myriad of applications across various domains:
20+
21+
* **Virtual Assistants and Chatbots:**
22+
23+
Integrating ASR into chatbots and virtual assistants to enable natural language interactions with users. This can be used for customer support, information retrieval, or even entertainment.
24+
For example: A developer could use ASR to create a voice-controlled chatbot that helps users troubleshoot technical issues or navigate complex software applications.
25+
26+
* **Data Analytics and Business Intelligence:**
27+
28+
Analyzing customer interactions (e.g., calls, surveys) to identify trends and improve services.
29+
For example: An app that transcribes customer service calls and uses sentiment analysis to identify areas where customer satisfaction can be improved.
30+
31+
* **Accessibility Tools:**
32+
33+
Developing tools that use ASR to improve accessibility for users with disabilities. This could include voice-controlled interfaces, real-time captioning, or text-to-speech conversion.
34+
For example: A developer could create a tool that uses ASR to generate real-time captions for online meetings or presentations, making them accessible to deaf or hard-of-hearing participants.
35+
36+
* **Software Development Tools:**
37+
38+
Integrating ASR into software development tools to improve efficiency and productivity. This could include voice commands for code editing, debugging, or version control.
39+
For example: A developer could create a plugin for their IDE that allows them to use voice commands to write code, navigate through files, or run tests.
40+
41+
* **Smart Homes and IoT:**
42+
Voice control of smart home devices and appliances for a more convenient user experience.
43+
For example: A voice-activated home automation system that allows users to control lighting, temperature, and entertainment systems with natural language commands.
44+
45+
46+
### Challenges in ASR
47+
48+
While the potential applications of ASR are vast, developing and deploying accurate and reliable ASR systems is difficult. The challenges stem from the complexities of human speech, environmental factors, and the intricacies of language itself. They are particularly pronounced for Chinese ASR, which needs to address unique linguistic characteristics such as:
49+
50+
* **Complexities of Chinese Language:**
51+
Mandarin Chinese involves tonal variations where the meaning of a syllable changes depending on its tone, and punctuation is crucial to convey meaning and avoid ambiguity. Accurately recognizing these nuances is essential for understanding spoken Chinese.
52+
53+
* **Noise Robustness:**
54+
ASR systems need to be able to filter out background noise to accurately transcribe speech. This is particularly challenging in noisy environments like crowded streets or busy offices.
55+
56+
* **Dialectal Diversity:**
57+
Chinese encompasses numerous dialects with significant variations in pronunciation and vocabulary. This poses a challenge for ASR systems to generalize across different regions and speakers.
58+
59+
* **Homophones:**
60+
Chinese has a high prevalence of homophones, words that sound alike but have different meanings. Disambiguating these homophones requires understanding the context and semantics of the spoken words.
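
To make the homophone challenge concrete, here is a toy sketch in Python. It is not how a production ASR decoder works: the candidate table, the small word list, and the `pick_character` helper are all hand-written assumptions for illustration, whereas real systems rely on trained acoustic and language models.

```python
# Toy illustration of homophone disambiguation.
# The tables below are hand-written assumptions, not data from any ASR model.

# Characters that are all pronounced "shi" with the fourth (falling) tone.
HOMOPHONES = {
    "shi4": ["是", "事", "市", "试"],
}

# Hypothetical context hints: known two-character words.
CONTEXT_WORDS = {"城市", "考试", "但是"}


def pick_character(previous_char: str, syllable: str) -> str:
    """Return the candidate that forms a known word with the previous character."""
    candidates = HOMOPHONES[syllable]
    for candidate in candidates:
        if previous_char + candidate in CONTEXT_WORDS:
            return candidate
    # Fall back to the first candidate when the context gives no hint.
    return candidates[0]


print(pick_character("城", "shi4"))  # 市, as in 城市 (city)
print(pick_character("考", "shi4"))  # 试, as in 考试 (exam)
```

The same syllable resolves to a different character depending on what precedes it, which is why context matters for Chinese ASR.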
61+
62+
In the following sections, you'll explore a solution that uses ModelScope and Arm CPUs to enable efficient and accurate Chinese ASR.
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
1+
---
2+
title: ModelScope - Open-Source Pre-trained AI Models Hub
3+
weight: 3
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Before you begin
10+
11+
To follow the instructions in this Learning Path, you need an Arm server running Ubuntu 22.04 LTS or later, with at least 8 cores, 16GB of RAM, and 30GB of disk storage.
12+
13+
## Introducing ModelScope
14+
[ModelScope](https://github.com/modelscope/modelscope/) is an open-source platform that makes it easy to use AI models in your applications.
15+
It provides a wide variety of pre-trained models for tasks like image recognition, natural language processing, and audio analysis. With ModelScope, you can easily integrate these models into your projects with just a few lines of code.
16+
17+
Key benefits of ModelScope include:
18+
19+
* **Model Diversity:**
20+
Access a wide range of models for various tasks, including ASR, natural language processing, and computer vision.
21+
22+
* **Ease of Use:**
23+
ModelScope provides a user-friendly interface and APIs for seamless model integration.
24+
25+
* **Community Support:**
26+
Benefit from a vibrant community of developers and researchers contributing to and supporting ModelScope.
27+
28+
29+
## Arm CPU Acceleration
30+
ModelScope fully supports PyTorch 1.8+ and other machine learning frameworks, which can be efficiently deployed on Arm Neoverse CPUs, taking advantage of Arm's performance and power-efficiency characteristics.
31+
32+
Arm provides optimized software and tools, such as Kleidi, to accelerate AI inference on Arm-based platforms. This makes Arm Neoverse CPUs an ideal choice for running ModelScope models in edge devices and other resource-constrained environments.
33+
34+
You can learn more in [Faster PyTorch Inference using Kleidi on Arm Neoverse](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/faster-pytorch-inference-kleidi-arm-neoverse) on the Arm Community website.
35+
36+
37+
## Installing ModelScope
38+
39+
First, ensure your system is up-to-date and install the required tools and libraries:
40+
41+
```bash
sudo apt-get update -y
sudo apt-get install -y curl git wget python3 python3-pip python3-venv python-is-python3
```
45+
46+
Create and activate a virtual environment:
47+
```bash
python -m venv venv
source venv/bin/activate
```
51+
52+
Install related packages:
53+
```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip3 install numpy packaging addict datasets simplejson sortedcontainers transformers ffmpeg
# The example below imports modelscope, so install the package itself as well
pip3 install modelscope
```
58+
{{% notice Note %}}
59+
This Learning Path runs models on Arm Neoverse CPUs, so you only need to install the PyTorch CPU package.
60+
{{% /notice %}}
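
Before moving on, you may want to confirm that the environment looks as expected. The following is a minimal sketch using only the Python standard library; the helper names `in_virtualenv` and `installed` are made up for this example, and the package list is an assumption about what the later steps import.

```python
import importlib.util
import platform
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def installed(packages):
    """Map each top-level package name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# On 64-bit Arm Linux the machine string is typically 'aarch64'.
print("Machine architecture:", platform.machine())
print("Inside virtual environment:", in_virtualenv())

# Assumed package list; adjust it to match what you installed.
for name, found in installed(["torch", "transformers", "modelscope"]).items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package is reported as missing, install it with `pip3` inside the activated virtual environment before continuing.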
61+
62+
## Create a simple example
63+
64+
After completing the installation, we will use an example related to Chinese semantic understanding to illustrate how to use ModelScope.
65+
66+
There is a fundamental difference between Chinese and English writing.
67+
The relationship between Chinese characters and their meanings is somewhat analogous to the difference between words and phrases in English.
68+
Some Chinese characters, like English words, have clear meanings on their own, such as “人” (person), “山” (mountain), and “水” (water).
69+
70+
However, more often, Chinese characters need to be combined with other characters to express more complete meanings, just like phrases in English.
71+
For example, “祝福” (blessing) can be broken down into “祝” (wish) and “福” (good fortune); “分享” (share) can be broken down into “分” (divide) and “享” (enjoy); “生成” (generate) is composed of “生” (produce) and “成” (become).
72+
73+
For a computer to understand Chinese sentences, it needs to capture the rules of Chinese characters, vocabulary, and grammar so that it can interpret and express meaning accurately.
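
As a small illustration of why word boundaries matter, the sketch below splits the compound words mentioned above into single characters using plain Python. The helper name `split_characters` and the sample sentence are assumptions for this example; no model is involved.

```python
def split_characters(text: str) -> list:
    """Split a string into individual characters (iterating a str yields one character at a time)."""
    return list(text)


# Compound words taken from the examples above.
for word in ["祝福", "分享", "生成"]:
    print(word, "->", split_characters(word))

# A naive character split keeps no information about which characters belong together.
sentence = "生成一段新年祝福的文字跟所有人分享"
characters = split_characters(sentence)
print(len(characters), "characters, with no word boundaries")
```

Written Chinese has no spaces between words, so a character-level split like this cannot tell that “祝” and “福” form one word. Recovering those boundaries is the job of a word segmentation model.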
74+
75+
Here is a simple example using a general-domain Chinese [word segmentation model](https://www.modelscope.cn/models/iic/nlp_structbert_word-segmentation_chinese-base), which can break down Chinese sentences into individual words, facilitating analysis and understanding by computers.
76+
77+
```python
from modelscope.pipelines import pipeline

# Build a word-segmentation pipeline; the model is downloaded on first use.
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')
# Input sentence, starting with 生成 (generate) to match the output shown below
text = '生成一段新年祝福的文字跟所有人分享'
result = word_segmentation(text)

print(result)
```
86+
87+
The output is similar to:
88+
```output
2025-01-28 00:30:29,692 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
Downloading Model to directory: /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
2025-01-28 00:30:32,828 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
2025-01-28 00:30:33,332 - modelscope - INFO - initiate model from /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
2025-01-28 00:30:33,333 - modelscope - INFO - initiate model from location /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base.
2025-01-28 00:30:33,334 - modelscope - INFO - initialize model from /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
You are using a model of type bert to instantiate a model of type structbert. This is not supported for all configurations of models and can yield errors.
2025-01-28 00:30:35,522 - modelscope - WARNING - No preprocessor field found in cfg.
2025-01-28 00:30:35,522 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2025-01-28 00:30:35,522 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base'}. trying to build by task and model information.
2025-01-28 00:30:35,527 - modelscope - INFO - cuda is not available, using cpu instead.
2025-01-28 00:30:35,529 - modelscope - WARNING - No preprocessor field found in cfg.
2025-01-28 00:30:35,529 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2025-01-28 00:30:35,529 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base', 'sequence_length': 512}. trying to build by task and model information.
/home/ubuntu/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:1044: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
{'output': ['生成', '一', '段', '新年', '祝福', '的', '文字', '跟', '所有', '人', '分享']}
```
107+
108+
The segmentation model has correctly identified the following words:
109+
110+
- 生成 (generate): This is a verb meaning "to generate."

- 一 (one): This is a numeral.
111+
112+
- 段 (piece): This is a measure word used for text.
113+
114+
- 新年 (New Year): This is a noun phrase meaning "New Year."
115+
116+
- 祝福 (blessings): This is a noun meaning "blessings" or "good wishes."
117+
118+
- 的 (of): This is a possessive particle.
119+
120+
- 文字 (text): This is a noun meaning "text" or "written words."
121+
122+
- 跟 (with): This is a preposition meaning "with."
123+
124+
- 所有 (all): This is a quantifier meaning "all."
125+
126+
- 人 (people): This is a noun meaning "people."
127+
128+
- 分享 (share): This is a verb meaning "to share."
129+
130+
131+
The segmentation model has successfully identified the word boundaries and separated the sentence into meaningful units, which is essential for further natural language processing tasks like machine translation or sentiment analysis.
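
As a rough sketch of how downstream code might consume this result, the example below works on the dictionary copied from the sample output, so it runs without downloading the model. The `summarize` helper is a hypothetical name introduced for this illustration.

```python
from collections import Counter

# Result dictionary copied from the sample output above.
result = {'output': ['生成', '一', '段', '新年', '祝福', '的', '文字', '跟', '所有', '人', '分享']}


def summarize(tokens):
    """Join tokens with a visible separator and count tokens by character length."""
    joined = " / ".join(tokens)
    length_counts = Counter(len(token) for token in tokens)
    return joined, length_counts


joined, length_counts = summarize(result["output"])
print(joined)
print("Total tokens:", len(result["output"]))
print("Single-character tokens:", length_counts[1])
print("Two-character tokens:", length_counts[2])
```

Because the pipeline returns a plain dictionary with a list of strings under the `output` key, later steps such as counting, filtering, or feeding tokens to another model can use ordinary Python list operations.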
