Merge pull request #1674 from madeline-underwood/FunASR

pareenaverma · web-flow · commit 71eea3994a7d · 2025-03-05T17:05:35.000-08:00
Fun ASR_Andy to review
diff --git a/content/learning-paths/servers-and-cloud-computing/funASR/1_asr.md b/content/learning-paths/servers-and-cloud-computing/funASR/1_asr.md
@@ -1,5 +1,5 @@
 ---
-title: Introduction to Automatic Speech Recognition (ASR)
+title: Introduction to Automatic Speech Recognition
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -19,19 +19,24 @@ At its core, ASR transforms spoken language into written text. Despite seeming s
 ASR is used in myriad applications across various domains:
 
 * Virtual Assistants and Chatbots - ASR enables natural language interactions in chatbots and virtual assistants, enhancing customer support, information retrieval, and even entertainment. 
-Use case: a developer can use ASR to create a voice-controlled chatbot that helps users troubleshoot technical issues or navigate complex software applications.
 
-* Data Analytics and Business Intelligence - Leveraging ASR to analyze customer interactions, such as calls and surveys, to uncover trends and enhance services. 
-Use case: an app that transcribes customer service calls and applies sentiment analysis to pinpoint areas for improving customer satisfaction.
+{{% notice Use case %}}A developer can use ASR to create a voice-controlled chatbot that helps users troubleshoot technical issues or navigate complex software applications.{{% /notice %}}
 
-* ASR-powered Accessibility Tools - Developing ASR-powered tools to enhance accessibility for users with disabilities.  Applications include voice-controlled interfaces, real-time captioning, and text-to-speech conversion. 
-Use case: a developer can create a tool that uses ASR to generate real-time captions for online meetings or presentations, making them accessible to deaf and hard-of-hearing individuals.
+* Data Analytics and Business Intelligence - leveraging ASR to analyze customer interactions, such as calls and surveys, to uncover trends and enhance services.
 
-* Integrating ASR into Software Development Tools - Enhancing efficiency and productivity by incorporating ASR into development environments. This can include voice commands for code editing, debugging, or version control.
-Use case: a developer can build an IDE plugin that enables voice-controlled coding, file navigation, and test execution, streamlining the workflow and reducing reliance on manual input.
+{{% notice Use case %}}An app that transcribes customer service calls and applies sentiment analysis to pinpoint areas for improving customer satisfaction.{{% /notice %}}
 
-* Smart Homes and IoT with ASR - Enhancing convenience with voice control of smart home devices and appliances.
-Use case: a voice-activated home automation system that lets users control lighting, temperature, and entertainment systems with natural language commands.
+* ASR-powered Accessibility Tools - developing ASR-powered tools to enhance accessibility for users with disabilities. Applications include voice-controlled interfaces, real-time captioning, and text-to-speech conversion.
+
+{{% notice Use case %}}A developer can create a tool that uses ASR to generate real-time captions for online meetings or presentations, making them accessible to deaf and hard-of-hearing individuals.{{% /notice %}}
+
+* Integrating ASR into Software Development Tools - enhancing efficiency and productivity by incorporating ASR into development environments. This can include voice commands for code editing, debugging, or version control.
+
+{{% notice Use case %}}A developer can build an IDE plugin that enables voice-controlled coding, file navigation, and test execution, streamlining the workflow and reducing reliance on manual input.{{% /notice %}}
+
+* Smart Homes and IoT with ASR - enhancing convenience with voice control of smart home devices and appliances.
+
+{{% notice Use case %}}A voice-activated home automation system that lets users control lighting, temperature, and entertainment systems with natural language commands.{{% /notice %}}
 
 ### Challenges in ASR
 
diff --git a/content/learning-paths/servers-and-cloud-computing/funASR/2_modelscope.md b/content/learning-paths/servers-and-cloud-computing/funASR/2_modelscope.md
@@ -1,5 +1,5 @@
 ---
-title: ModelScope - Open-Source Pre-trained AI models hub
+title: ModelScope - an Open Source Pre-trained AI Models Hub
 weight: 3
 
 ### FIXED, DO NOT MODIFY
@@ -8,18 +8,18 @@ layout: learningpathall
 
 ## Before you begin
 
-To follow the instructions for this Learning Path, you will need an Arm-based server running Ubuntu 22.04 LTS or later, with at least 8 cores, 16GB of RAM, and 30GB of disk storage.
+Before you begin following the instructions for this Learning Path, make sure you have an Arm-based server running Ubuntu 22.04 LTS or later, with at least 8 cores, 16GB of RAM, and 30GB of disk storage.
 
 ## What is ModelScope?
 [ModelScope](https://github.com/modelscope/modelscope/) is an open-source platform designed to simplify the integration of AI models into applications. It offers a wide variety of pre-trained models for tasks such as image recognition, natural language processing, and audio analysis. With ModelScope, you can seamlessly integrate these models into your projects using just a few lines of code.
 
 Key benefits of ModelScope:
 
-* Model Diversity - Access a wide range of models for various tasks, including Automatic Speech Recognition (ASR), natural language processing (NLP), and computer vision.
+* Model Diversity - access a wide range of models for various tasks, including Automatic Speech Recognition (ASR), natural language processing (NLP), and computer vision.
 
 * Ease of Use - ModelScope provides a user-friendly interface and APIs that enable seamless model integration.
 
-* Community Support - Benefit from a vibrant community of developers and researchers who actively contribute to and support ModelScope.
+* Community Support - benefit from a vibrant community of developers and researchers who actively contribute to and support ModelScope.
 
 
 ## Arm CPU Acceleration
@@ -94,7 +94,7 @@ result = word_segmentation(text)
 print(result)
 ```
 
-This piece of code specifies a model and provides a Chinese sentence for the model to segment.
+This piece of code specifies a model and provides a Chinese sentence for the model to segment. The sentence translates to:
 "A New Year’s greeting message to share with everyone."
 
 Run the model inference on the sample text:
diff --git a/content/learning-paths/servers-and-cloud-computing/funASR/3_funasr.md b/content/learning-paths/servers-and-cloud-computing/funASR/3_funasr.md
@@ -147,7 +147,7 @@ Result:
 
 The output shows "欢迎大家来到达摩社区进行体验" which means "Welcome everyone to the Dharma community to explore and experience!" as expected.
 
-You can also observe that the spacing between the third and sixth characters is very short. This is because they are combined with other characters, as discussed in the previous section.
+You can also see that the spacing between the third and sixth characters is short. This is because they are combined with other characters, as discussed in the previous section.
 
 You can now build a speech processing pipeline. The output of the speech recognition module serves as the input for the semantic segmentation model, enabling you to validate the accuracy of the recognized results. Copy the code shown below in a file named `funasr_test3.py`:
 
@@ -223,9 +223,7 @@ Good, the result is exactly what you are looking for.
 
 ## Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
 
-Now you can look at a more advanced speech recognition model, [Paraformer](https://aclanthology.org/2020.wnut-1.18/).
-
-Paraformer is a novel architecture for automatic speech recognition (ASR) designed for both speed and accuracy. Unlike traditional models, it leverages a parallel transformer architecture, enabling simultaneous processing of multiple parts of the input speech. This parallel processing capability leads to significantly faster inference, making Paraformer well-suited for real-time ASR applications where responsiveness is crucial.  
+[Paraformer](https://aclanthology.org/2020.wnut-1.18/) is a novel architecture for automatic speech recognition (ASR) designed for both speed and accuracy. Unlike traditional models, it leverages a parallel transformer architecture, enabling simultaneous processing of multiple parts of the input speech. This parallel processing capability leads to significantly faster inference, making Paraformer well-suited for real-time ASR applications where responsiveness is crucial.  
 
 Furthermore, Paraformer has demonstrated state-of-the-art accuracy on several benchmark datasets, showcasing its effectiveness in accurately transcribing speech. This combination of speed and accuracy makes Paraformer a promising advancement in the field of ASR, opening up new possibilities for high-performance speech recognition systems.
 
@@ -504,10 +502,10 @@ python --version
 ```
 
 {{% notice Note %}}
-The update-alternatives command is a Debian-based Linux utility (used in Ubuntu, Debian, etc.) for managing symbolic links to different versions of software alternatives. It allows you to easily switch between multiple installed versions of the same program.
+The `update-alternatives` command is a utility in Debian-based Linux distributions, such as Ubuntu and Debian, that manages symbolic links for different software versions. It simplifies the process of switching between multiple installed versions of the same program.
 {{% /notice %}}
 
-The Python version should be 3.10 now.
+The Python version should now be 3.10.
 
 ```output
 Python 3.10.16
@@ -541,19 +539,19 @@ pip uninstall torchao && cd ao/ && rm -rf build && python setup.py install
 ```
 
 {{% notice Note %}}
-Please reference this [link](https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/) to learn more the detail description of those instructions.
+For a detailed description of these instructions, see the Learning Path [Run a Large Language Model Chatbot on Arm servers](https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/pytorch-llama/).
 {{% /notice %}}
 
-Once you have installed the optimized PyTorch, we can enabled bfloat16 fast math kernels by setting DNNL_DEFAULT_FPMATH_MODE.
+Once you have installed the optimized PyTorch, enable bfloat16 fast math kernels by setting the `DNNL_DEFAULT_FPMATH_MODE` environment variable.
 
-On AWS Graviton3 as example, this enables GEMM kernels that use bfloat16 MMLA instructions available in the hardware.
+On AWS Graviton3, for example, this enables GEMM kernels that use the bfloat16 MMLA instructions available in the hardware.
 
 
 ### Update the FunASR application with benchmark function
 
-Now we can test FunASR model again.
+Now you can test the FunASR model again.
 
-By re-use the previously paraformer-2.py and add benchmark function, please copy the updated code shown below in a file named `paraformer-3.py`:
+Reuse the earlier `paraformer-2.py` file and add a benchmark function. Copy the updated code shown below into a file named `paraformer-3.py`:
 
 ```python
 import os
@@ -650,7 +648,7 @@ The model took 0.816 seconds to complete execution.
 
 ### Enable bfloat16 Fast Math Kernels
 
-Now we enable bfloat16 and run Python script again:
+Now enable bfloat16 and run the Python script again:
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
@@ -665,7 +663,7 @@ rtf_avg: 0.010: 100%|
  Execution Time: 738.04 ms (Mean), 747.57 ms (P99)
 ```
 
-You can notice that the execution time is now 0.7 seconds, reflecting an improvement compared to earlier results.
+Here you can see that the mean execution time is now about 0.74 seconds, an improvement over the 0.816 seconds measured earlier.
 
 ## Conclusion
-Arm CPUs and an optimized software ecosystem enable developers to build innovative, efficient ASR solutions. Explore the capabilities of ModelScope and FunASR, and unlock the potential of Arm technology for your next Chinese ASR project.
+Arm CPUs paired with an optimized software ecosystem enable developers to build innovative, efficient ASR solutions. Discover the potential of ModelScope and FunASR, and harness Arm technology for your next Chinese ASR project.
diff --git a/content/learning-paths/servers-and-cloud-computing/funASR/_index.md b/content/learning-paths/servers-and-cloud-computing/funASR/_index.md
@@ -1,20 +1,20 @@
 ---
-title: Deploy ModelScope FunASR Chinese Speech Recognition Model on Arm Servers
+title: Deploy ModelScope FunASR Model on Arm Servers
 draft: true
 cascade:
     draft: true
 
 minutes_to_complete: 60
 
-who_is_this_for: This is an introductory topic for software developers and AI engineers interested in learning how to run Chinese Automatic Speech Recognition (ASR) applications on Arm servers.
+who_is_this_for: This is an introductory topic for developers interested in learning how to deploy the ModelScope FunASR Chinese Automatic Speech Recognition (ASR) model on Arm-based servers.
 
 learning_objectives:
     - Leverage open-source large language models and tools to build Chinese ASR applications.
-    - Deploy real-time Chinese speech recognition, punctuation restoration, and sentiment analysis with FunASR.
-    - Describe how to accelerate ModelScope models on Arm-based servers for performance and efficiency.
+    - Deploy real-time Chinese speech recognition, punctuation restoration, and sentiment analysis using FunASR.
+    - Describe how to accelerate ModelScope models on Arm-based servers for enhanced performance and efficiency.
 
 prerequisites:
-    - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider, or a local Arm Linux computer with at least 8 CPUs and 16GB RAM.
+    - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider, or a local Arm Linux computer with at least 8 CPUs and 16GB of RAM.
 
 author: Odin Shen