ArmDeveloperEcosystem
diff --git a/content/learning-paths/servers-and-cloud-computing/funASR/1_asr.md b/content/learning-paths/servers-and-cloud-computing/funASR/1_asr.md
diff --git a/content/learning-paths/servers-and-cloud-computing/funASR/2_modelscope.md b/content/learning-paths/servers-and-cloud-computing/funASR/2_modelscope.md
@@ -0,0 +1,62 @@
+---
+title: Introduction to Automatic Speech Recognition (ASR)
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## What is ASR?
+
+[Automatic Speech Recognition (ASR)](https://en.wikipedia.org/wiki/Speech_recognition), also known as speech-to-text, is a rapidly evolving field that enables computers to understand and transcribe human speech.
+
+This technology is deeply integrated into daily life. It powers a wide range of applications and services that are often taken for granted, many of which are optimized for and run on the Arm CPU architecture.
+
+At its core, ASR involves converting spoken audio into written text. While seemingly simple, this process is quite complex, requiring sophisticated algorithms and models to accurately interpret the nuances of human speech, including variations in pronunciation, accents, and background noise.
+
+### Key Applications of ASR
+
+ASR is now used in a myriad of applications across various domains:
+
+* **Virtual Assistants and Chatbots:**
+
+    Integrating ASR into chatbots and virtual assistants enables natural language interactions with users. This can be used for customer support, information retrieval, or even entertainment.
+    For example: A developer could use ASR to create a voice-controlled chatbot that helps users troubleshoot technical issues or navigate complex software applications.
+
+* **Data Analytics and Business Intelligence:**
+
+    Analyzing customer interactions (e.g., calls, surveys) to identify trends and improve services.
+    For example: An app that transcribes customer service calls and uses sentiment analysis to identify areas where customer satisfaction can be improved.
+
+* **Accessibility Tools:**
+
+    Developing tools that use ASR to improve accessibility for users with disabilities. This could include voice-controlled interfaces, real-time captioning, or speech-to-text conversion.
+    For example: A developer could create a tool that uses ASR to generate real-time captions for online meetings or presentations, making them accessible to deaf or hard-of-hearing participants.
+
+* **Software Development Tools:**
+
+    Integrating ASR into software development tools to improve efficiency and productivity. This could include voice commands for code editing, debugging, or version control.
+    For example: A developer could create a plugin for their IDE that allows them to use voice commands to write code, navigate through files, or run tests.
+
+* **Smart Homes and IoT:**
+
+    Voice control of smart home devices and appliances for a more convenient user experience.
+    For example: A voice-activated home automation system that allows users to control lighting, temperature, and entertainment systems with natural language commands.
+
+
+### Challenges in ASR
+
+While the potential applications of ASR are vast, developing and deploying accurate, reliable ASR systems is hard. The difficulties stem from the complexity of human speech, environmental factors, and the intricacies of language itself. They are particularly pronounced for Chinese ASR, which must handle the following:
+
+* **Complexities of the Chinese Language:**
+    Mandarin Chinese is tonal: the meaning of a syllable changes with its tone. Punctuation, which is not spoken and must be inferred, is also crucial to convey meaning and avoid ambiguity. Accurately recognizing these nuances is essential for understanding spoken Chinese.
+
+* **Noise Robustness:** 
+    ASR systems need to be able to filter out background noise to accurately transcribe speech. This is particularly challenging in noisy environments like crowded streets or busy offices.
+
+* **Dialectal Diversity:** 
+    Chinese encompasses numerous dialects with significant variations in pronunciation and vocabulary. This poses a challenge for ASR systems to generalize across different regions and speakers.
+
+* **Homophones:**
+    Chinese has a high prevalence of homophones, words that sound alike but have different meanings. Disambiguating these homophones requires understanding the context and semantics of the spoken words.
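
To make the tone and homophone points concrete, here is a minimal sketch of a lookup table keyed by syllable and tone. The `LEXICON` table and the `candidates` helper are hypothetical names used only for illustration. The entries are common textbook examples, not data taken from any ASR model.

```python
# Toy lexicon keyed by (syllable, tone). Illustrative only: these are
# well-known textbook examples, not output from an ASR system.
LEXICON = {
    ("ma", 1): ["妈"],                      # mother
    ("ma", 2): ["麻"],                      # hemp
    ("ma", 3): ["马"],                      # horse
    ("ma", 4): ["骂"],                      # to scold
    ("shi", 4): ["是", "事", "市", "试"],   # to be, matter, city, to try
}


def candidates(syllable, tone):
    """Return the characters this toy lexicon lists for a syllable and tone."""
    return LEXICON.get((syllable, tone), [])


# The same syllable maps to a different character for each tone.
for tone in (1, 2, 3, 4):
    print("ma", tone, candidates("ma", tone))

# One syllable and tone can still map to several characters (homophones),
# so the acoustic signal alone cannot pick the right one.
print("shi", 4, candidates("shi", 4))
```

Even in this tiny table, getting the tone right narrows `ma` to one character, while `shi` in the fourth tone still leaves four candidates that only context can separate.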
+
+In the following sections, you'll explore a solution that uses ModelScope and Arm CPUs to enable efficient and accurate Chinese ASR.
@@ -0,0 +1,131 @@
+---
+title: ModelScope - Open-Source Pre-trained AI Models Hub
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Before you begin
+
+To follow the instructions for this Learning Path, you will need an Arm server running Ubuntu 22.04 LTS or later, with at least 8 cores, 16 GB of RAM, and 30 GB of disk storage.
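
As a quick sanity check, the requirements above can be compared against the current machine with a short standard-library script. This is a sketch, not an official tool: the `check_prerequisites` name is hypothetical, and the 10% slack on RAM is an assumption to allow for systems that report slightly less than their nominal memory.

```python
import os
import platform
import shutil

# Minimums taken from the prerequisites above.
MIN_CORES = 8
MIN_RAM_GB = 16
MIN_DISK_GB = 30


def check_prerequisites(path="."):
    """Return {name: (measured value, meets minimum)} for this machine."""
    machine = platform.machine()
    cores = os.cpu_count() or 0
    # Total physical memory via POSIX sysconf (works on Linux).
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    # Free space on the filesystem that holds `path`.
    disk_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "arch": (machine, machine in ("aarch64", "arm64")),
        "cores": (cores, cores >= MIN_CORES),
        # Assumption: allow 10% slack because reported RAM is usually
        # a little below the nominal size.
        "ram_gb": (round(ram_gb, 1), ram_gb >= MIN_RAM_GB * 0.9),
        "disk_gb": (round(disk_gb, 1), disk_gb >= MIN_DISK_GB),
    }


if __name__ == "__main__":
    for name, (value, ok) in check_prerequisites().items():
        print(f"{name:8} {value!s:>10}  {'OK' if ok else 'below requirement'}")
```

Run it from the directory where you plan to work so that the disk check looks at the right filesystem.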
+
+## Introducing ModelScope
+[ModelScope](https://github.com/modelscope/modelscope/) is an open-source platform that makes it easy to use AI models in your applications. 
+It provides a wide variety of pre-trained models for tasks like image recognition, natural language processing, and audio analysis. With ModelScope, you can easily integrate these models into your projects with just a few lines of code.
+
+Key benefits of ModelScope include:
+
+* **Model Diversity:** 
+    Access a wide range of models for various tasks, including ASR, natural language processing, and computer vision.
+
+* **Ease of Use:** 
+    ModelScope provides a user-friendly interface and APIs for seamless model integration.
+
+* **Community Support:** 
+    Benefit from a vibrant community of developers and researchers contributing to and supporting ModelScope.
+
+
+## Arm CPU Acceleration
+ModelScope supports PyTorch 1.8+ and other machine learning frameworks, which can be deployed efficiently on Arm Neoverse CPUs to take advantage of Arm's performance and power efficiency.
+
+Arm provides optimized software and tools, such as Kleidi, to accelerate AI inference on Arm-based platforms. This makes Arm Neoverse CPUs well suited to running ModelScope models in edge devices and other resource-constrained environments.
+
+To learn more, see [Faster PyTorch Inference using Kleidi on Arm Neoverse](https://community.arm.com/arm-community-blogs/b/servers-and-cloud-computing-blog/posts/faster-pytorch-inference-kleidi-arm-neoverse) on the Arm Community website.
+
+
+## Installing ModelScope
+
+First, ensure your system is up-to-date and install the required tools and libraries:
+
+```bash
+sudo apt-get update -y
+sudo apt-get install -y curl git wget python3 python3-pip python3-venv python-is-python3
+```
+
+Create and activate a virtual environment:
+```bash
+python -m venv venv
+source venv/bin/activate
+```
+
+Install ModelScope and the related packages:
+```bash
+pip3 install modelscope
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
+pip3 install numpy packaging addict datasets simplejson sortedcontainers transformers ffmpeg
+```
+{{% notice Note %}}
+This Learning Path runs models on Arm Neoverse CPUs, so you only need to install the CPU build of PyTorch.
+{{% /notice %}}
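
Before moving on, it can help to confirm that the key packages are visible to the Python interpreter in the active virtual environment. The sketch below uses only the standard library. The `missing_packages` helper is a hypothetical name, and the `PACKAGES` list is an assumption about which imports matter most, with `modelscope` included because the example that follows imports it.

```python
import importlib.util

# Import names to look for. This list is an assumption, not an official
# requirements file; adjust it to match what you installed.
PACKAGES = ["modelscope", "torch", "torchvision", "torchaudio",
            "numpy", "transformers", "datasets"]


def missing_packages(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All packages found.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for large packages such as PyTorch.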
+
+## Create an example
+
+After completing the installation, you can use an example related to Chinese semantic understanding to see how ModelScope works.
+
+Chinese and English writing differ fundamentally. Written Chinese does not put spaces between words, so word boundaries must be inferred.
+The relationship between Chinese characters and words is somewhat analogous to the relationship between words and phrases in English.
+Some Chinese characters, like English words, have clear meanings on their own, such as “人” (person), “山” (mountain), and “水” (water).
+
+However, more often, Chinese characters need to be combined with other characters to express more complete meanings, just like phrases in English. 
+For example, “祝福” (blessing) can be broken down into “祝” (wish) and “福” (good fortune); “分享” (share) can be broken down into “分” (divide) and “享” (enjoy); “生成” (generate) is composed of “生” (produce) and “成” (become).
+
+For computers to understand Chinese sentences, they need to model the rules of Chinese characters, vocabulary, and grammar to accurately interpret and express meaning.
+
+Here is a simple example using a general-domain Chinese [word segmentation model](https://www.modelscope.cn/models/iic/nlp_structbert_word-segmentation_chinese-base), which breaks Chinese sentences into individual words to make them easier for computers to analyze.
+
+```python
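+from modelscope.pipelines import pipeline
+
+# Build a word-segmentation pipeline; the model is downloaded on first use.
+word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')
+# Input: "Generate a New Year's greeting text to share with everyone."
+text = '生成一段新年祝福的文字跟所有人分享'
+result = word_segmentation(text)
+
+print(result)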
+```
+
+The output is similar to:
+```output
+2025-01-28 00:30:29,692 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
+Downloading Model to directory: /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
+2025-01-28 00:30:32,828 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
+2025-01-28 00:30:33,332 - modelscope - INFO - initiate model from /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
+2025-01-28 00:30:33,333 - modelscope - INFO - initiate model from location /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base.
+2025-01-28 00:30:33,334 - modelscope - INFO - initialize model from /home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base
+You are using a model of type bert to instantiate a model of type structbert. This is not supported for all configurations of models and can yield errors.
+2025-01-28 00:30:35,522 - modelscope - WARNING - No preprocessor field found in cfg.
+2025-01-28 00:30:35,522 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
+2025-01-28 00:30:35,522 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base'}. trying to build by task and model information.
+2025-01-28 00:30:35,527 - modelscope - INFO - cuda is not available, using cpu instead.
+2025-01-28 00:30:35,529 - modelscope - WARNING - No preprocessor field found in cfg.
+2025-01-28 00:30:35,529 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
+2025-01-28 00:30:35,529 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/ubuntu/.cache/modelscope/hub/damo/nlp_structbert_word-segmentation_chinese-base', 'sequence_length': 512}. trying to build by task and model information.
+/home/ubuntu/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:1044: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
+  warnings.warn(
+{'output': ['生成', '一', '段', '新年', '祝福', '的', '文字', '跟', '所有', '人', '分享']}
+```
+
+The segmentation model identified the following words:
+
+- 生成 (generate): a verb.
+
+- 一 (one): a numeral.
+
+- 段 (piece): a measure word used for text.
+
+- 新年 (New Year): a noun.
+
+- 祝福 (blessings): a noun meaning "blessings" or "good wishes."
+
+- 的 (of): a particle that links a modifier to the noun it describes.
+
+- 文字 (text): a noun meaning "text" or "written words."
+
+- 跟 (with): a preposition.
+
+- 所有 (all): a quantifier.
+
+- 人 (people): a noun.
+
+- 分享 (share): a verb meaning "to share."
+
+
+The segmentation model has successfully identified the word boundaries and separated the sentence into meaningful units, which is essential for further natural language processing tasks like machine translation or sentiment analysis.
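
To make the idea of word boundaries concrete, here is a toy dictionary-based segmenter that uses greedy forward maximum matching. It is a much simpler technique than the StructBERT model used above and is not how that model works. The `forward_max_match` function and the hand-picked `VOCAB` set are hypothetical and exist only for illustration.

```python
def forward_max_match(text, vocab, max_len=2):
    """Greedy segmentation: at each position, take the longest known word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini-vocabulary: the two-character words in the output above.
VOCAB = {"生成", "新年", "祝福", "文字", "所有", "分享"}

tokens = forward_max_match("生成一段新年祝福的文字跟所有人分享", VOCAB)
print(tokens)
# ['生成', '一', '段', '新年', '祝福', '的', '文字', '跟', '所有', '人', '分享']
```

With this vocabulary the toy segmenter reproduces the token list in the output above, but only because the dictionary was chosen to fit the sentence. A trained model generalizes to words and contexts that a fixed dictionary does not cover.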