Commit f12e903

Final checks
1 parent ec24285 commit f12e903

File tree

3 files changed: +41 −25 lines


content/learning-paths/servers-and-cloud-computing/whisper/_index.md

Lines changed: 5 additions & 4 deletions
@@ -3,18 +3,19 @@ title: Accelerate Whisper on Arm with Hugging Face Transformers
 
 minutes_to_complete: 30
 
-who_is_this_for: This Learning Path is for software developers looking to run the Whisper Automatic Speech Recognition (ASR) model efficiently. You will use an Arm-based cloud instance to run and build speech transcription-based applications.
+who_is_this_for: This Learning Path is for software developers familiar with basic machine learning concepts and looking to run the OpenAI Whisper Automatic Speech Recognition (ASR) model efficiently, using an Arm-based cloud instance.
 
 learning_objectives:
 - Install the dependencies for the Whisper ASR Model.
-- Run the OpenAI Whisper model using Hugging Face Transformers.
+- Run the Whisper model using Hugging Face Transformers.
 - Enable performance-enhancing features for running the model on Arm CPUs.
 - Evaluate transcript generation times using Whisper.
 
 
 prerequisites:
-- An [Arm-based compute instance](/learning-paths/servers-and-cloud-computing/intro/) running Ubuntu with 32 cores, 8GB of RAM, and 32GB disk space.
-- Basic knowledge of Python and machine learning concepts.
+- An [Arm-based compute instance](/learning-paths/servers-and-cloud-computing/intro/) running Ubuntu with 32 cores, 8GB of RAM, and 32GB of disk space.
+- Basic knowledge of Python.
+- Familiarity with machine learning concepts.
 - Familiarity with the fundamentals of the Whisper ASR Model.
 
 author: Nobel Chowdary Mandepudi

content/learning-paths/servers-and-cloud-computing/whisper/whisper.md

Lines changed: 19 additions & 11 deletions
@@ -12,19 +12,27 @@ layout: "learningpathall"
 
 This Learning Path demonstrates how to run the [whisper-large-v3-turbo model](https://huggingface.co/openai/whisper-large-v3-turbo) as an application that accepts an audio input and computes its text transcript.
 
-The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You will need an Arm server instance with 32 cores, at least 8GB of RAM, and 32GB of disk space. These steps have been tested on an AWS Graviton4 `c8g.8xlarge` instance.
+The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You will need an Arm server instance with 32 cores, at least 8GB of RAM, and 32GB of disk space.
 
-## Overview
+These steps have been tested on an AWS Graviton4 `c8g.8xlarge` instance.
+
+## Overview and Focus of Learning Path
 
 OpenAI Whisper is an open-source Automatic Speech Recognition (ASR) model trained on multilingual, multitask data. It can generate transcripts in multiple languages and translate various languages into English.
 
-In this Learning Path, you will learn about the foundational aspects of speech-to-text transcription applications, with a focus on running OpenAI’s Whisper on an Arm CPU. Finally, you will explore the implementation and performance considerations required to efficiently deploy Whisper using the Hugging Face Transformers framework.
+In this Learning Path, you will learn about the foundational aspects of speech-to-text transcription applications, with a focus on running OpenAI’s Whisper on an Arm CPU. You will explore the implementation and performance considerations required to efficiently deploy Whisper using the Hugging Face Transformers framework.
+
+### Speech-to-text ML applications
+
+Speech-to-text (STT) transcription applications transform spoken language into written text, enabling voice-driven interfaces, accessibility tools, and real-time communication services.
+
+Audio is first cleaned and converted into a format suitable for processing, then passed through a deep learning model trained to recognize speech patterns. Advanced language models help refine the output, improving accuracy by predicting likely word sequences based on context. When deployed on cloud servers, STT applications must balance accuracy, latency, and computational efficiency to meet diverse use cases.
 
-### Speech-to-Text ML applications
+## Learning Path Setup
 
-Speech-to-text (STT) transcription applications transform spoken language into written text, enabling voice-driven interfaces, accessibility tools, and real-time communication services. Audio is first cleaned and converted into a format suitable for processing, then passed through a deep learning model trained to recognize speech patterns. Advanced language models help refine the output, improving accuracy by predicting likely word sequences based on context. When deployed on cloud servers, STT applications must balance accuracy, latency, and computational efficiency to meet diverse use cases.
+To get set up, follow these steps, copying the code snippets at each stage.
 
-## Install dependencies
+### Install dependencies
 
 Install the following packages on your Arm-based server instance:
 
@@ -33,7 +41,7 @@ sudo apt update
 sudo apt install python3-pip python3-venv ffmpeg wget -y
 ```
 
-## Install Python Dependencies
+### Install Python Dependencies
 
 Create a Python virtual environment:
 
@@ -53,7 +61,7 @@ Install the required libraries using pip:
 pip install torch transformers accelerate
 ```
 
-## Download the Sample Audio File
+### Download the Sample Audio File
 
 Download this sample audio file, which is about 33 seconds in .wav format.
 
@@ -62,7 +70,7 @@ You can use any .wav file to try different examples:
 wget https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav
 ```
 
-## Create a Python Script for Audio-To-Text Transcription
+### Create a Python Script for Audio-To-Text Transcription
 
 Use the Hugging Face `Transformers` framework to process the audio. It provides classes to configure the model and prepare it for inference.
 
@@ -133,11 +141,11 @@ export DNNL_VERBOSE=1
 python3 whisper-application.py
 ```
 
-You should see output similar to the image below, which includes the log output, the audio transcript, and the `Inference elapsed time`.
+You should see output similar to the image below, which includes the log output, the audio transcript, and the `Inferencing elapsed time`.
 
 ![frontend](whisper_output_no_flags.png)
 
 
-You've now run the Whisper model successfully on your Arm-based CPU.
+You have now run the Whisper model successfully on your Arm-based CPU.
 
 Continue to the next section to configure flags that can boost your model's performance.
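
Note: the body of `whisper-application.py` is not shown in this diff. As a rough sketch only, and not the script shipped with this Learning Path, a minimal version built on the Hugging Face `pipeline` API could look like the following. The model ID and sample file name come from the text above, while `return_timestamps=True` is an assumption to let the pipeline handle the roughly 33-second clip, which is longer than Whisper's 30-second window.

```python
import time

import torch
from transformers import pipeline

# Build an automatic-speech-recognition pipeline around Whisper.
# "openai/whisper-large-v3-turbo" is the model this Learning Path references.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float32,
    device="cpu",
)

# Transcribe the sample .wav downloaded earlier and time the inference.
# return_timestamps=True lets the pipeline process audio longer than 30 seconds.
start = time.time()
result = asr("OSR_us_000_0010_8k.wav", return_timestamps=True)
elapsed = time.time() - start

print(result["text"])
print(f"Inferencing elapsed time: {elapsed:.2f} s")
```

Run inside the virtual environment created above, a script along these lines should print the transcript followed by an elapsed-time line like the one the screenshots reference.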

content/learning-paths/servers-and-cloud-computing/whisper/whisper_deploy.md

Lines changed: 17 additions & 10 deletions
@@ -5,31 +5,38 @@ weight: 4
 layout: learningpathall
 ---
 
-## Setting Environment Variables that Impact Performance
+## Optimize Environment Variables to Boost Performance
 
-Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of neural network kernels can enhance performance by reducing processing overhead. When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks like feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper run efficiently, making high-quality transcription more accessible across various server applications.
+Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of neural network kernels can enhance performance by reducing processing overhead.
 
-Other considerations allow for more efficient memory usage. For example, allocating additional memory and threads for specific tasks can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
+When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks such as feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper run efficiently, making high-quality transcription more accessible across various server applications.
 
-Use the following flags to optimize performance on Arm machines:
+Other factors contribute to more efficient memory usage. For example, allocating additional memory and threads for specific tasks can boost performance. By leveraging these hardware-aware optimizations, applications can achieve lower latency, reduced power consumption, and smoother real-time transcription.
 
-* Enable fast math BFloat16(BF16) GEMM kernels.
-* Enable Linux Transparent Huge Page (THP) allocations.
-* Enable logs to confirm kernel and set LRU cache capacity and OMP_NUM_THREADS.
+Use the following flags to optimize performance on Arm machines:
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
 export THP_MEM_ALLOC_ENABLE=1
 export LRU_CACHE_CAPACITY=1024
 export OMP_NUM_THREADS=32
 ```
+These variables do the following:
+
+* `export DNNL_DEFAULT_FPMATH_MODE=BF16` - sets the default floating-point math mode for the oneDNN library to BF16 (bfloat16). This can improve performance and efficiency on hardware that supports BF16 precision.
+
+* `export THP_MEM_ALLOC_ENABLE=1` - enables an optimized memory allocation strategy, often leveraging transparent huge pages, which can enhance memory management and reduce fragmentation in frameworks like PyTorch.
+
+* `export LRU_CACHE_CAPACITY=1024` - configures the capacity of a Least Recently Used (LRU) cache to 1024 entries. This helps store and quickly retrieve recently used data, reducing redundant computations.
+
+* `export OMP_NUM_THREADS=32` - sets the number of threads for OpenMP-based parallel processing to 32, allowing your application to take full advantage of multi-core systems for faster performance.
 
 {{% notice Note %}}
 BF16 support is merged into PyTorch versions greater than 2.3.0.
 {{% /notice %}}
 
 ## Run Whisper File
-After setting the environment variables in the previous step, you can now run the Whisper model again and analyze the performance impact.
+After setting the environment variables in the previous step, run the Whisper model again and analyze the performance impact.
 
 Run the `whisper-application.py` file:
 
@@ -43,6 +50,6 @@ You should now see that the processing time has gone down compared to the last run.
 
 ![frontend](whisper_output.png)
 
-The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that fast math BF16 kernels are used in the compute process to improve the performance.
+The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that the compute process uses fast math BF16 kernels to improve performance.
 
-Enable the environment variables detailed in this Learning Path to achieve performance uplift of OpenAI Whisper using Hugging Face Transformers framework on Arm.
+You have now learned how configuring these environment variables can achieve performance uplift of OpenAI's Whisper model when using the Hugging Face Transformers framework on Arm-based systems.
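
As a usage sketch, not part of this commit, the flags above can be combined with the oneDNN verbose logging already used earlier in this Learning Path to confirm that the BF16 fast-math kernels are picked up; `attr-fpmath:bf16` is the log marker the diff mentions.

```bash
# Set the performance flags described above.
export DNNL_DEFAULT_FPMATH_MODE=BF16
export THP_MEM_ALLOC_ENABLE=1
export LRU_CACHE_CAPACITY=1024
export OMP_NUM_THREADS=32

# Enable oneDNN verbose logging, capture the run, and filter the kernel
# traces for the BF16 fast-math marker while the transcript is generated.
export DNNL_VERBOSE=1
python3 whisper-application.py 2>&1 | tee run.log
grep "attr-fpmath:bf16" run.log
```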
