
Commit 5b02ebe

Merge pull request #1671 from madeline-underwood/OpenAI_Whisper
Open AI Whisper_Andy to check
2 parents 300e016 + b76a0d0

4 files changed: +64 -37 lines changed

content/learning-paths/servers-and-cloud-computing/whisper/_demo.md

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Demo - Audio transcription on Arm
+title: Demo - Whisper Voice Audio transcription on Arm
 overview: |
   This Learning Path shows you how to use a c8g.8xlarge AWS Graviton4 instance, powered by an Arm Neoverse CPU, to build a simple Transcription-as-a-Service server.
 
@@ -16,6 +16,7 @@ demo_steps:
   - Receive the transcribed output and view stats.
 
 
+
 title_chatbot_area: Whisper Voice Demo
 
 diagram: audio-pic-clearer.png

content/learning-paths/servers-and-cloud-computing/whisper/_index.md

Lines changed: 9 additions & 12 deletions
@@ -1,25 +1,22 @@
 ---
-title: Run OpenAI Whisper Audio Model efficiently on Arm with Hugging Face Transformers
-
-draft: true
-cascade:
-  draft: true
+title: Accelerate Whisper on Arm with Hugging Face Transformers
 
 minutes_to_complete: 15
 
-who_is_this_for: This Learning Path is for software developers looking to run the Whisper automatic speech recognition (ASR) model efficiently. You will use an Arm-based cloud instance to run and build speech transcription based applications.
+who_is_this_for: This Learning Path is for software developers familiar with basic machine learning concepts and looking to run the OpenAI Whisper Automatic Speech Recognition (ASR) model efficiently, using an Arm-based cloud instance.
 
 learning_objectives:
-  - Install the dependencies to run the Whisper Model
-  - Run the OpenAI Whisper model using Hugging Face Transformers.
+  - Install the dependencies for the Whisper ASR Model.
+  - Run the Whisper model using Hugging Face Transformers.
   - Enable performance-enhancing features for running the model on Arm CPUs.
-  - Compare the total time taken to generate transcript with Whisper.
+  - Evaluate transcript generation times using Whisper.
 
 prerequisites:
-  - An [Arm-based compute instance](/learning-paths/servers-and-cloud-computing/intro/) with 32 cores, 8GB of RAM, and 32GB disk space running Ubuntu.
-  - Basic understanding of Python and ML concepts.
-  - Understanding of Whisper ASR Model fundamentals.
+  - An [Arm-based compute instance](/learning-paths/servers-and-cloud-computing/intro/) running Ubuntu with 32 cores, 8GB of RAM, and 32GB of disk space.
+  - Basic knowledge of Python.
+  - Familiarity with machine learning concepts.
+  - Familiarity with the fundamentals of the Whisper ASR Model.
 
 author: Nobel Chowdary Mandepudi

content/learning-paths/servers-and-cloud-computing/whisper/whisper.md

Lines changed: 34 additions & 16 deletions
@@ -1,35 +1,47 @@
 ---
 # User change
-title: "Setup the Whisper Model"
+title: "Set up the Whisper Model"
 
-weight: 2
+weight: 3
 
 # Do not modify these elements
 layout: "learningpathall"
 ---
 
 ## Before you begin
 
-This Learning Path demonstrates how to run the [whisper-large-v3-turbo model](https://huggingface.co/openai/whisper-large-v3-turbo) as an application that takes an audio input and computes the text transcript of it. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You need an Arm server instance with 32 cores, atleast 8GB of RAM and 32GB disk to run this example. The instructions have been tested on a AWS Graviton4 `c8g.8xlarge` instance.
+This Learning Path demonstrates how to run the [whisper-large-v3-turbo model](https://huggingface.co/openai/whisper-large-v3-turbo) as an application that accepts an audio input and computes its text transcript.
 
-## Overview
+The instructions in this Learning Path have been designed for Arm servers running Ubuntu 24.04 LTS. You will need an Arm server instance with 32 cores, at least 8GB of RAM, and 32GB of disk space.
 
-OpenAI Whisper is an open-source Automatic Speech Recognition (ASR) model trained on the multilingual and multitask data, which enables the transcript generation in multiple languages and translations from different languages to English. You will learn about the foundational aspects of speech-to-text transcription applications, specifically focusing on running OpenAI’s Whisper on an Arm CPU. Lastly, you will explore the implementation and performance considerations required to efficiently deploy Whisper using Hugging Face Transformers framework.
+These steps have been tested on an AWS Graviton4 `c8g.8xlarge` instance.
+
+## Overview and Focus of Learning Path
+
+OpenAI Whisper is an open-source Automatic Speech Recognition (ASR) model trained on multilingual, multitask data. It can generate transcripts in multiple languages and translate various languages into English.
+
+In this Learning Path, you will learn about the foundational aspects of speech-to-text transcription applications, with a focus on running OpenAI’s Whisper on an Arm CPU. You will explore the implementation and performance considerations required to efficiently deploy Whisper using the Hugging Face Transformers framework.
 
 ### Speech-to-text ML applications
 
-Speech-to-text (STT) transcription applications transform spoken language into written text, enabling voice-driven interfaces, accessibility tools, and real-time communication services. Audio is first cleaned and converted into a format suitable for processing, then passed through a deep learning model trained to recognize speech patterns. Advanced language models help refine the output, improving accuracy by predicting likely word sequences based on context. Whether running on cloud servers, STT applications must balance accuracy, latency, and computational efficiency to meet the needs of diverse use cases.
+Speech-to-text (STT) transcription applications transform spoken language into written text, enabling voice-driven interfaces, accessibility tools, and real-time communication services.
+
+Audio is first cleaned and converted into a format suitable for processing, then passed through a deep learning model trained to recognize speech patterns. Advanced language models help refine the output, improving accuracy by predicting likely word sequences based on context. When deployed on cloud servers, STT applications must balance accuracy, latency, and computational efficiency to meet diverse use cases.
+
+## Learning Path Setup
 
-## Install dependencies
+To get set up, follow these steps, copying the code snippets at each stage.
 
-Install the following packages on your Arm based server instance:
+### Install dependencies
+
+Install the following packages on your Arm-based server instance:
 
 ```bash
 sudo apt update
 sudo apt install python3-pip python3-venv ffmpeg wget -y
 ```
 
-## Install Python Dependencies
+### Install Python Dependencies
 
 Create a Python virtual environment:
 
@@ -49,18 +61,22 @@ Install the required libraries using pip:
 pip install torch transformers accelerate
 ```
 
-## Download the sample audio file
+### Download the Sample Audio File
+
+Download this sample audio file, which is about 33 seconds in .wav format.
 
-Download a sample audio file, which is about 33 second audio in .wav format. You can use any .wav sound file if you'd like to try some other examples.
+You can use any .wav file to try different examples:
 ```bash
 wget https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav
 ```
 
-## Create a python script for audio to text transcription
+### Create a Python Script for Audio-To-Text Transcription
 
-You will use the Hugging Face `transformers` framework to help process the audio. It contains classes that configures the model, and prepares it for inference. `pipeline` is an end-to-end function for NLP tasks. In the code below, it's configured to do pre- and post-processing of the sample in this example, as well as running the actual inference.
+Use the Hugging Face `Transformers` framework to process the audio. It provides classes to configure the model and prepare it for inference.
 
-Using a file editor of your choice, create a python file named `whisper-application.py` with the content shown below:
+The `pipeline` function is an end-to-end solution for NLP tasks. In the code below, it is configured to do pre- and post-processing of the sample in this example, as well as running inference.
+
+Using a file editor of your choice, create a Python file named `whisper-application.py` with the following content:
 
 ```python { file_name="whisper-application.py" }
 import torch
@@ -122,9 +138,11 @@ export DNNL_VERBOSE=1
 python3 whisper-application.py
 ```
 
-You should see output similar to the image below with a log output, transcript of the audio and the `Inference elapsed time`.
+You should see output similar to the image below, which includes the log output, the audio transcript, and the `Inferencing elapsed time`.
 
 ![frontend](whisper_output_no_flags.png)
 
 
-You've now run the Whisper model successfully on your Arm-based CPU. Continue to the next section to configure flags that can increase the performance your running model.
+You have now run the Whisper model successfully on your Arm-based CPU.
+
+Continue to the next section to configure flags that can boost your model's performance.
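The body of `whisper-application.py` is elided from the diff above. For orientation, a minimal sketch of a `pipeline`-based transcription script in the shape this Learning Path describes could look as follows; the dtype, timing logic, and output labels here are illustrative assumptions, not the file's actual contents:

```python
# Minimal sketch of a pipeline-based transcription script.
# Illustrative only: not the exact whisper-application.py from this commit.
import time

import torch
from transformers import pipeline

# Build an end-to-end ASR pipeline that handles pre- and post-processing
# as well as inference, using the model named in this Learning Path.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float32,
    device="cpu",
)

# Transcribe the sample .wav file downloaded earlier and time the call.
# return_timestamps=True lets the pipeline chunk audio longer than 30 seconds.
start = time.time()
result = pipe("OSR_us_000_0010_8k.wav", return_timestamps=True)
elapsed = time.time() - start

print(result["text"])
print(f"Inference elapsed time: {elapsed:.2f} s")
```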

content/learning-paths/servers-and-cloud-computing/whisper/whisper_deploy.md

Lines changed: 19 additions & 8 deletions
@@ -5,27 +5,38 @@ weight: 4
 layout: learningpathall
 ---
 
-## Setting environment variables that impact performance
+## Optimize Environment Variables to Boost Performance
 
-Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of the kernels in the neural network enhance performance by reducing processing overhead. When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks like feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper can run efficiently, making high-quality transcription more accessible across various server applications.
+Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of neural network kernels can enhance performance by reducing processing overhead.
 
-Other considerations below allow us to use the memory more efficiently. Things like allocating additional memory and threads for a certain task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
+When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks such as feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper run efficiently, making high-quality transcription more accessible across various server applications.
 
-Use the following flags to enable fast math BFloat16(BF16) GEMM kernels, Linux Transparent Huge Page (THP) allocations, logs to confirm kernel and set LRU cache capacity and OMP_NUM_THREADS to run the Whisper efficiently on Arm machines.
+Other factors contribute to more efficient memory usage. For example, allocating additional memory and threads for specific tasks can boost performance. By leveraging these hardware-aware optimizations, applications can achieve lower latency, reduced power consumption, and smoother real-time transcription.
+
+Use the following flags to optimize performance on Arm machines:
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
 export THP_MEM_ALLOC_ENABLE=1
 export LRU_CACHE_CAPACITY=1024
 export OMP_NUM_THREADS=32
 ```
+These variables do the following:
+
+* `export DNNL_DEFAULT_FPMATH_MODE=BF16` - sets the default floating-point math mode for the oneDNN library to BF16 (bfloat16). This can improve performance and efficiency on hardware that supports BF16 precision.
+
+* `export THP_MEM_ALLOC_ENABLE=1` - enables an optimized memory allocation strategy, often leveraging transparent huge pages, which can enhance memory management and reduce fragmentation in frameworks like PyTorch.
+
+* `export LRU_CACHE_CAPACITY=1024` - configures the capacity of a Least Recently Used (LRU) cache to 1024 entries. This helps store and quickly retrieve recently used data, reducing redundant computations.
+
+* `export OMP_NUM_THREADS=32` - sets the number of threads for OpenMP-based parallel processing to 32, allowing your application to take full advantage of multi-core systems for faster performance.
 
 {{% notice Note %}}
 BF16 support is merged into PyTorch versions greater than 2.3.0.
 {{% /notice %}}
 
 ## Run Whisper File
-After setting the environment variables in the previous step, now lets run the Whisper model again and analyze the performance impact.
+After setting the environment variables in the previous step, run the Whisper model again and analyze the performance impact.
 
 Run the `whisper-application.py` file:
 
@@ -35,10 +46,10 @@ python3 whisper-application.py
 
 ## Analyze output
 
-You should now observe that the processing time has gone down compared to the last run:
+You should now see that the processing time has gone down compared to the last run:
 
 ![frontend](whisper_output.png)
 
-The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that fast math BF16 kernels are used in the compute process to improve the performance.
+The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that the compute process uses fast math BF16 kernels to improve performance.
 
-By enabling the environment variables as described in the learning path you can see the performance uplift with the Whisper using Hugging Face Transformers framework on Arm.
+You have now learned how configuring these environment variables can achieve performance uplift of OpenAI's Whisper model when using the Hugging Face Transformers framework on Arm-based systems.
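As referenced in the earlier section, setting `DNNL_VERBOSE=1` makes oneDNN print the kernels it dispatches. A quick way to confirm the BF16 fast math kernels without scanning the whole log is a sketch like this, assuming the verbose output is written to stdout or stderr:

```bash
# Enable oneDNN verbose logging and filter for the BF16 fast math marker
# shown in the Learning Path output (assumes the log goes to stdout/stderr).
export DNNL_VERBOSE=1
python3 whisper-application.py 2>&1 | grep -m1 "attr-fpmath:bf16"
```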
