Commit 4a6540c

Merge pull request #1633 from nobelchowdary/whisper
Whisper Learning Path added
2 parents 4b9388d + d9faae5

File tree

5 files changed: +210 −0 lines changed
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
---
title: Run OpenAI Whisper Audio Model efficiently on Arm with Hugging Face Transformers

minutes_to_complete: 15

who_is_this_for: This Learning Path is for software developers, ML engineers, and anyone who wants to run the Whisper ASR model efficiently on Arm Neoverse-based CPUs and build speech transcription applications around it.

learning_objectives:
    - Install the dependencies required to run the Whisper model.
    - Run the OpenAI Whisper model using the Hugging Face Transformers framework.
    - Run the whisper-large-v3-turbo model efficiently on an Arm CPU.
    - Perform audio-to-text transcription with Whisper.
    - Measure the total time taken to generate a transcript with Whisper.

prerequisites:
    - An Amazon Graviton4 (or other Arm) compute instance with 32 cores, 8GB of RAM, and 32GB of disk space.
    - Basic understanding of Python and ML concepts.
    - Understanding of Whisper ASR model fundamentals.

author: Nobel Chowdary Mandepudi

### Tags
skilllevels: Intermediate
armips:
    - Neoverse
subjects: ML
operatingsystems:
    - Linux
tools_software_languages:
    - Python
    - Whisper
    - AWS Graviton

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps"         # Always the same, html page title.
layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
# User change
title: "Set up the Whisper Model"

weight: 2

# Do not modify these elements
layout: "learningpathall"
---

## Before you begin

This Learning Path demonstrates how to run the whisper-large-v3-turbo model as an application that takes audio as input and produces a text transcript of it. The instructions in this Learning Path are designed for Arm servers running Ubuntu 24.04 LTS. You need an Arm server instance with 32 cores, at least 8GB of RAM, and 32GB of disk space to run this example. The instructions have been tested on an AWS c8g.8xlarge instance.

## Overview

OpenAI Whisper is an open-source Automatic Speech Recognition (ASR) model trained on multilingual and multitask data, which enables transcript generation in multiple languages as well as translation from other languages into English. This Learning Path explores the foundational aspects of speech-to-text transcription applications, focusing on running OpenAI's Whisper on an Arm CPU, and covers the implementation and performance considerations needed to deploy Whisper efficiently using the Hugging Face Transformers framework.

## Install dependencies

Install the following packages on your Arm-based server instance:

```bash
sudo apt update
sudo apt install python3-pip python3-venv ffmpeg -y
```

## Install Python dependencies

Create a Python virtual environment:

```bash
python3 -m venv whisper-env
```

Activate the virtual environment:

```bash
source whisper-env/bin/activate
```

Install the required libraries using pip:

```bash
pip install torch transformers accelerate
```

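As noted at the end of this page, the BF16 fast math path requires PyTorch newer than 2.3.0. If you want to check the installed version programmatically, a small sketch follows; the version string shown is only an example, and the real value comes from `torch.__version__`:

```python
def parse_version(v: str) -> tuple:
    """Parse a version string like '2.4.1+cpu' into a comparable tuple of ints."""
    return tuple(int(x) for x in v.split("+")[0].split(".")[:3])

# Example check against the 2.3.0 minimum; in practice pass torch.__version__
print(parse_version("2.4.1+cpu") > parse_version("2.3.0"))  # → True
```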
## Download the sample audio file

Download a sample audio file (approximately 33 seconds long, in .wav format), or use your own audio file:

```bash
wget https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav
```

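If you want to confirm the download and check the clip length before transcribing, the standard-library `wave` module can report the duration. This is an optional sketch; the filename matches the wget command above:

```python
import os
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

# For the sample clip downloaded above, this should print roughly 33 seconds
if os.path.exists("OSR_us_000_0010_8k.wav"):
    print(f"{wav_duration_seconds('OSR_us_000_0010_8k.wav'):.1f} seconds")
```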
## Create a Python script for audio-to-text transcription

Create a Python file:

```bash
vim whisper-application.py
```

Write the following code in the `whisper-application.py` file:
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import time

# Set the device to CPU and specify the torch data type
device = "cpu"
torch_dtype = torch.float32

# Specify the model name
model_id = "openai/whisper-large-v3-turbo"

# Load the model with the specified configuration
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)

# Move the model to the specified device
model.to(device)

# Load the processor for the model
processor = AutoProcessor.from_pretrained(model_id)

# Create a pipeline for automatic speech recognition
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    return_timestamps=True,
)

# Record the start time of the inference
start_time = time.time()

# Perform speech recognition on the audio file
result = pipe("OSR_us_000_0010_8k.wav")

# Record the end time of the inference
end_time = time.time()

# Print the transcribed text
print(f'\n{result["text"]}\n')

# Calculate and print the total duration of the inference
duration = end_time - start_time
msg = f'\nInferencing elapsed time: {duration:4.2f} seconds\n'

print(msg)
```
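The script reports the elapsed time in seconds. If your audio is long enough for runs to exceed a minute, a small formatting helper (hypothetical, not part of the script above) makes the message easier to read:

```python
def format_elapsed(duration: float) -> str:
    """Format a duration in seconds as HH:MM:SS.ss."""
    hours, remainder = divmod(duration, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:05.2f}"

print(format_elapsed(3725.5))  # → 01:02:05.50
```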

## Use the Arm-specific flags

Set the following environment variables to run Whisper efficiently on Arm machines. They enable fast math GEMM kernels and Linux Transparent Huge Page (THP) allocations, turn on verbose logs so you can confirm which kernels are used, and set the LRU cache capacity and the number of OpenMP threads:

```bash
export DNNL_DEFAULT_FPMATH_MODE=BF16
export THP_MEM_ALLOC_ENABLE=1
export LRU_CACHE_CAPACITY=1024
export OMP_NUM_THREADS=32
export DNNL_VERBOSE=1
```
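`OMP_NUM_THREADS` is set to 32 here to match the 32-core instance used in this Learning Path. If you are on a different instance size, a reasonable starting point is the number of cores visible to the process, which you can check with the standard library:

```python
import os

# Number of CPU cores visible to this process; a starting value for OMP_NUM_THREADS
print(os.cpu_count())
```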
{{% notice Note %}}
BF16 support is merged into PyTorch versions greater than 2.3.0.
{{% /notice %}}
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
---
title: Run the Whisper Model
weight: 4

layout: learningpathall
---

## Run the Whisper file

After installing the dependencies and enabling the Arm-specific flags in the previous steps, run the Whisper model and analyze it.

Run the `whisper-application.py` file:

```bash
python3 whisper-application.py
```

## Output

You should see output similar to the image below. Because verbose logging is enabled, the output includes the kernel logs, followed by the transcript of the audio and the audio transcription time:

![frontend](whisper_output.png)

## Analyze

The output in the image above contains the log entry `attr-fpmath:bf16`, which confirms that fast math BF16 kernels were used in the computation to improve performance.

It also shows the text transcript of the audio and the `Inferencing elapsed time`.

By enabling the Arm-specific flags described in this Learning Path, you can see the performance uplift when running Whisper with the Hugging Face Transformers framework on Arm.
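
If you prefer to check for the BF16 kernels programmatically rather than by reading the console, a small helper can scan captured verbose output for the attribute. This is a sketch, and the sample log lines below are illustrative rather than taken from a real run:

```python
def uses_bf16_fastmath(log_lines):
    """Return True if any oneDNN verbose log line reports the bf16 fast-math attribute."""
    return any("attr-fpmath:bf16" in line for line in log_lines)

# Illustrative log lines (not from a real run)
sample = [
    "onednn_verbose,primitive,exec,cpu,matmul,...,attr-fpmath:bf16,...",
    "onednn_verbose,primitive,exec,cpu,reorder,...",
]
print(uses_bf16_fastmath(sample))  # → True
```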