Commit b871e58

Update whisper_deploy.md
1 parent bcbd35d commit b871e58

1 file changed (+5, -5 lines changed)

content/learning-paths/servers-and-cloud-computing/whisper/whisper_deploy.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -5,13 +5,13 @@ weight: 4
 layout: learningpathall
 ---
 
-## Setting Arm-specific flags
+## Setting environment variables that impact performance
 
 Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of the kernels in the neural network enhance performance by reducing processing overhead. When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks like feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper can run efficiently, making high-quality transcription more accessible across various server applications.
 
-Other considerations below allows us to use the memory more efficiently. Things like allocating additional memory and threads for a certain task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
+Other considerations below allow us to use the memory more efficiently. Things like allocating additional memory and threads for a certain task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
 
-Use the following flags to enable fast math GEMM kernels, Linux Transparent Huge Page (THP) allocations, logs to confirm kernel and set LRU cache capacity and OMP_NUM_THREADS to run the Whisper efficiently on Arm machines.
+Use the following flags to enable fast math BFloat16(BF16) GEMM kernels, Linux Transparent Huge Page (THP) allocations, logs to confirm kernel and set LRU cache capacity and OMP_NUM_THREADS to run the Whisper efficiently on Arm machines.
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
@@ -25,7 +25,7 @@ BF16 support is merged into PyTorch versions greater than 2.3.0.
 {{% /notice %}}
 
 ## Run Whisper File
-After enabling the Arm specific flags in the previous step, now lets run the Whisper model again and analyze it.
+After setting the environment variables in the previous step, now lets run the Whisper model again and analyze the performance impact.
 
 Run the `whisper-application.py` file:
 
@@ -41,4 +41,4 @@ You should now observe that the processing time has gone down compared to the la
 
 The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that fast math BF16 kernels are used in the compute process to improve the performance.
 
-By enabling the Arm specific flags as described in the learning path you can see the performance uplift with the Whisper using Hugging Face Transformers framework on Arm.
+By enabling the environment variables as described in the learning path you can see the performance uplift with the Whisper using Hugging Face Transformers framework on Arm.
````
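
The changed file says that a log containing `attr-fpmath:bf16` confirms the fast math BF16 kernels are in use. A minimal sketch of automating that check is shown below. It assumes the output of the run was saved to a text file; the helper names `uses_bf16_kernels` and `count_bf16_lines` and the log file name are hypothetical and are not part of the learning path.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) if the saved log has at
# least one line tagged with the BF16 fast math attribute.
uses_bf16_kernels() {
  local log_file="$1"
  grep -q 'attr-fpmath:bf16' "$log_file"
}

# Hypothetical helper: prints how many log lines carry that attribute.
# grep -c exits non-zero when the count is 0, so `|| true` keeps the
# function usable in scripts that run with `set -e`.
count_bf16_lines() {
  local log_file="$1"
  grep -c 'attr-fpmath:bf16' "$log_file" || true
}
```

One possible usage, assuming the run was captured with something like `python3 whisper-application.py > whisper.log 2>&1`, is `uses_bf16_kernels whisper.log && echo "BF16 kernels in use"`. A count of zero would suggest that the environment variables were not exported in the shell that launched the script.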
