content/learning-paths/servers-and-cloud-computing/whisper/whisper_deploy.md
5 additions & 5 deletions
@@ -5,13 +5,13 @@ weight: 4
 layout: learningpathall
 ---
 
-## Setting Arm-specific flags
+## Setting environment variables that impact performance
 
 Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of the kernels in the neural network enhance performance by reducing processing overhead. When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks like feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper can run efficiently, making high-quality transcription more accessible across various server applications.
 
-Other considerations below allows us to use the memory more efficiently. Things like allocating additional memory and threads for a certain task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
+The additional settings below use memory more efficiently: allocating extra memory and threads to a given task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
 
-Use the following flags to enable fast math GEMM kernels, Linux Transparent Huge Page (THP) allocations, logs to confirm kernel and set LRU cache capacity and OMP_NUM_THREADS to run the Whisper efficiently on Arm machines.
+Use the following environment variables to enable fast math BFloat16 (BF16) GEMM kernels and Linux Transparent Huge Page (THP) allocations, turn on logs that confirm which kernels are used, and set the LRU cache capacity and OMP_NUM_THREADS so that Whisper runs efficiently on Arm machines.
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
@@ -25,7 +25,7 @@ BF16 support is merged into PyTorch versions greater than 2.3.0.
 {{% /notice %}}
 
 ## Run Whisper File
-After enabling the Arm specific flags in the previous step, now lets run the Whisper model again and analyze it.
+After setting the environment variables in the previous step, run the Whisper model again and analyze the performance impact.
 
 Run the `whisper-application.py` file:
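As a sketch only, the environment variables shown earlier can also be exported from within the Python script itself, before any library that reads them is imported (the `OMP_NUM_THREADS` value here is an illustrative assumption, not a tuning recommendation):

```python
import os

# Export the settings before importing torch/transformers so that
# oneDNN and OpenMP pick them up when they initialize.
os.environ["DNNL_DEFAULT_FPMATH_MODE"] = "BF16"  # fast-math BF16 GEMM kernels
os.environ["OMP_NUM_THREADS"] = "8"              # illustrative thread count

print(os.environ["DNNL_DEFAULT_FPMATH_MODE"])  # → BF16
```

Setting them in the shell, as shown above, is equivalent; the in-script form just keeps the configuration next to the code that depends on it.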
@@ -41,4 +41,4 @@ You should now observe that the processing time has gone down compared to the la
 
 The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that fast math BF16 kernels are used in the compute process to improve the performance.
 
-By enabling the Arm specific flags as described in the learning path you can see the performance uplift with the Whisper using Hugging Face Transformers framework on Arm.
+By setting the environment variables as described in this learning path, you can see the performance uplift for Whisper with the Hugging Face Transformers framework on Arm.
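To check for the BF16 kernels programmatically rather than by reading the output, you can filter the verbose log for that attribute. A minimal sketch, using a shortened sample log line (in a real run the line would come from the application itself, assuming oneDNN verbose logging is enabled):

```shell
# In practice the log would come from something like:
#   DNNL_VERBOSE=1 python whisper-application.py 2>&1 | grep "attr-fpmath:bf16"
# Sketch against a shortened sample oneDNN log line:
log="onednn_verbose,exec,cpu,matmul,attr-fpmath:bf16"
echo "$log" | grep -o "attr-fpmath:bf16"
```

A match means the fast-math BF16 kernels were selected for that operation.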