content/learning-paths/servers-and-cloud-computing/whisper/whisper_deploy.md
5 additions & 5 deletions
@@ -5,13 +5,13 @@ weight: 4
 layout: learningpathall
 ---
 
-## Setting Arm-specific flags
+## Setting environment variables that impact performance
 
 Speech-to-text applications often process large amounts of audio data in real time, requiring efficient computation to balance accuracy and speed. Low-level implementations of the kernels in the neural network enhance performance by reducing processing overhead. When tailored for specific hardware architectures, such as Arm CPUs, these kernels accelerate key tasks like feature extraction and neural network inference. Optimized kernels ensure that speech models like OpenAI’s Whisper can run efficiently, making high-quality transcription more accessible across various server applications.
 
-Other considerations below allows us to use the memory more efficiently. Things like allocating additional memory and threads for a certain task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
+The additional settings below use memory more efficiently: allocating extra memory and threads to a given task can increase performance. By enabling these hardware-aware options, applications achieve lower latency, reduced power consumption, and smoother real-time transcription.
 
-Use the following flags to enable fast math GEMM kernels, Linux Transparent Huge Page (THP) allocations, logs to confirm kernel and set LRU cache capacity and OMP_NUM_THREADS to run the Whisper efficiently on Arm machines.
+Use the following environment variables to enable fast math BFloat16 (BF16) GEMM kernels and Linux Transparent Huge Page (THP) allocations, turn on logs that confirm which kernels are used, and set the LRU cache capacity and OMP_NUM_THREADS so that Whisper runs efficiently on Arm machines.
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
@@ -25,7 +25,7 @@ BF16 support is merged into PyTorch versions greater than 2.3.0.
 {{% /notice %}}
 
 ## Run Whisper File
-After enabling the Arm specific flags in the previous step, now lets run the Whisper model again and analyze it.
+After setting the environment variables in the previous step, run the Whisper model again and analyze the performance impact.
 
 Run the `whisper-application.py` file:
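As a sketch only, the environment variables shown earlier can also be exported from within the Python script itself, before any library that reads them is imported (the `OMP_NUM_THREADS` value here is an illustrative assumption, not a tuning recommendation):

```python
import os

# Export the settings before importing torch/transformers so that
# oneDNN and OpenMP pick them up when they initialize.
os.environ["DNNL_DEFAULT_FPMATH_MODE"] = "BF16"  # fast-math BF16 GEMM kernels
os.environ["OMP_NUM_THREADS"] = "8"              # illustrative thread count

print(os.environ["DNNL_DEFAULT_FPMATH_MODE"])  # → BF16
```

Setting them in the shell, as shown above, is equivalent; the in-script form just keeps the configuration next to the code that depends on it.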
@@ -41,4 +41,4 @@ You should now observe that the processing time has gone down compared to the la
 
 The output in the above image has the log containing `attr-fpmath:bf16`, which confirms that fast math BF16 kernels are used in the compute process to improve the performance.
 
-By enabling the Arm specific flags as described in the learning path you can see the performance uplift with the Whisper using Hugging Face Transformers framework on Arm.
+By setting the environment variables as described in this learning path, you can see the performance uplift for Whisper with the Hugging Face Transformers framework on Arm.
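To check for the BF16 kernels programmatically rather than by reading the output, you can filter the verbose log for that attribute. A minimal sketch, using a shortened sample log line (in a real run the line would come from the application itself, assuming oneDNN verbose logging is enabled):

```shell
# In practice the log would come from something like:
#   DNNL_VERBOSE=1 python whisper-application.py 2>&1 | grep "attr-fpmath:bf16"
# Sketch against a shortened sample oneDNN log line:
log="onednn_verbose,exec,cpu,matmul,attr-fpmath:bf16"
echo "$log" | grep -o "attr-fpmath:bf16"
```

A match means the fast-math BF16 kernels were selected for that operation.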