<b>Figure 4</b>: Compiled artifacts are cached after the cold start and, when set up correctly, can be reused across machines for fast, consistent startup.
</p>
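To make the reuse described in Figure 4 work in practice, one approach is to pin the cache location and ship it with the deployment. A minimal sketch, assuming vLLM reads the `VLLM_CACHE_ROOT` environment variable for its cache location (default `~/.cache/vllm`); verify the variable and cache layout against your installed version.

```python
# Assumption: vLLM honors VLLM_CACHE_ROOT for its cache location.
# Set it before importing vllm so the cache path is picked up.
import os
os.environ["VLLM_CACHE_ROOT"] = "/shared/vllm-cache"

from vllm import LLM

# The first (cold) run compiles and populates the cache; later runs on a machine
# with the same model, GPU, and software versions can reuse the artifacts.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
```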
@@ -93,8 +93,8 @@ Use `compile_sizes: [1, 2, 4]` in your config to trigger this specialization. Un
<b>Figure 6</b>: Piecewise CUDA Graphs in vLLM capture and replay supported GPU kernel sequences for low-overhead execution, while skipping unsupported operations like cascade attention.
</p>
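A minimal sketch of the `compile_sizes: [1, 2, 4]` setting mentioned above, expressed through the Python API. The `CompilationConfig` import path and the `compilation_config` argument are assumptions about recent vLLM releases; check them against your installed version.

```python
# Sketch: request batch-size-specialized compilation (names are assumptions
# about vLLM's CompilationConfig, not a verified API reference).
from vllm import LLM
from vllm.config import CompilationConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    compilation_config=CompilationConfig(
        # Compile additional graphs specialized to these batch sizes.
        compile_sizes=[1, 2, 4],
    ),
)
```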
@@ -129,14 +129,14 @@ A common pattern in quantized MLPs is SiLU activation followed by a quantized do
<b>Figure 7</b>: On Llama 3.1 405B quantized to FP8, tested on 8x AMD MI300s, fused kernels (<code>fusion</code>, in yellow) outperformed both <code>default</code> (torch ops for RMSNorm and SiLU plus a custom FP8 quant kernel) and <code>custom</code> (unfused custom kernels).
<b>Figure 8</b>: Detailed throughput speedup comparing the <code>fusion</code> and <code>default</code> regimes above. If all quantization overhead (8%) were removed via fusion, the theoretical maximum throughput improvement would be 8%, and that improvement is reached in some cases.
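For reference, a hedged sketch of the unfused pattern those fused kernels target: SiLU (as part of SwiGLU) followed by dynamic FP8 quantization of the activations ahead of the quantized down projection. The helper below is illustrative only and is not vLLM's actual kernel or naming.

```python
# Illustrative, unfused version of the pattern a fusion pass would combine:
# SwiGLU activation followed by dynamic per-tensor FP8 quantization.
import torch

def silu_then_fp8_quant(gate: torch.Tensor, up: torch.Tensor):
    # SwiGLU-style activation: SiLU(gate) * up
    act = torch.nn.functional.silu(gate) * up
    # Dynamic per-tensor quantization: compute a scale, then cast to FP8 (e4m3).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = act.abs().amax() / fp8_max
    q = (act / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    # A fused kernel produces q and scale in a single pass, avoiding an extra
    # read/write of the full-precision activations.
    return q, scale
```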