Speakers: Hongtao Yu (Meta), Yuanwei (Kevin) Fang (Meta), Manman Ren (Meta)

Notes:

* PyTorch 2.6 with Triton release branch 3.2
* Targeting: NVIDIA Hopper arch; Blackwell support coming soon.
* Performance
  * Meta’s FP8Rowwise GEMM (3-5% improvement, 1D persistent loop)
  * FlashAttention (10-15% improvement; could be faster with pipelining and ping-pong scheduling).
* What is warp specialization?
  * Improves hardware instruction scheduling; GPUs don’t have good dynamic instruction scheduling.
  * Uses a multi-way warp scheduler: warps on a single core can target different function units (e.g. memory, ALU, tensor core), all running in parallel.
* Comparison using GEMM
  * Uniform warps: 8 warps, each loading/processing 1/8th of the data, divided into two groups that each handle half of the data. Good for GEMM but not for more complicated kernels.
  * Warp specialized: 12 warps; 4 producer warps only do loads, 8 consumer warps only do wgmma. Frees up more capacity for more complex kernels like FlashAttention (a rough sketch of this split follows below).
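
To make the producer/consumer split concrete, here is a minimal CUDA sketch using libcu++'s `cuda::pipeline` with explicit producer/consumer roles. This is not Triton's actual codegen; the tile size, stage count, `compute_tile` helper, kernel name, and the 384-thread launch (4 producer warps + 8 consumer warps) are illustrative assumptions.

```cuda
#include <cuda/pipeline>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Illustrative sizes only.
constexpr int TILE = 4096;   // floats staged per iteration
constexpr int STAGES = 2;    // double-buffered shared memory

// Placeholder for the consumer math (the real kernel issues wgmma here).
__device__ void compute_tile(const float* tile, float* acc) {
  for (int i = 0; i < TILE; ++i) *acc += tile[i];   // stand-in computation
}

// Launch with 384 threads per block: 4 producer warps + 8 consumer warps.
__global__ void warp_specialized_gemm_like(const float* __restrict__ in,
                                           float* __restrict__ out,
                                           int ntiles) {
  __shared__ float buf[STAGES][TILE];
  __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, STAGES> state;

  auto block = cg::this_thread_block();
  const bool producer = threadIdx.x < 128;          // first 4 warps only load
  auto pipe = cuda::make_pipeline(block, &state,
                                  producer ? cuda::pipeline_role::producer
                                           : cuda::pipeline_role::consumer);

  float acc = 0.f;
  for (int t = 0; t < ntiles; ++t) {
    float* stage = buf[t % STAGES];
    if (producer) {
      pipe.producer_acquire();                      // wait for a free buffer
      for (int i = threadIdx.x; i < TILE; i += 128) // async copy global -> shared
        cuda::memcpy_async(&stage[i], &in[t * TILE + i], sizeof(float), pipe);
      pipe.producer_commit();                       // publish the buffer
    } else {
      pipe.consumer_wait();                         // wait for a full buffer
      compute_tile(stage, &acc);                    // "wgmma" work goes here
      pipe.consumer_release();                      // return the buffer
    }
  }
  if (threadIdx.x == 128) out[blockIdx.x] = acc;    // one consumer thread writes
}
```

The point of the split is that the 4 load warps and the 8 compute warps keep different function units (memory vs. tensor cores) busy at the same time, which the double-buffered pipeline makes explicit.
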
* Compiler implementation
* Data partitioning
* Communication pipelining and ping-pong scheduling
  * Ping-pong is implemented as a pair of named barriers, so only one consumer warp group can be in the protected region at a time (a sketch follows below).
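
As a conceptual illustration of the ping-pong handshake (not the compiler's actual output), the sketch below assumes two consumer warp groups of 128 threads that alternate ownership of an exclusive region via two PTX named barriers; the helper names, barrier IDs, and region contents are placeholders.

```cuda
// Conceptual sketch only: two consumer warp groups (128 threads each) take
// turns in an exclusive region, coordinated by a pair of PTX named barriers.
__device__ void named_bar_sync(int id, int nthreads) {
  asm volatile("bar.sync %0, %1;" :: "r"(id), "r"(nthreads));
}
__device__ void named_bar_arrive(int id, int nthreads) {
  asm volatile("bar.arrive %0, %1;" :: "r"(id), "r"(nthreads));
}

__device__ void pingpong_consumer_loop(int num_iters) {
  const int group = (threadIdx.x / 128) & 1;  // consumer warp group 0 or 1
  const int my_bar = 1 + group;               // "my group may enter" barrier
  const int other_bar = 2 - group;            // the other group's barrier
  // Barriers 1 and 2 are used (0 is left for __syncthreads). Each barrier is
  // shared by both groups, so 256 threads participate per barrier.
  // Group 1 releases group 0 up front so group 0 takes the first turn.
  if (group == 1) named_bar_arrive(other_bar, 256);

  for (int it = 0; it < num_iters; ++it) {
    named_bar_sync(my_bar, 256);        // block until it is my group's turn
    // ... exclusive region: tensor-core (wgmma) work for this iteration ...
    named_bar_arrive(other_bar, 256);   // hand the region to the other group
  }
}
```

Because each group must see the other group's arrival before re-entering, the two groups strictly alternate in the region, which is the "only one consumer in the region" property mentioned above.
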

## Questions

* Q> Is there an equivalent warp group for AMD? Does this apply to AMD GPUs?
* A> Meta is doing this for AMD. There is no named barrier on AMD; the same effect is simulated using shared-memory atomics (see the sketch below).
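
A minimal sketch of that idea, not Meta's actual code: an arrive/wait pair emulated with a shared-memory counter. It is written in CUDA syntax with 32-thread warps for brevity; a HIP version would use the same shared-memory atomics with 64-lane wavefronts. `smem_bar_arrive`, `smem_bar_wait`, and the 4-producer/8-consumer split are illustrative assumptions.

```cuda
// Emulated arrive/wait on a monotonically increasing shared-memory counter.
// One arrival per warp; waiters spin until the counter reaches a target.
__device__ void smem_bar_arrive(unsigned int* counter) {
  __threadfence_block();                         // publish prior shared-mem writes
  if ((threadIdx.x & 31) == 0) atomicAdd(counter, 1u);
}

__device__ void smem_bar_wait(unsigned int* counter, unsigned int target) {
  if ((threadIdx.x & 31) == 0)
    while (atomicAdd(counter, 0u) < target) { }  // atomic read; spin until released
  __syncwarp();                                  // other lanes wait on lane 0
  __threadfence_block();                         // order subsequent shared-mem reads
}

// Usage sketch: 4 producer warps fill a buffer, 8 consumer warps wait for all
// 4 arrivals of iteration `it` before reading it. (A real implementation also
// needs a second counter so producers do not overwrite a buffer still in use.)
__global__ void producer_consumer_handshake(int num_iters) {
  __shared__ unsigned int produced;
  if (threadIdx.x == 0) produced = 0;
  __syncthreads();

  const bool is_producer = threadIdx.x < 4 * 32;
  for (int it = 0; it < num_iters; ++it) {
    if (is_producer) {
      // ... fill the shared-memory buffer for iteration `it` ...
      smem_bar_arrive(&produced);
    } else {
      smem_bar_wait(&produced, 4u * (it + 1));   // all 4 producer warps arrived
      // ... consume the buffer ...
    }
  }
}
```
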

### Progress

* Modularizing compiler passes. Decoupled data extraction from lowering, allowing customized lowering flows and predictable behavior on analysis failures.
  * triton-to-structured
  * triton-arith-to-linalg
  * structured-to-memref
* Improvements to pointer analysis