1 parent 12027be commit 6cf9d67
torchtitan/experiments/vlm/README.md
@@ -16,4 +16,4 @@ Distributed training usually does not play nice with input of varying shapes. To
Then we scatter the patch embeddings to their actual positions in the LLM input tokens.
This results in a very simple and general interface to train modern VLMs with interleaved data and native resolution & aspect ratio.
By setting the appropriate dataloader hyperparameters, we can easily reduce the number of padding tokens.
-We leverage Flex Attention to efficiently handle varying number of patches per image.
+We leverage FlexAttention to efficiently handle varying number of patches per image.
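
The sketch below is a rough, self-contained illustration (not torchtitan's actual code) of the two ideas in the README excerpt: scattering vision-encoder patch embeddings into placeholder positions of the LLM input, and using FlexAttention's document-style masking so that patches packed from several images, each with a different number of patches, only attend within their own image. All names (`IMG_TOKEN_ID`, the shapes, the `same_image` mask) are illustrative assumptions, and it requires a recent PyTorch build that ships FlexAttention.

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# --- Scatter patch embeddings into the LLM input sequence -------------------
# IMG_TOKEN_ID, the shapes, and the tensors below are made up for illustration.
IMG_TOKEN_ID = 32_000                     # hypothetical placeholder id for image-patch slots
B, S, H, D = 2, 128, 4, 16                # batch, sequence length, heads, head dim

tokens = torch.full((B, S), 1)            # dummy text token id everywhere
tokens[:, 8:72] = IMG_TOKEN_ID            # pretend 64 slots per sample hold an image

token_embeds = torch.randn(B, S, H * D)   # text embeddings from the LLM embedding table
is_patch = tokens == IMG_TOKEN_ID         # positions reserved for image patches
patch_embeds = torch.randn(int(is_patch.sum()), H * D)  # stand-in for vision-encoder output
token_embeds[is_patch] = patch_embeds     # scatter patches into their actual token positions

# --- FlexAttention with a varying number of patches per image ---------------
# Patches from several images are packed into one sequence; the mask keeps
# attention within each image, so no per-image padding is needed.
device = "cuda" if torch.cuda.is_available() else "cpu"
image_ids = torch.tensor([0] * 40 + [1] * 24 + [2] * 64, device=device)  # 3 images, 128 patches total
L = image_ids.numel()

def same_image(b, h, q_idx, kv_idx):
    # Allow attention only between patches that belong to the same image.
    return image_ids[q_idx] == image_ids[kv_idx]

block_mask = create_block_mask(same_image, B=None, H=None, Q_LEN=L, KV_LEN=L, device=device)
q = k = v = torch.randn(1, H, L, D, device=device)
out = flex_attention(q, k, v, block_mask=block_mask)  # (1, H, L, D)
```

Because the mask is a function of per-patch image ids rather than a fixed shape, adding or removing images only changes `image_ids`; the same kernel handles any mix of image sizes without padding each image to a common patch count.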