_posts/2025-09-11-qwen3-next.md (5 additions, 5 deletions)
@@ -50,11 +50,11 @@ In order to manage state for hybrid models like Qwen3-Next, vLLM automatically t
</p>
-In addition, Flash Linear Attention is based on Triton. Launching Triton kernels can incur significant CPU overheads that disproportionately affect decode-only batches. To overcome this, vLLM enables full CUDA graph mode by default, ensuring good performance in low-latency scenarios
+In addition, Flash Linear Attention is based on Triton. Launching Triton kernels can incur significant CPU overheads that disproportionately affect decode-only batches. To overcome this, vLLM enables full CUDA graph mode by default, ensuring good performance in low-latency scenarios.
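For readers who want to try this path, here is a minimal sketch of offline inference with vLLM's Python API. The Hugging Face repo name and the parallelism setting are assumptions; per the paragraph above, full CUDA graph mode is enabled by default for this model, so no extra compilation settings should be needed.

```python
# Minimal sketch (assumptions: HF repo name, tensor_parallel_size for your GPUs).
from vllm import LLM, SamplingParams

# Per the post, full CUDA graph mode is on by default for Qwen3-Next,
# so nothing extra needs to be configured for low-latency decode.
llm = LLM(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",  # assumed repo name
    tensor_parallel_size=4,                    # adjust to your hardware
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain linear attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```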
## **High-Sparsity MoE: Extreme Efficiency**
-Qwen3-Next pushes sparsity further with **MoE layers at 1:50 activation ratio**. In the flagship **80B-A3B model**, only **3B parameters are active per token**. vLLM can have great throughput and latency with the built-in efficient MoE implementation.
+Qwen3-Next pushes sparsity further with **MoE layers at a 1:50 activation ratio**. In the flagship **80B-A3B model**, only **3B parameters are active per token**. vLLM delivers strong throughput and latency with its built-in, efficient MoE implementation.
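As a rough, back-of-envelope illustration of the figures quoted above (not an official benchmark): with roughly 3B of 80B parameters active per token, only a few percent of the weights participate in each forward pass, which is why decode-time compute tracks the active rather than the total parameter count.

```python
# Back-of-envelope sketch using only the figures quoted in the post.
total_params = 80e9   # 80B total parameters in Qwen3-Next-80B-A3B
active_params = 3e9   # ~3B parameters active per token

active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.2%}")  # ~3.75%

# Using the common ~2 FLOPs-per-active-parameter rule of thumb for matmuls,
# per-token decode cost scales with the ~3B active weights, not the full 80B.
flops_per_token = 2 * active_params
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per decoded token (rough estimate)")
```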
## **Multi-Token Prediction (MTP)**
@@ -73,13 +73,13 @@ Our Qwen3-Next integration is just the beginning. On the roadmap:
This effort was made possible thanks to close collaboration with many partners:
-* **Qwen Team**, including Tao He, Jianwei Zhang for open-sourcing the model.
+* **Qwen Team**, including Tao He, Jianwei Zhang, for open-sourcing the model.
* **Flash Linear Attention team**, including Yu Zhang, etc. for reviewing the gated deltanet attention kernels and improving the numerics.
* **NVIDIA**, including Vadim Gimpelson for testing the models.
* **IBM Research**, including Thomas Parnell for hybrid memory management and CUDA graph optimizations.
* **Red Hat**, including Tyler Michael Smith, Doug Smith, Tarun Kumar, and Elvir Crncevic for testing the model and tuning MoE kernels.
-* **Community partners**: Roblox, Meta, — for testing, feedback, and scaling insights.
+* **Community partners**: Roblox, Meta, etc. for testing, feedback, and scaling insights.
-vLLM team members who contributed to this effort are: Jie Li, Kaichao You, Chen Zhang, Simon Mo.
+vLLM team members who contributed to this effort include: Jie Li, Kaichao You, Chen Zhang, Simon Mo.
👉 Qwen3-Next is now available in **vLLM**. Try it out today and experience **ultra-efficient long-context inference** with the latest hybrid MoE architecture.
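As a hedged usage sketch: assuming a vLLM OpenAI-compatible server has been started (for example with `vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct`, where the repo name and port are assumptions), it can be queried with the standard OpenAI Python client.

```python
# Sketch of querying a running vLLM OpenAI-compatible server (assumed to be
# started with `vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct` on port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",  # assumed repo name
    messages=[{"role": "user", "content": "Summarize this 100k-token report in five bullets."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```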