Commit b180a67

Revise SBO section title (#194)
1 parent 22c3f4d commit b180a67


blog/2025-09-01-sglang-longcat-flash.md

Lines changed: 3 additions & 3 deletions
@@ -34,12 +34,12 @@ As noted in our tech report, a typical ReACT-based agent system imposes extreme
 
 To enable independent optimization of prefilling and decoding phases, PD-Disaggregated architecture is adopted. Based on SGLang's PD Disaggregation, we developed our solution featuring layer-wise transmission, which significantly reduces Time-To-First-Token (TTFT) under high QPS workloads.
 
-#### 3.2 SBO
+#### 3.2 Single Batch Overlap (SBO)
 
 SBO is a four-stage pipeline execution that uses module-level overlap to fully unleash LongCat-Flash’s potential. SBO differs from TBO by hiding communication overhead within a single batch. In SBO,
 
 - **Stage 1** requires separate execution because the MLA output serves as input for subsequent stages.
-- **Stage 2** is all-to-all dispatch overlapped with Dense FFN and Attn 0 (QKV Projection). This overlap iscrucial because communication overhead is excessive, prompting us to split the attention process.
+- **Stage 2** is all-to-all dispatch overlapped with Dense FFN and Attn 0 (QKV Projection). This overlap is crucial because communication overhead is excessive, prompting us to split the attention process.
 - **Stage 3** independently executes MoE GEMM. The latency of this stage will benefit from the wide EP deployment strategy.
 - **Stage 4** overlaps Attn 1 (Core Attention and Output Projection) and Dense FFN with the all-to-all combine.
 
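
The layer-wise transmission mentioned in the hunk above ships each layer's KV cache to the decode instance as soon as that layer's prefill finishes, so transfer time hides behind the prefill compute of the layers that follow. Below is a minimal sketch of the idea, assuming PyTorch point-to-point sends; the per-layer API and the transfer path are hypothetical stand-ins, not SGLang's actual TransferEngine interface:

```python
import torch
import torch.distributed as dist

def prefill_with_layerwise_transfer(layers, hidden, decode_rank):
    """Prefill while streaming each layer's KV cache to the decode rank."""
    pending = []
    for layer in layers:
        hidden, kv_block = layer(hidden)  # prefill compute for this layer
        # Ship this layer's KV cache immediately; the async isend overlaps
        # with the prefill compute of the remaining layers.
        pending.append(dist.isend(kv_block.contiguous(), dst=decode_rank))
    for work in pending:  # drain any transfers still in flight
        work.wait()
    return hidden
```

Under this scheme the decode instance can start almost as soon as the final layer's prefill completes, rather than waiting behind one monolithic KV-cache copy, which is where the TTFT reduction under high QPS comes from.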

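To make the renamed section's four stages concrete, here is a minimal sketch of how they could chain within a single batch, assuming PyTorch async collectives. Every module name below (`mla_qkv`, `mla_core`, `out_proj`, `dense_ffn`, `moe_experts`) is a hypothetical stand-in rather than an actual LongCat-Flash/SGLang kernel, and the shortcut dataflow is simplified:

```python
import torch
import torch.distributed as dist

def sbo_layer(x, mla_qkv, mla_core, out_proj, dense_ffn, moe_experts,
              dispatch_buf, combine_buf):
    # Stage 1: MLA executes on its own; its output feeds every later stage.
    attn_out = out_proj(mla_core(mla_qkv(x)))

    # Stage 2: launch the all-to-all dispatch asynchronously, then hide it
    # behind Dense FFN and the next attention's QKV projection ("Attn 0");
    # hiding this heavy communication is why the attention is split in two.
    dispatch = dist.all_to_all_single(dispatch_buf, attn_out, async_op=True)
    dense_out = dense_ffn(attn_out)  # overlapped with dispatch
    qkv_next = mla_qkv(dense_out)    # "Attn 0", also overlapped

    # Stage 3: expert GEMMs run without an overlap partner; the wide-EP
    # deployment keeps per-rank expert batches small, so this stage is short.
    dispatch.wait()
    expert_out = moe_experts(dispatch_buf)

    # Stage 4: hide the all-to-all combine behind core attention and output
    # projection ("Attn 1") plus the following Dense FFN.
    combine = dist.all_to_all_single(combine_buf, expert_out, async_op=True)
    attn1_out = out_proj(mla_core(qkv_next))  # overlapped with combine
    dense_next = dense_ffn(attn1_out)         # also overlapped
    combine.wait()
    return dense_next + combine_buf  # schematic merge of the two branches
```

The design point is that only the expert GEMM (Stage 3) lacks an overlap partner, and wide EP keeps it short; both all-to-all phases are hidden behind attention and Dense FFN compute, which is how SBO hides communication within a single batch instead of interleaving two batches as TBO does.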
@@ -132,4 +132,4 @@ We would like to express our heartfelt gratitude to the following teams and coll
 - **SGLang Team and community:** for their work on SGLang framework.
 - **Mooncake Team** for their earliest opensource work in the industry on PD Disaggregation architecture and TransferEngine.
 - **NVIDIA TensorRT-LLM:** for efficient kernels on Hopper GPUs.
-- **Meituan LongCat Team**: for our Model-System co-design.
+- **Meituan LongCat Team**: for our Model-System co-design.
