
Commit 33c264d

Xunzhuo and hmellor authored
Update _posts/2025-10-25-semantic-router-modular.md
Co-authored-by: Harry Mellor <[email protected]>
1 parent e8a9ef8 commit 33c264d

File tree

1 file changed: +1 / -1 lines changed


_posts/2025-10-25-semantic-router-modular.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ The benefits of this architecture vary by workload:
 - Single vs multi-task classification: LoRA provides minimal benefit since there's no base model sharing. Traditional fine-tuned models may be faster. LoRA shows clear advantages when performing multiple classifications on the same input. Since the base model runs once and only LoRA adapters execute for each task, the overhead is substantially reduced compared to running separate full models. The actual speedup depends on the ratio of base model computation to adapter computation.
 - Long-context inputs: Qwen3-Embedding enables routing decisions on documents up to 32K tokens without truncation, extending beyond ModernBERT's 8K limit for very long documents. With Flash Attention 2 enabled on compatible GPUs, the performance advantage becomes more substantial as context length increases.
 - Multilingual routing: Models can now handle routing decisions for languages where ModernBERT has limited training data.
-- High concurrency: OnceLock eliminates lock contention, allowing throughput to scale with CPU cores for classification operations.
+- High concurrency: `OnceLock` eliminates lock contention, allowing throughput to scale with CPU cores for classification operations.
 - GPU acceleration: When Flash Attention 2 is enabled, attention operations run 3-4× faster, with the speedup becoming more pronounced at longer sequence lengths. This makes GPU deployment particularly advantageous for high-throughput scenarios.

 ## Future Directions
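
The one-line change above only wraps `OnceLock` in backticks, but the claim in that bullet (lock-free reads after one-time initialization, so throughput scales with CPU cores) is worth illustrating. The sketch below is not the project's actual code; the `Classifier` type and `classify` function are hypothetical stand-ins showing how Rust's `std::sync::OnceLock` is typically used for this pattern.

```rust
use std::sync::OnceLock;

// Hypothetical classifier handle; the real project's type will differ.
struct Classifier {
    model_path: String,
}

impl Classifier {
    fn load(path: &str) -> Classifier {
        // Expensive one-time setup (e.g. loading model weights) happens here.
        Classifier { model_path: path.to_string() }
    }

    fn classify(&self, text: &str) -> usize {
        // Placeholder: a real implementation would run model inference.
        text.len() % 2
    }
}

// Shared, lazily-initialized classifier. After the first get_or_init call,
// readers perform a plain atomic load with no lock, so concurrent
// classification threads do not contend with each other.
static CLASSIFIER: OnceLock<Classifier> = OnceLock::new();

fn classify(text: &str) -> usize {
    CLASSIFIER
        .get_or_init(|| Classifier::load("models/classifier"))
        .classify(text)
}

fn main() {
    // Threads call classify() concurrently; only the first caller pays the
    // initialization cost, and later calls are contention-free reads.
    let handles: Vec<_> = (0..4)
        .map(|i| std::thread::spawn(move || classify(&format!("request {i}"))))
        .collect();
    for h in handles {
        println!("class = {}", h.join().unwrap());
    }
}
```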
