---
title: "Mini-SGLang: Efficient Inference Engine in a Nutshell"
author: "Ziyi Xu"
date: "December 17, 2025"
previewImg: /images/blog/minisgl/logo.png
---
We're excited to introduce **Mini-SGLang**, a lightweight yet high-performance inference framework for Large Language Models (LLMs). Derived from the [SGLang](https://github.com/sgl-project/sglang) project, Mini-SGLang is designed to demystify the complexities of modern serving systems. Despite its compact codebase, it retains the advanced features that define state-of-the-art performance, including **Radix Attention** for efficient KV cache reuse, **Chunked Prefill** for controlled memory footprint, **Overlap Scheduling** for reduced CPU overhead, and **Tensor Parallelism** for scalable distributed serving. With an OpenAI-compatible API and out-of-the-box support for models like Llama-3 and Qwen-3, Mini-SGLang serves as both a capable inference engine and a transparent reference implementation for researchers and developers.
The source code is available at [https://github.com/sgl-project/mini-sglang](https://github.com/sgl-project/mini-sglang).
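To give a feel for the OpenAI-compatible API, here is a minimal sketch of querying a locally running Mini-SGLang server with the official `openai` Python client. The port, model name, and endpoint details below are illustrative assumptions rather than documented defaults; see the repository for the actual launch instructions.

```python
# Minimal sketch: querying a Mini-SGLang server via its OpenAI-compatible
# endpoint. The base_url port and model name are assumptions for
# illustration; consult the repository for real launch commands.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",  # assumed local server address
    api_key="EMPTY",                       # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # whichever model the server was launched with
    messages=[{"role": "user", "content": "Explain Radix Attention in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```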
Many ML and systems researchers struggle to integrate their optimizations into existing frameworks. On one hand, injecting new logic into a complex framework like SGLang is risky: it is easy to break the system's implicit invariants, giving rise to subtle bugs. On the other hand, building an inference engine from scratch is tedious, requiring significant effort on infrastructure details (e.g., frontend servers, tokenization, NCCL communication) just to match state-of-the-art baselines.
Mini-SGLang strikes a balance. It started as a research prototype we used to validate new system ideas quickly, without spending weeks navigating a full-scale codebase or re-implementing infrastructure from scratch. It offers an out-of-the-box, high-performance framework that is easy to inspect, extend, and optimize: it handles the heavy lifting of infrastructure while remaining flexible enough for rapid prototyping. Additionally, Mini-SGLang provides **OpenAI-compatible benchmark utilities**, facilitating end-to-end performance analysis and comparison against serving engines such as [SGLang](https://github.com/sgl-project/sglang), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). For kernel developers, Mini-SGLang also provides fine-grained **NVTX annotations**, which are invaluable for kernel debugging and performance profiling.
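As an illustration of how such annotations look in a PyTorch-based engine, the sketch below wraps the stages of a single model step in `torch.cuda.nvtx.range` markers; the stage names and helper structure are hypothetical, not Mini-SGLang's actual internals. When the process is captured with Nsight Systems (`nsys profile`), each range appears as a labeled span on the timeline, making it easy to see which kernels run under which stage.

```python
# Hypothetical sketch of NVTX-annotated stages in one model step, in the
# spirit of Mini-SGLang's fine-grained annotations. The stage names and
# this function are illustrative; the real hooks live in the repository.
import torch

def model_step(model: torch.nn.Module, input_ids: torch.Tensor) -> torch.Tensor:
    with torch.cuda.nvtx.range("prepare_inputs"):  # CPU-side batch preparation
        input_ids = input_ids.to("cuda", non_blocking=True)
    with torch.cuda.nvtx.range("forward"):         # GPU kernels for this step
        logits = model(input_ids)
    with torch.cuda.nvtx.range("sample"):          # greedy sampling, for brevity
        return logits[:, -1, :].argmax(dim=-1)
```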
## Features
In our offline throughput benchmark (measured in tokens per second), Mini-SGLang consistently outperforms the Nano-vLLM baseline on both Qwen3 models, thanks to our **overlap scheduling** mechanism, which effectively hides CPU overhead.
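To make the idea concrete, here is a self-contained toy sketch of the overlap pattern, not Mini-SGLang's actual scheduler: because CUDA kernel launches are asynchronous, the CPU can prepare batch *i+1* while the GPU is still executing batch *i*, and only synchronizes when reading back results.

```python
# Toy illustration of overlap scheduling (not the real scheduler): the
# CPU schedules the next batch while the GPU executes the current one.
import torch

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for an LLM forward step
pending = [torch.randn(64, 4096, device="cuda") for _ in range(32)]

def schedule_next_batch():
    # In a real engine this is the expensive CPU work: request batching,
    # radix-cache matching, KV memory allocation. Here it is trivial.
    return pending.pop() if pending else None

batch = schedule_next_batch()
while batch is not None:
    out = model(batch)                  # kernel launch returns immediately (async)
    next_batch = schedule_next_batch()  # CPU work overlaps with GPU execution
    _ = out.sum().item()                # .item() synchronizes with the GPU
    batch = next_batch
```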
**Reproducibility**: The offline benchmark script is available at [this link](https://github.com/sgl-project/mini-sglang/blob/main/benchmark/offline/bench.py).