Commit aea81c8

Add anchor links for chapters
1 parent 6798bad commit aea81c8

1 file changed: +8 -8 lines changed

_posts/2025-09-05-anatomy-of-vllm.md

Lines changed: 8 additions & 8 deletions
@@ -18,11 +18,11 @@ Later posts will dive into specific subsystems.
 
 This post is structured into five parts:
 
-1. LLM engine & engine core: fundamentals of vLLM (scheduling, paged attention, continuous batching, etc.)
-2. Advanced features: chunked prefill, prefix caching, guided & speculative decoding, disaggregated P/D
-3. Scaling up: from single-GPU to multi-GPU execution
-4. Serving layer: distributed / concurrent web scaffolding
-5. Benchmarks and auto-tuning: measuring latency and throughput
+1. [LLM engine & engine core](#llm-engine--engine-core): fundamentals of vLLM (scheduling, paged attention, continuous batching, etc.)
+2. [Advanced features](#advanced-features--extending-the-core-engine-logic): chunked prefill, prefix caching, guided & speculative decoding, disaggregated P/D
+3. [Scaling up](#from-uniprocexecutor-to-multiprocexecutor): from single-GPU to multi-GPU execution
+4. [Serving layer](#distributed-system-serving-vllm): distributed / concurrent web scaffolding
+5. [Benchmarks and auto-tuning](#benchmarks-and-auto-tuning---latency-vs-throughput): measuring latency and throughput
 
 > [!NOTE]
 > * Analysis is based on [commit 42172ad](https://github.com/vllm-project/vllm/tree/42172ad) (August 9th, 2025).
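
The anchors added in this hunk look like heading slugs: lowercase text, punctuation dropped, each space turned into a hyphen. That would explain the double hyphen in `#llm-engine--engine-core`, where removing the `&` leaves two adjacent spaces. Below is a minimal sketch of that rule for checking anchors like these. `heading_slug` is a hypothetical helper, not part of any Markdown renderer, and the second heading text is an assumption inferred from its anchor. Real renderers may differ in edge cases such as duplicate headings or non-ASCII text.

```python
import re


def heading_slug(heading: str) -> str:
    """Approximate a GitHub-style heading anchor (hypothetical helper)."""
    slug = heading.strip().lower()
    # Drop everything except word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each remaining space becomes one hyphen, so "a & b" yields "a--b".
    return slug.replace(" ", "-")


# Heading text taken from the link label in the diff.
print(heading_slug("LLM engine & engine core"))  # llm-engine--engine-core

# Assumed heading text, reconstructed from the anchor in the diff.
print(heading_slug("Benchmarks and auto-tuning - latency vs throughput"))
```

If a link in the list does not match the slug of its target heading, the anchor silently fails to scroll, so a check like this is cheap insurance when headings are renamed.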
@@ -80,9 +80,9 @@ Let's start analyzing the constructor.
 The main components of the engine are:
 
 * vLLM config (contains all of the knobs for configuring model, cache, parallelism, etc.)
-* processor (turns raw inputs → EngineCoreRequests via validation, tokenization, and processing)
-* engine core client (in our running example we're using InprocClient which is basically == EngineCore; we'll gradually build up to DPLBAsyncMPClient which allows serving at scale)
-* output processor (converts raw EngineCoreOutputs → RequestOutput that the user sees)
+* processor (turns raw inputs → <code>EngineCoreRequests</code> via validation, tokenization, and processing)
+* engine core client (in our running example we're using <code>InprocClient</code> which is basically == <code>EngineCore</code>; we'll gradually build up to <code>DPLBAsyncMPClient</code> which allows serving at scale)
+* output processor (converts raw <code>EngineCoreOutputs</code> → <code>RequestOutput</code> that the user sees)
 > [!NOTE]
 > With the V0 engine being deprecated, class names and details may shift. I'll emphasize the core ideas rather than exact signatures. I'll abstract away some but not all of those details.
 
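
The component list in this hunk describes a three-stage flow: the processor turns raw input into a core request, the engine core client runs it, and the output processor turns the raw result into what the user sees. The toy sketch below only illustrates that data flow. Every class and function name in it (`ToyCoreRequest`, `ToyProcessor`, `ToyInprocClient`, `ToyOutputProcessor`, `generate`) is a hypothetical stand-in, not vLLM's API, and the "tokenizer" and "model" are deliberately fake.

```python
from dataclasses import dataclass


@dataclass
class ToyCoreRequest:
    """Stand-in for what the post calls an EngineCoreRequest."""
    request_id: str
    token_ids: list[int]


@dataclass
class ToyCoreOutput:
    """Stand-in for one raw engine-core output."""
    request_id: str
    new_token_ids: list[int]


class ToyProcessor:
    """Validates and 'tokenizes' raw input into a core request."""

    def process(self, request_id: str, prompt: str) -> ToyCoreRequest:
        if not prompt.strip():
            raise ValueError("prompt must not be empty")
        # Fake tokenization: one id per word, equal to the word length.
        return ToyCoreRequest(request_id, [len(w) for w in prompt.split()])


class ToyInprocClient:
    """Runs the 'core' directly in the caller's process."""

    def step(self, request: ToyCoreRequest) -> ToyCoreOutput:
        # Fake model: emit the prompt token ids in reverse order.
        return ToyCoreOutput(request.request_id, request.token_ids[::-1])


class ToyOutputProcessor:
    """Converts a raw core output into a user-facing result."""

    def process(self, output: ToyCoreOutput) -> dict:
        text = " ".join(str(t) for t in output.new_token_ids)
        return {"request_id": output.request_id, "text": text}


def generate(prompt: str) -> dict:
    """Chain the three stages for a single request."""
    request = ToyProcessor().process("req-0", prompt)
    raw = ToyInprocClient().step(request)
    return ToyOutputProcessor().process(raw)


print(generate("hi there"))
```

Keeping the client behind a narrow `step`-style interface is what lets the post swap an in-process client for a multi-process one later without touching the processor or output processor.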
