
Commit 289b18e

[Docs] Update features/disagg_prefill, add v1 examples and development (#22165)
Signed-off-by: David Chen <[email protected]>
1 parent 35171b1 commit 289b18e

3 files changed (+25 -0 lines)


docs/features/disagg_prefill.md

Lines changed: 25 additions & 0 deletions
@@ -19,6 +19,18 @@ Two main reasons:
Please refer to <gh-file:examples/online_serving/disaggregated_prefill.sh> for the example usage of disaggregated prefilling.

vLLM now supports 5 types of connectors:

- **SharedStorageConnector**: refer to <gh-file:examples/offline_inference/disaggregated-prefill-v1/run.sh> for an example of disaggregated prefilling with SharedStorageConnector.
- **LMCacheConnectorV1**: refer to <gh-file:examples/others/lmcache/disagg_prefill_lmcache_v1/disagg_example_nixl.sh> for an example of disaggregated prefilling with LMCacheConnectorV1, which uses NIXL as the underlying KV transport.
- **NixlConnector**: refer to <gh-file:tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh> for an example of disaggregated prefilling with NixlConnector, which supports fully asynchronous send/recv.
- **P2pNcclConnector**: refer to <gh-file:examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/disagg_example_p2p_nccl_xpyd.sh> for an example of disaggregated prefilling with P2pNcclConnector.
- **MultiConnector**: takes advantage of the `kv_connector_extra_config: dict[str, Any]` field already present in `KVTransferConfig` to hold all the desired connectors as an ordered list of kwargs, for example:

```bash
--kv-transfer-config '{"kv_connector":"MultiConnector","kv_role":"kv_both","kv_connector_extra_config":{"connectors":[{"kv_connector":"NixlConnector","kv_role":"kv_both"},{"kv_connector":"SharedStorageConnector","kv_role":"kv_both","kv_connector_extra_config":{"shared_storage_path":"local_storage"}}]}}'
```
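Because a one-line MultiConnector config is hard to read and edit, it can help to assemble it from one JSON fragment per sub-connector. The following is a minimal sketch, assuming a Bash shell; the variable names (`NIXL`, `SHARED`, `CONNECTORS`, `KV_TRANSFER_CONFIG`) are illustrative only, and the string it produces is byte-for-byte the `--kv-transfer-config` value used in the MultiConnector example.

```bash
#!/usr/bin/env bash
# Assemble the MultiConnector --kv-transfer-config value from one JSON object
# per sub-connector, so each connector's settings can be edited separately.
# Variable names are illustrative, not part of vLLM.
set -euo pipefail

# Sub-connectors, in the order they should appear in "connectors".
NIXL='{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
SHARED='{"kv_connector":"SharedStorageConnector","kv_role":"kv_both","kv_connector_extra_config":{"shared_storage_path":"local_storage"}}'

# Join them into the ordered list and wrap it in the MultiConnector config.
CONNECTORS="[${NIXL},${SHARED}]"
KV_TRANSFER_CONFIG='{"kv_connector":"MultiConnector","kv_role":"kv_both","kv_connector_extra_config":{"connectors":'"${CONNECTORS}"'}}'

echo "${KV_TRANSFER_CONFIG}"
# Then pass it to the server, e.g. (<model> is a placeholder):
# vllm serve <model> --kv-transfer-config "${KV_TRANSFER_CONFIG}"
```

Keeping each sub-connector in its own variable makes reordering the list, or adding a third connector, a one-line change.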
## Benchmarks

Please refer to <gh-file:benchmarks/disagg_benchmarks> for disaggregated prefilling benchmarks.
@@ -48,6 +60,19 @@ The workflow of disaggregated prefilling is as follows:
The `buffer` corresponds to the `insert` API in LookupBuffer, and `drop_select` corresponds to the `drop_select` API in LookupBuffer.
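The `insert` / `drop_select` pairing can be illustrated with a toy shell model. This is a hypothetical sketch, not the LookupBuffer implementation: it stores strings in a Bash associative array keyed by a token string (the helper names `lookup_insert` and `lookup_drop_select` are made up), whereas the real LookupBuffer holds KV cache tensors. Only the "select the match and remove it" semantics of `drop_select` are meant to carry over. It assumes Bash 4.3+ for `-v` on array elements.

```bash
#!/usr/bin/env bash
# Toy model of LookupBuffer insert/drop_select semantics (hypothetical names;
# strings stand in for KV tensors). Requires Bash 4.3+.
set -eo pipefail

declare -A BUFFER=()
SELECTED=""

# insert: the prefill side publishes the KV cache for a prompt, keyed by its tokens.
lookup_insert() {
  BUFFER["$1"]="$2"
}

# drop_select: the decode side fetches the KV cache for a prompt and removes the
# entry, so each item is consumed at most once. Returns 1 if there is no match.
# The result goes into SELECTED (not stdout) so the unset is not lost in a subshell.
lookup_drop_select() {
  local key="$1"
  if [[ -v "BUFFER[$key]" ]]; then
    SELECTED="${BUFFER[$key]}"
    unset "BUFFER[$key]"
    return 0
  fi
  SELECTED=""
  return 1
}

lookup_insert "101,102,103" "kv-for-prompt-A"
if lookup_drop_select "101,102,103"; then
  echo "hit: ${SELECTED}"
fi
if ! lookup_drop_select "101,102,103"; then
  echo "miss: entry was already dropped"
fi
```

The second lookup misses because the first `drop_select` consumed the entry.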

In the current design, every process in vLLM has a corresponding connector. Specifically:

- Scheduler connector: the connector that runs in the same process as the scheduler. It schedules the KV cache transfer ops.
- Worker connectors: the connectors that run in the worker processes. They execute the KV cache transfer ops.

The figure below illustrates how these two kinds of connectors are organized:

![Disaggregated prefilling high level design](../assets/features/disagg_prefill/high_level_design.png)

The figure below shows how the worker connector works with the attention module to achieve layer-by-layer KV cache store and load:

![Disaggregated prefilling workflow](../assets/features/disagg_prefill/workflow.png)
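The scheduler/worker connector split and the layer-by-layer flow can be traced with a toy shell simulation. This is a sketch under stated assumptions, not vLLM's implementation: requests are plain strings, transfers are log lines, and every function name is hypothetical. The logged step names echo worker-side hooks such as `wait_for_layer_load` and `save_kv_layer`, which we believe the V1 connector interface defines; verify against the source before relying on them. As a simplification, the scheduler connector emits a `save` op per request on a `kv_producer` (prefill) instance and a `load` op on a `kv_consumer` (decode) instance, and hands the same op list to every worker.

```bash
#!/usr/bin/env bash
# Toy trace of the scheduler-connector / worker-connector split and the
# layer-by-layer KV store/load flow. Hypothetical names throughout; this does
# not call vLLM.
set -eo pipefail

NUM_LAYERS=2   # attention layers per model (toy value)
NUM_WORKERS=2  # worker processes per instance (toy value)
LOG=()

# Scheduler connector: one per instance, in the scheduler process. Turns the
# step's requests into "<op>:<request>" KV transfer ops (simplified policy:
# producers save, consumers load).
scheduler_build_ops() {
  local role="$1"; shift
  local req
  OPS=()
  for req in "$@"; do
    case "$role" in
      kv_producer) OPS+=("save:$req") ;;
      kv_consumer) OPS+=("load:$req") ;;
    esac
  done
}

# Worker connector: one per worker process. Executes the ops layer by layer,
# around each attention layer: wait for the layer's KV to arrive before
# attention, store the layer's KV after it.
worker_forward() {
  local rank="$1"; shift
  local layer op
  for ((layer = 0; layer < NUM_LAYERS; layer++)); do
    for op in "$@"; do
      if [[ "$op" == load:* ]]; then
        LOG+=("w$rank L$layer wait_for_layer_load ${op#load:}")
      fi
    done
    LOG+=("w$rank L$layer attention")
    for op in "$@"; do
      if [[ "$op" == save:* ]]; then
        LOG+=("w$rank L$layer save_kv_layer ${op#save:}")
      fi
    done
  done
}

# One instance = one scheduler connector driving NUM_WORKERS worker connectors.
run_instance() {
  local role="$1"; shift
  local rank
  scheduler_build_ops "$role" "$@"
  for ((rank = 0; rank < NUM_WORKERS; rank++)); do
    worker_forward "$rank" "${OPS[@]}"
  done
}

run_instance kv_producer req-1   # prefill instance stores req-1's KV
run_instance kv_consumer req-1   # decode instance loads req-1's KV
printf '%s\n' "${LOG[@]}"
```

Depending on the connector, real transfers can run asynchronously and overlap with computation; this sketch runs strictly in sequence so the per-layer ordering is easy to read.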
## Third-party contributions

Disaggregated prefilling is highly related to infrastructure, so vLLM relies on third-party connectors for production-level disaggregated prefilling (and the vLLM team will actively review and merge new PRs for third-party connectors).
