Skip to content

Commit e33024c

Browse files
bo-nvdominicshanshan
authored andcommitted
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend (NVIDIA#7926)
Signed-off-by: Bo Deng <deemod@nvidia.com>
1 parent 4a49ff5 commit e33024c

File tree

7 files changed

+12
-15
lines changed

7 files changed

+12
-15
lines changed

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,7 @@ std::unique_ptr<BaseCacheTransceiver> CacheTransceiverFactory::createCacheTransc
8989
}
9090
else
9191
{
92-
backendType = executor::CacheTransceiverConfig::BackendType::UCX;
92+
backendType = executor::CacheTransceiverConfig::BackendType::NIXL;
9393
}
9494
}
9595
cacheTransceiverConfig.value().setBackendType(backendType);

docker/Dockerfile.multi

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -93,15 +93,6 @@ COPY docker/common/install_triton.sh \
9393

9494
RUN bash ./install_triton.sh && rm install_triton.sh
9595

96-
# Install UCX first
97-
RUN bash ./install_ucx.sh && rm install_ucx.sh
98-
99-
# Install NIXL
100-
RUN bash ./install_nixl.sh && rm install_nixl.sh
101-
102-
# Install etcd
103-
RUN bash ./install_etcd.sh && rm install_etcd.sh
104-
10596
FROM ${DEVEL_IMAGE} AS wheel
10697
WORKDIR /src/tensorrt_llm
10798
COPY benchmarks benchmarks

docs/source/features/disagg-serving.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -106,7 +106,7 @@ cache_transceiver_config:
106106
max_tokens_in_buffer: <int>
107107
```
108108
109-
`backend` specifies the communication backend for transferring the kvCache, valid options include `DEFAULT`,`UCX`, `NIXL`, and `MPI`, the default backend is UCX.
109+
`backend` specifies the communication backend for transferring the kvCache, valid options include `DEFAULT`,`UCX`, `NIXL`, and `MPI`, the default backend is NIXL.
110110

111111
`max_tokens_in_buffer` defines the buffer size for kvCache transfers, it is recommended to set this value greater than or equal to the maximum ISL (Input Sequence Length) of all requests for optimal performance.
112112

docs/source/installation/linux.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,9 @@
1717
pip3 install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
1818

1919
sudo apt-get -y install libopenmpi-dev
20+
21+
# Optional step: Only required for disagg-serving
22+
sudo apt-get -y install libzmq3-dev
2023
```
2124

2225
```{tip}

examples/disaggregated/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ The `trtllm-serve` command supports the `extra-llm-config.yaml` parameter. In th
1212

1313
```yaml
1414
cache_transceiver_config:
15-
# KV cache transmission backend. Valid options include `DEFAULT` (i.e., UCX), `UCX`, `NIXL`.
15+
# KV cache transmission backend. Valid options include `DEFAULT` (i.e., NIXL), `UCX`, `NIXL`.
1616
backend: <str>
1717
# KV cache buffer size. Set it ≥ the maximum ISL (Input Sequence Length) for best performance.
1818
max_tokens_in_buffer: <int>

tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -38,10 +38,10 @@ def create_kv_cache_transceiver(
3838

3939
if cache_transceiver_config.backend == BackendTypeCpp.DEFAULT:
4040
# When cache_transceiver_config.backend is not set, fallback to env_vars settings
41-
# UCX is the default backend
42-
cache_transceiver_config.backend = BackendTypeCpp.UCX
41+
# NIXL is the default backend
42+
cache_transceiver_config.backend = BackendTypeCpp.NIXL
4343
# Ordered by priority
44-
env_vars = [("TRTLLM_USE_NIXL_KVCACHE", BackendTypeCpp.NIXL),
44+
env_vars = [("TRTLLM_USE_UCX_KVCACHE", BackendTypeCpp.UCX),
4545
("TRTLLM_USE_MPI_KVCACHE", BackendTypeCpp.MPI)]
4646
for env_var, be_type in env_vars:
4747
if getenv(env_var) == "1":

tests/integration/defs/disaggregated/test_disaggregated.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -430,6 +430,9 @@ def run_disaggregated_test(example_dir,
430430
config_file
431431
]
432432
else:
433+
pytest.skip(
434+
"https://nvbugs/5584607 Ray orchestrator is not supported with NIXL(DEFAULT) cache transceiver backend."
435+
)
433436
with open(config_file, 'r') as f:
434437
config = yaml.safe_load(f)
435438

0 commit comments

Comments
 (0)