
Commit 6cc619a (parent: e30d9ac)

[TRTLLM-7964][infra] set nixl to default cache transceiver backend

Signed-off-by: Bo Deng <deemod@nvidia.com>

File tree

10 files changed: +10 additions, -13 deletions

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp

Lines changed: 1 addition & 1 deletion

@@ -89,7 +89,7 @@ std::unique_ptr<BaseCacheTransceiver> CacheTransceiverFactory::createCacheTransc
         }
         else
         {
-            backendType = executor::CacheTransceiverConfig::BackendType::UCX;
+            backendType = executor::CacheTransceiverConfig::BackendType::NIXL;
         }
     }
     cacheTransceiverConfig.value().setBackendType(backendType);

docker/common/install_nixl.sh

Lines changed: 0 additions & 2 deletions

@@ -40,5 +40,3 @@ cd builddir && ninja install
 cd ../..
 rm -rf nixl* # Remove NIXL source tree to save space
 export LD_LIBRARY_PATH=$OLD_LD_LIBRARY_PATH
-
-echo "export LD_LIBRARY_PATH=/opt/nvidia/nvda_nixl/lib/${ARCH_NAME}:/opt/nvidia/nvda_nixl/lib64:\$LD_LIBRARY_PATH" >> "${ENV}"

docker/common/install_ucx.sh

Lines changed: 0 additions & 1 deletion

@@ -26,4 +26,3 @@ cd ucx
 make install -j$(nproc)
 cd ..
 rm -rf ucx # Remove UCX source to save space
-echo "export LD_LIBRARY_PATH=${UCX_INSTALL_PATH}/lib:\$LD_LIBRARY_PATH" >> "${ENV}"

docs/source/features/auto_deploy/auto-deploy.md

Lines changed: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ AutoDeploy provides an alternative method for deploying models using the LLM API
 AutoDeploy is included with the TRT-LLM installation.
 
 ```bash
-sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
+sudo apt-get -y install libopenmpi-dev libzmq3-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
 ```
 
 You can refer to [TRT-LLM installation guide](../../installation/linux.md) for more information.

docs/source/features/disagg-serving.md

Lines changed: 1 addition & 1 deletion

@@ -106,7 +106,7 @@ cache_transceiver_config:
   max_tokens_in_buffer: <int>
 ```
 
-`backend` specifies the communication backend for transferring the kvCache, valid options include `DEFAULT`, `UCX`, `NIXL`, and `MPI`, the default backend is UCX.
+`backend` specifies the communication backend for transferring the kvCache, valid options include `DEFAULT`, `UCX`, `NIXL`, and `MPI`, the default backend is NIXL.
 
 `max_tokens_in_buffer` defines the buffer size for kvCache transfers, it is recommended to set this value greater than or equal to the maximum ISL (Input Sequence Length) of all requests for optimal performance.
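Since `DEFAULT` now resolves to NIXL, a deployment that wants to keep the previous UCX behavior can pin the backend explicitly in the config above. A minimal sketch (the `4096` buffer size is an illustrative value, not from this commit):

```yaml
# Illustrative extra-llm-config.yaml fragment: pin UCX explicitly instead of
# relying on DEFAULT, which resolves to NIXL after this commit.
cache_transceiver_config:
  backend: UCX
  max_tokens_in_buffer: 4096  # recommended: >= max input sequence length
```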

docs/source/installation/linux.md

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@
 # Optional step: Only required for NVIDIA Blackwell GPUs and SBSA platform
 pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
 
-sudo apt-get -y install libopenmpi-dev
+sudo apt-get -y install libopenmpi-dev libzmq3-dev
 ```
 
 PyTorch CUDA 12.8 package is required for supporting NVIDIA Blackwell GPUs and SBSA platform. On prior GPUs or Linux x86_64 platform, this extra installation is not required.

docs/source/torch/auto_deploy/auto-deploy.md

Lines changed: 1 addition & 1 deletion

@@ -28,7 +28,7 @@ AutoDeploy provides an alternative method for deploying models using the LLM API
 AutoDeploy is included with the TRT-LLM installation.
 
 ```bash
-sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
+sudo apt-get -y install libopenmpi-dev libzmq3-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
 ```
 
 You can refer to [TRT-LLM installation guide](../../installation/linux.md) for more information.

examples/auto_deploy/README.md

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ ______________________________________________________________________
 AutoDeploy is included with the TRT-LLM installation.
 
 ```bash
-sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
+sudo apt-get -y install libopenmpi-dev libzmq3-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
 ```
 
 You can refer to [TRT-LLM installation guide](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation/linux.md) for more information.

examples/disaggregated/README.md

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ The `trtllm-serve` command supports the `extra-llm-config.yaml` parameter. In th
 
 ```yaml
 cache_transceiver_config:
-  # KV cache transmission backend. Valid options include `DEFAULT` (i.e., UCX), `UCX`, `NIXL`.
+  # KV cache transmission backend. Valid options include `DEFAULT` (i.e., NIXL), `UCX`, `NIXL`.
   backend: <str>
   # KV cache buffer size. Set it ≥ the maximum ISL (Input Sequence Length) for best performance.
   max_tokens_in_buffer: <int>

tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py

Lines changed: 3 additions & 3 deletions

@@ -38,10 +38,10 @@ def create_kv_cache_transceiver(
 
     if cache_transceiver_config.backend == BackendTypeCpp.DEFAULT:
         # When cache_transceiver_config.backend is not set, fallback to env_vars settings
-        # UCX is the default backend
-        cache_transceiver_config.backend = BackendTypeCpp.UCX
+        # NIXL is the default backend
+        cache_transceiver_config.backend = BackendTypeCpp.NIXL
         # Ordered by priority
-        env_vars = [("TRTLLM_USE_NIXL_KVCACHE", BackendTypeCpp.NIXL),
+        env_vars = [("TRTLLM_USE_UCX_KVCACHE", BackendTypeCpp.UCX),
                     ("TRTLLM_USE_MPI_KVCACHE", BackendTypeCpp.MPI)]
         for env_var, be_type in env_vars:
            if getenv(env_var) == "1":
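The fallback logic in this hunk can be sketched in isolation. This is a hedged approximation, not the TRT-LLM implementation: `BackendType` and `pick_backend` are illustrative stand-ins for `BackendTypeCpp` and `create_kv_cache_transceiver`, and an explicit `env` mapping stands in for the real code's `getenv` calls.

```python
# Sketch of the priority-ordered env-var fallback shown in the diff above.
# After this commit, NIXL is the default; UCX and MPI can still be forced
# via their respective environment variables.
import os
from enum import Enum


class BackendType(Enum):
    NIXL = "NIXL"
    UCX = "UCX"
    MPI = "MPI"


def pick_backend(env=os.environ):
    backend = BackendType.NIXL  # new default when no backend is configured
    # Ordered by priority: the first variable set to "1" wins.
    for var, be_type in [("TRTLLM_USE_UCX_KVCACHE", BackendType.UCX),
                         ("TRTLLM_USE_MPI_KVCACHE", BackendType.MPI)]:
        if env.get(var) == "1":
            backend = be_type
            break
    return backend
```

For example, `pick_backend({})` yields `BackendType.NIXL`, while setting `TRTLLM_USE_UCX_KVCACHE=1` restores the pre-commit UCX behavior.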
