diff --git a/docs/source/deployment/1_tensorrt_llm.rst b/docs/source/deployment/1_tensorrt_llm.rst
index 5e1fb05c2..8dad4fb9d 100644
--- a/docs/source/deployment/1_tensorrt_llm.rst
+++ b/docs/source/deployment/1_tensorrt_llm.rst
@@ -2,12 +2,15 @@
 TensorRT-LLM
 ==========================
 
+**Deprecation Notice**: The export_tensorrt_llm_checkpoint API will be deprecated in future releases. Users are encouraged to transition to the :doc:`unified HF export API <3_unified_hf>`, which provides enhanced functionality and flexibility for exporting models to multiple inference frameworks including TensorRT-LLM, vLLM, and SGLang.
+
 .. note::
 
-    Please read the `TensorRT-LLM checkpoint workflow `_
+    Please read the `TensorRT-LLM checkpoint workflow `_
     first before going through this section.
 
+
 ModelOpt toolkit supports automatic conversion of ModelOpt exported LLM to the TensorRT-LLM checkpoint and the engines for accelerated inferencing. This conversion is achieved by:
 
@@ -144,4 +147,4 @@ If the :meth:`export_tensorrt_llm_checkpoint
-`_ to build and deploy the quantized LLM.
+Once the TensorRT-LLM checkpoint is available, please follow the `TensorRT-LLM build API `_ to build and deploy the quantized LLM.
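
The deprecation notice in this patch recommends migrating from ``export_tensorrt_llm_checkpoint`` to the unified HF export API. A minimal migration sketch follows; the import paths and keyword arguments used here (``export_hf_checkpoint``, ``export_tensorrt_llm_checkpoint``, ``decoder_type``, ``export_dir``) are assumptions based on the notice and should be verified against the installed ModelOpt release:

```python
# Migration sketch only: the ModelOpt import paths and keyword arguments
# below are assumptions, not a verified API surface; check your ModelOpt
# release notes before relying on either code path.

def export_quantized_model(model, export_dir, decoder_type="llama"):
    """Prefer the unified HF export; fall back to the legacy TRT-LLM export."""
    try:
        # Unified path: one exported checkpoint consumable by TensorRT-LLM,
        # vLLM, and SGLang, per the deprecation notice above.
        from modelopt.torch.export import export_hf_checkpoint
        export_hf_checkpoint(model, export_dir=export_dir)
    except ImportError:
        # Legacy path, slated for deprecation per the notice above.
        from modelopt.torch.export import export_tensorrt_llm_checkpoint
        export_tensorrt_llm_checkpoint(
            model, decoder_type=decoder_type, export_dir=export_dir
        )
```

Wrapping the two paths in one helper keeps calling code unchanged while the docs (and installations) transition from the checkpoint export to the unified export.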