Commit 9bd42ec

[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
1 parent 113f6fb commit 9bd42ec

File tree

89 files changed: +320 −251 lines
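Across the 89 files, the change is largely a mechanical import rewrite: the PyTorch-backend `LLM` moves up to the package root (`tensorrt_llm.LLM`), and code that still wants the TensorRT-engine `LLM` switches to the explicit `tensorrt_llm._tensorrt_engine.LLM`. A rough sketch of the two rewrite rules as a script — the helper name and rule table are illustrative, inferred from the diffs below, not tooling from the repo:

```python
# Illustrative sketch of the two import-rewrite rules this commit applies.
# The rule table is an assumption inferred from the diffs; it deliberately
# ignores combined imports such as
# `from tensorrt_llm.llmapi import LLM, BuildConfig, ...`,
# which the commit splits into two lines by hand.
REWRITES = [
    # The PyTorch backend becomes the default, top-level LLM.
    ("from tensorrt_llm._torch import LLM",
     "from tensorrt_llm import LLM"),
    # TensorRT-engine users now import the engine-backed LLM explicitly.
    ("from tensorrt_llm.llmapi import LLM",
     "from tensorrt_llm._tensorrt_engine import LLM"),
]


def rewrite_imports(source: str) -> str:
    """Apply each old -> new substitution, in order, to a source string."""
    for old, new in REWRITES:
        source = source.replace(old, new)
    return source
```

For example, `rewrite_imports("from tensorrt_llm._torch import LLM")` yields the new default import, `from tensorrt_llm import LLM`.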


docs/source/torch.md

Lines changed: 3 additions & 3 deletions

@@ -11,7 +11,7 @@ The PyTorch backend of TensorRT-LLM is available in version 0.17 and later. You
 
 ## Quick Start
 
-Here is a simple example to show how to use `tensorrt_llm._torch.LLM` API with Llama model.
+Here is a simple example to show how to use `tensorrt_llm.LLM` API with Llama model.
 
 ```{literalinclude} ../../examples/pytorch/quickstart.py
 :language: python
@@ -24,7 +24,7 @@ The PyTorch backend supports FP8 and NVFP4 quantization. You can pass quantized
 which are generated by [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
 
 ```python
-from tensorrt_llm._torch import LLM
+from tensorrt_llm import LLM
 llm = LLM(model='nvidia/Llama-3.1-8B-Instruct-FP8')
 llm.generate("Hello, my name is")
 ```
@@ -44,7 +44,7 @@ The PyTorch backend supports most of the sampling features that are supported on
 In order to use this feature, it is necessary to enable option `enable_trtllm_sampler` in the `LLM` class, and pass a `SamplingParams` object with the desired options as well. The following example prepares two identical prompts which will give different results due to the sampling parameters chosen:
 
 ```python
-from tensorrt_llm._torch import LLM
+from tensorrt_llm import LLM
 llm = LLM(model='nvidia/Llama-3.1-8B-Instruct-FP8',
           enable_trtllm_sampler=True)
 sampling_params = SamplingParams(

docs/source/torch/adding_new_model.md

Lines changed: 1 addition & 1 deletion

@@ -186,7 +186,7 @@ __all__ = [
 Alternatively, you can register the new model as an out-of-tree model, so that you can use the new model without touching the TensorRT-LLM codebase. To do so, place `modeling_mymodel.py` (and potentially `configuration_mymodel.py`) in your working directory, and import the modeling code in your script:
 
 ```python
-from tensorrt_llm._torch import LLM
+from tensorrt_llm import LLM
 import modeling_mymodel
 
 def main():

docs/source/torch/arch_overview.md

Lines changed: 2 additions & 2 deletions

@@ -5,10 +5,10 @@ Besides TensorRT, PyTorch can also serve as the backend for TensorRT-LLM. This d
 
 ## Top Level API
 
-The interface for PyTorch backend is `tensorrt._torch.LLM`.
+The interface for PyTorch backend is `tensorrt_llm.LLM`.
 
 ```python
-from tensorrt_llm._torch import LLM
+from tensorrt_llm import LLM
 llm = LLM(model=<path_to_llama_from_hf>)
 ```

examples/apps/chat.py

Lines changed: 2 additions & 1 deletion

@@ -5,7 +5,8 @@
 import colorama
 from transformers import AutoTokenizer, PreTrainedTokenizer
 
-from tensorrt_llm.llmapi import LLM, BuildConfig, KvCacheConfig, SamplingParams
+from tensorrt_llm._tensorrt_engine import LLM
+from tensorrt_llm.llmapi import BuildConfig, KvCacheConfig, SamplingParams
 
 
 class LlmConsole(code.InteractiveConsole):

examples/apps/fastapi_server.py

Lines changed: 2 additions & 1 deletion

@@ -18,8 +18,9 @@
 from fastapi import FastAPI, Request
 from fastapi.responses import JSONResponse, Response, StreamingResponse
 
+from tensorrt_llm._tensorrt_engine import LLM
 from tensorrt_llm.executor import CppExecutorError, RequestError
-from tensorrt_llm.llmapi import LLM, BuildConfig, KvCacheConfig, SamplingParams
+from tensorrt_llm.llmapi import BuildConfig, KvCacheConfig, SamplingParams
 
 TIMEOUT_KEEP_ALIVE = 5  # seconds.

examples/auto_deploy/build_and_run_ad.py

Lines changed: 2 additions & 1 deletion

@@ -7,11 +7,12 @@
 import torch
 from simple_config import SimpleConfig
 
+from tensorrt_llm._tensorrt_engine import LLM
 from tensorrt_llm._torch.auto_deploy.models import ModelFactoryRegistry
 from tensorrt_llm._torch.auto_deploy.shim import DemoLLM
 from tensorrt_llm._torch.auto_deploy.utils.benchmark import benchmark, store_benchmark_results
 from tensorrt_llm._torch.auto_deploy.utils.logger import ad_logger
-from tensorrt_llm.llmapi.llm import LLM, RequestOutput
+from tensorrt_llm.llmapi.llm import RequestOutput
 from tensorrt_llm.llmapi.llm_args import TorchCompileConfig
 from tensorrt_llm.sampling_params import SamplingParams

examples/llm-api/llm_auto_parallel.py

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,6 @@
 ### Automatic Parallelism with LLM
-from tensorrt_llm import LLM, SamplingParams
+from tensorrt_llm import SamplingParams
+from tensorrt_llm._tensorrt_engine import LLM
 
 
 def main():

examples/llm-api/llm_eagle2_decoding.py

Lines changed: 2 additions & 2 deletions

@@ -1,7 +1,7 @@
 ### Generate Text Using Eagle2 Decoding
 
-from tensorrt_llm import LLM, SamplingParams
-from tensorrt_llm.llmapi import (LLM, EagleDecodingConfig, KvCacheConfig,
+from tensorrt_llm._tensorrt_engine import LLM
+from tensorrt_llm.llmapi import (EagleDecodingConfig, KvCacheConfig,
                                  SamplingParams)
 
 

examples/llm-api/llm_eagle_decoding.py

Lines changed: 3 additions & 3 deletions

@@ -1,8 +1,8 @@
 ### Generate Text Using Eagle Decoding
 
-from tensorrt_llm import LLM, SamplingParams
-from tensorrt_llm.llmapi import (LLM, EagleDecodingConfig, KvCacheConfig,
-                                 SamplingParams)
+from tensorrt_llm import SamplingParams
+from tensorrt_llm._tensorrt_engine import LLM
+from tensorrt_llm.llmapi import EagleDecodingConfig, KvCacheConfig
 
 
 def main():

examples/llm-api/llm_guided_decoding.py

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,6 @@
 ### Generate text with guided decoding
-from tensorrt_llm import LLM, SamplingParams
+from tensorrt_llm import SamplingParams
+from tensorrt_llm._tensorrt_engine import LLM
 from tensorrt_llm.llmapi import GuidedDecodingParams
 
 
