
Commit de3c750

[chore] Mark TRTLLMSampler as deprecated and update documentation

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

1 parent 5b0c956 commit de3c750

File tree

4 files changed: +12 −34 lines


docs/source/features/sampling.md

Lines changed: 4 additions & 28 deletions
@@ -16,30 +16,6 @@ The PyTorch backend supports a wide variety of features, listed below:
 
 ## General usage
 
-There are two sampling backends available.
-
-* Torch Sampler
-* TRTLLM Sampler
-
-Torch Sampler currently supports a superset of features of TRTLLM Sampler, and is intended as the long-term solution. One can specify which sampler to use explicitly with:
-
-```python
-from tensorrt_llm import LLM
-
-# Chooses TorchSampler explicitly
-llm = LLM(model='nvidia/Llama-3.1-8B-Instruct-FP8',
-          sampler_type="TorchSampler")
-
-# Chooses TRTLLMSampler explicitly
-llm = LLM(model='nvidia/Llama-3.1-8B-Instruct-FP8',
-          sampler_type="TRTLLMSampler")
-```
-
-By default, the sampling backend is chosen to be `auto`. This will use:
-
-* TRTLLM Sampler when using Beam Search.
-* Torch Sampler otherwise.
-
 Here is an example to run a model with basic usage of sampling parameters. This example prepares two identical prompts which will give different results due to the sampling parameters chosen:
 
 ```python
@@ -73,7 +49,7 @@ llm.generate(["Hello, my name is",
               sampling_params_1])
 ```
 
-### LLM API sampling behavior when using Torch Sampler
+### LLM API sampling behavior
 
 * The sampling is controlled via `SamplingParams`.
 
@@ -105,17 +81,17 @@ llm.generate(["Hello, my name is",
 
 ### Performance
 
-The Torch Sampler leverages the optimized sampling kernels provided by
+The sampler leverages the optimized sampling kernels provided by
 [FlashInfer](https://docs.flashinfer.ai/api/sampling.html). The sampler
 also uses the [sorting-free implementations](https://flashinfer.ai/2025/03/10/sampling.html)
 whenever possible. This optimization does not compute the complete set of token sampling probabilities
 (after top-k / top-p masking etc.), which typically can be omitted unless requested by the user or
 required for speculative decoding (rejection sampling).
-In case of unexpected problems, the use of FlashInfer in Torch Sampler can
+In case of unexpected problems, the use of FlashInfer in the sampler can
 be disabled via the `disable_flashinfer_sampling` config option (note that this option is likely
 to be removed in a future TensorRT LLM release).
 
-Moreover, Torch Sampler internally batches requests with compatible sampling parameters. This
+Moreover, the sampler internally batches requests with compatible sampling parameters. This
 can greatly reduce the overall latency of the sampling step when request batches are comprised
 of requests with very heterogeneous sampling strategies (e.g. a mix of requests using greedy and top-p-after-top-k sampling).
12197

tensorrt_llm/_torch/pyexecutor/_util.py

Lines changed: 4 additions & 3 deletions
@@ -1278,9 +1278,10 @@ def instantiate_sampler(
     if mm_encoder_only:
         # NOTE: handle model outputs specially for mm encoder executor/engine
         return EarlyStopWithMMResult()
-    if llm_args.sampler_type == SamplerType.TRTLLMSampler or (
-            llm_args.sampler_type == SamplerType.auto
-            and decoding_mode.isBeamSearch()):
+    if llm_args.sampler_type == SamplerType.TRTLLMSampler:
+        logger.warning(
+            "TRTLLMSampler is deprecated and will be removed in release 1.4. Please use TorchSampler instead."
+        )
         logger.debug(f"DecodingMode: {decoding_mode.name}")
         return TRTLLMSampler(engine.model,
                              engine.dtype,
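
The change above keeps `TRTLLMSampler` usable but logs a deprecation warning, and `auto` now always resolves to the Torch sampler because the beam-search special case was removed. A minimal standalone sketch of that resolution logic (the name `resolve_sampler` is illustrative and not part of TensorRT LLM; `warnings` stands in for the project's logger):

```python
import warnings
from enum import Enum


class SamplerType(str, Enum):
    auto = "auto"
    TorchSampler = "TorchSampler"
    TRTLLMSampler = "TRTLLMSampler"


def resolve_sampler(sampler_type: SamplerType) -> str:
    """Mirror the updated branch: warn on TRTLLMSampler, default to TorchSampler."""
    if sampler_type == SamplerType.TRTLLMSampler:
        # Deprecated path: still honored, but callers are told to migrate.
        warnings.warn(
            "TRTLLMSampler is deprecated and will be removed in release 1.4. "
            "Please use TorchSampler instead.",
            DeprecationWarning,
        )
        return "TRTLLMSampler"
    # `auto` and `TorchSampler` both resolve to the Torch sampler now that
    # the beam-search special case is gone.
    return "TorchSampler"
```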

tensorrt_llm/llmapi/llm_args.py

Lines changed: 3 additions & 2 deletions
@@ -3014,8 +3014,9 @@ class TorchLlmArgs(BaseLlmArgs):
     sampler_type: Union[str, SamplerType] = Field(
         default=SamplerType.auto,
         description=
-        "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless BeamSearch is requested.",
-        status="beta")
+        "The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler. "
+        "TRTLLMSampler is deprecated and will be removed in release 1.4.",
+        status="deprecated")
 
     sampler_force_async_worker: bool = Field(
         default=False,
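
The `status=` keyword in the diff is a TensorRT LLM-specific extension of pydantic's `Field`. A rough sketch of how the same deprecation marker can be carried on a plain pydantic v2 model, using the standard `json_schema_extra` hook instead (the `SamplerConfig` class is hypothetical, not the real `TorchLlmArgs`):

```python
from pydantic import BaseModel, Field


class SamplerConfig(BaseModel):
    # In TensorRT LLM, `status=` is a custom Field argument; vanilla pydantic
    # can attach the same metadata via `json_schema_extra`.
    sampler_type: str = Field(
        default="auto",
        description=(
            "The type of sampler to use. Options are TRTLLMSampler, "
            "TorchSampler or auto. Defaults to auto, which will use "
            "TorchSampler. TRTLLMSampler is deprecated and will be "
            "removed in release 1.4."
        ),
        json_schema_extra={"status": "deprecated"},
    )
```

Tooling (such as the api_stability test below) can then read the marker back from the model's field metadata rather than from the runtime value.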

tests/unittest/api_stability/references/llm.yaml

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ methods:
     sampler_type:
       annotation: Union[str, tensorrt_llm.llmapi.llm_args.SamplerType]
       default: auto
-      status: beta
+      status: deprecated
     sampler_force_async_worker:
       annotation: bool
       default: False
