Commit 302fa1d

lucaslie authored and videodanchik committed

[None][doc] promote AutoDeploy to beta feature in docs (NVIDIA#10372)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>

1 parent 143205a commit 302fa1d

File tree

4 files changed: +15 −5 lines changed

.gitignore
Lines changed: 1 addition & 0 deletions

@@ -63,6 +63,7 @@ docs/source/**/*.rst
 .coverage.*
 results_trt/
 llm-test-workspace/
+ad-test-workspace/

 # build/debug
 *.safetensors

docs/source/features/auto_deploy/auto-deploy.md
Lines changed: 4 additions & 3 deletions

@@ -1,12 +1,13 @@
-# AutoDeploy (Prototype)
+# AutoDeploy (Beta)

 ```{note}
-This project is under active development and is currently in a prototype stage. The code is a prototype, subject to change, and may include backward-incompatible updates. While we strive for correctness, there are no guarantees regarding functionality, stability, or reliability.
+This project is under active development and is currently released as a beta feature. The code is
+subject to change and may include backward-incompatible updates.
 ```

 ## Seamless Model Deployment from PyTorch to TensorRT LLM

-AutoDeploy is a prototype designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models such as those from the Hugging Face Transformers library, to TensorRT LLM.
+AutoDeploy is designed to simplify and accelerate the deployment of PyTorch models, including off-the-shelf models such as those from the Hugging Face Transformers library, to TensorRT LLM.

 ![AutoDeploy overview](../../media/ad_overview.png)
 <sub><em>AutoDeploy overview and relation with TensorRT LLM's LLM API</em></sub>

examples/auto_deploy/README.md
Lines changed: 2 additions & 1 deletion

@@ -334,4 +334,5 @@ the current progress in AutoDeploy and where you can help.

 ## Disclaimer

-This project is under active development and is currently in a prototype stage. The code is experimental, subject to change, and may include backward-incompatible updates. While we strive for correctness, there are no guarantees regarding functionality, stability, or reliability.
+This project is under active development and is currently released as a beta feature. The code is
+subject to change and may include backward-incompatible updates.

examples/auto_deploy/nemotron_flash.yaml
Lines changed: 8 additions & 1 deletion

@@ -5,7 +5,14 @@ max_num_tokens: 8192
 enable_chunked_prefill: true
 model_factory: NemotronFlashForCausalLM
 free_mem_ratio: 0.9
-cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 24, 32, 64,96, 128, 256, 320, 384]
+cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 24, 32, 64, 96, 128, 256, 320, 384]
 kv_cache_config:
   # disable kv_cache reuse since not supported for hybrid/ssm models
   enable_block_reuse: false
+transforms:
+  gather_logits_before_lm_head:
+    # TODO: fix https://github.com/NVIDIA/TensorRT-LLM/issues/9878 to enable by default
+    enabled: true
+  fuse_mamba_a_log:
+    stage: post_load_fusion
+    enabled: true
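To sanity-check what this hunk leaves behind, the config excerpt can be reconstructed and parsed. The sketch below is not part of the commit; it embeds the post-commit YAML (keys above line 5 of the file are not visible in the hunk and are omitted) and assumes PyYAML is available:

```python
import yaml  # PyYAML; assumed available for this sketch

# Excerpt of examples/auto_deploy/nemotron_flash.yaml as it reads
# after this commit, reconstructed from the hunk above.
CONFIG = """
max_num_tokens: 8192
enable_chunked_prefill: true
model_factory: NemotronFlashForCausalLM
free_mem_ratio: 0.9
cuda_graph_batch_sizes: [1, 2, 4, 8, 16, 24, 32, 64, 96, 128, 256, 320, 384]
kv_cache_config:
  # disable kv_cache reuse since not supported for hybrid/ssm models
  enable_block_reuse: false
transforms:
  gather_logits_before_lm_head:
    enabled: true
  fuse_mamba_a_log:
    stage: post_load_fusion
    enabled: true
"""

cfg = yaml.safe_load(CONFIG)

# The "64,96" -> "64, 96" change is cosmetic: YAML flow sequences accept
# both spellings, so the list parses to 13 integers either way.
sizes = cfg["cuda_graph_batch_sizes"]
assert sizes == sorted(sizes) and all(isinstance(s, int) for s in sizes)
assert cfg["transforms"]["fuse_mamba_a_log"]["stage"] == "post_load_fusion"
print(len(sizes))  # 13
```

Note that the commit adds the `transforms` section on top of the existing runtime settings rather than replacing them, so the earlier keys (`kv_cache_config` and friends) remain in effect.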
