Commit 65b793c

[None][doc] Add the missing content for model support section and fix valid links for long_sequence.md (#8869)

Signed-off-by: nv-guomingz <[email protected]>
Parent: 271a981

2 files changed: +7 −2 lines

docs/source/features/long-sequence.md (2 additions, 2 deletions)

@@ -26,7 +26,7 @@ Note that if chunked context is enabled, please set the `max_num_tokens` to be a

 <div align="center">
 <figure>
-<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/feat_long_seq_chunked_attention.png" alt="feat_long_seq_chunked_attention" width="240" height="auto">
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/media/feat_long_seq_chunked_attention.png" alt="feat_long_seq_chunked_attention" width="240" height="auto">
 </figure>
 </div>
 <p align="center"><sub><em>Figure 1. Illustration of chunked attention </em></sub></p>
@@ -43,7 +43,7 @@ Note that chunked attention can only be applied to context requests.

 <div align="center">
 <figure>
-<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/blogs/media/feat_long_seq_chunked_attention.png" alt="feat_long_seq_sliding_win_attn" width="240" height="auto">
+<img src="https://github.com/NVIDIA/TensorRT-LLM/raw/main/docs/source/media/feat_long_seq_chunked_attention.png" alt="feat_long_seq_sliding_win_attn" width="240" height="auto">
 </figure>
 </div>
 <p align="center"><sub><em>Figure 2. Illustration of sliding window attention </em></sub></p>

docs/source/overview.md (5 additions, 0 deletions)

@@ -25,6 +25,11 @@ TensorRT LLM delivers breakthrough performance on the latest NVIDIA GPUs:

 TensorRT LLM supports the latest and most popular LLM architectures:

+- **Language Models**: GPT-OSS, Deepseek-R1/V3, Llama 3/4, Qwen2/3, Gemma 3, Phi 4...
+- **Multi-modal Models**: LLaVA-NeXT, Qwen2-VL, VILA, Llama 3.2 Vision...
+
+TensorRT LLM strives to support the most popular models on **Day 0**.
+
 ### FP4 Support
 [NVIDIA B200 GPUs](https://www.nvidia.com/en-us/data-center/dgx-b200/) , when used with TensorRT LLM, enable seamless loading of model weights in the new [FP4 format](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/#what_is_nvfp4), allowing you to automatically leverage optimized FP4 kernels for efficient and accurate low-precision inference.

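The FP4 paragraph added to overview.md refers to NVFP4, which stores weights as 4-bit e2m1 values (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6 plus a sign bit) with shared per-block scaling factors. As a rough, self-contained illustration of that idea only — this is a toy sketch, not TensorRT LLM code, and the single float scale per block is a simplification of NVFP4's actual block-scale encoding:

```python
# Toy sketch of FP4 (e2m1) block quantization. E2M1_GRID lists the
# non-negative magnitudes a 4-bit e2m1 value can represent; a sign bit
# covers the negatives.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values, grid=E2M1_GRID):
    """Scale a block so its max |value| maps to the top of the grid,
    then snap each element to the nearest representable FP4 value."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / grid[-1]  # one shared scale per block (simplified)
    quantized = []
    for v in values:
        # Snap the scaled magnitude to the nearest e2m1 grid point.
        mag = min(grid, key=lambda g: abs(abs(v) / scale - g))
        quantized.append((mag if v >= 0 else -mag) * scale)
    return quantized, scale

vals, scale = quantize_block([0.9, -2.4, 0.05, 6.0])
# vals -> [1.0, -2.0, 0.0, 6.0], scale -> 1.0
```

The real format quantizes in small fixed-size blocks and stores the scales in a compact floating-point encoding, so only the dequantize-and-multiply step needs to run in the optimized kernels.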
