
Commit d2b7b64

Add R1 perf data to latest news page (NVIDIA#2823)
* Update README.md
  Signed-off-by: Laikh Tewari <[email protected]>
* add r1 perf chart to repo
  Signed-off-by: Laikh Tewari <[email protected]>
* Delete docs/source/blogs/media/r1-perf.jpeg
  Signed-off-by: Laikh Tewari <[email protected]>
* add file to correct media dir
  Signed-off-by: Laikh Tewari <[email protected]>
* Update README.md with local img + remove old img
  Signed-off-by: Laikh Tewari <[email protected]>

---------

Signed-off-by: Laikh Tewari <[email protected]>
1 parent ab5b19e commit d2b7b64

File tree

2 files changed: 7 additions, 3 deletions


README.md

Lines changed: 7 additions & 3 deletions
@@ -18,14 +18,18 @@ TensorRT-LLM
 <div align="left">
 
 ## Latest News
+* [2025/02/25] 🌟 DeepSeek-R1 performance now optimized for Blackwell [➡️ link](https://huggingface.co/nvidia/DeepSeek-R1-FP4)
+<div align="center">
+<img src="docs/source/media/r1-perf.jpeg" width="75%">
+
+<sub><sup>HGX B200 (8 GPUs) vs HGX H200 (8 GPUs) vs 2 x HGX H100 (normalized to 8 GPUs for comparison). Input tokens not included in TPS calculations. TensorRT-LLM Version: 0.18.0.dev2025021800 (pre-release) used for Feb measurements, SGLang used for Jan measurements. Hopper numbers in FP8. B200 numbers in FP4. Max concurrency use case. ISL/OSL: 1K/1K.</sup></sub>
+<div align="left">
+
 * [2025/01/07] 🌟 Getting Started with TensorRT-LLM
 [➡️ link](https://www.youtube.com/watch?v=TwWqPnuNHV8)
 
 * [2025/01/04] ⚡Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding
 [➡️ link](https://developer.nvidia.com/blog/boost-llama-3-3-70b-inference-throughput-3x-with-nvidia-tensorrt-llm-speculative-decoding/)
-<div align="center">
-<img src="https://developer-blogs.nvidia.com/wp-content/uploads/2024/12/three-llamas-wearing-goggles.png" width="50%">
-<div align="left">
 
 * [2024/12/10] ⚡ Llama 3.3 70B from AI at Meta is accelerated by TensorRT-LLM. 🌟 State-of-the-art model on par with Llama 3.1 405B for reasoning, math, instruction following and tool use. Explore the preview
 [➡️ link](https://build.nvidia.com/meta/llama-3_3-70b-instruct)
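The caption above defines throughput with input tokens excluded: only generated (output) tokens count toward TPS. As a minimal sketch of that metric, not taken from the commit, here is the arithmetic with all request counts and timings as hypothetical placeholders:

```python
# Sketch of the TPS metric described in the caption: only output
# (generated) tokens enter the numerator; input/prompt tokens do not.
# All numbers below are illustrative placeholders, not measured data.

def output_tps(output_tokens_per_request: int,
               num_requests: int,
               elapsed_seconds: float) -> float:
    """Throughput in generated tokens per second across all requests."""
    return (output_tokens_per_request * num_requests) / elapsed_seconds

# ISL/OSL 1K/1K: each request has ~1024 input and ~1024 output tokens,
# but only the 1024 output tokens are counted.
print(output_tps(output_tokens_per_request=1024,
                 num_requests=64,        # hypothetical concurrency
                 elapsed_seconds=30.0))  # hypothetical wall-clock time
```

Under this definition, a 1K/1K workload reports roughly half the tokens that a combined input-plus-output count would, so figures computed this way are not directly comparable to benchmarks that also count prompt tokens.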

docs/source/media/r1-perf.jpeg

23.4 KB

0 commit comments