Commit 612cfff

Merge branch 'main' into ampere_xqa_swa_1013
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2 parents: 5b8c434 + 1c7b7cd

168 files changed: +6565 −1621 lines changed

ATTRIBUTIONS-Python.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -25486,7 +25486,7 @@ limitations under the License.
 ```
 
 ### URLs
-- `Homepage`: https://github.com/NVIDIA/TensorRT-Model-Optimizer
+- `Homepage`: https://github.com/NVIDIA/Model-Optimizer
 
 
 ## nvidia-modelopt-core (0.33.1)
@@ -25513,7 +25513,7 @@ limitations under the License.
 ```
 
 ### URLs
-- `Homepage`: https://github.com/NVIDIA/TensorRT-Model-Optimizer
+- `Homepage`: https://github.com/NVIDIA/Model-Optimizer
 
 
 ## nvidia-nccl-cu12 (2.27.3)
````

README.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -164,7 +164,7 @@ state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.<
 [➡️ link](https://www.bentoml.com/blog/tuning-tensor-rt-llm-for-optimal-serving-with-bentoml)
 
 
-* [2024/08/20] 🏎️SDXL with #TensorRT Model Optimizer ⏱️⚡ 🏁 cache diffusion 🏁 quantization aware training 🏁 QLoRA 🏁 #Python 3.12
+* [2024/08/20] 🏎️SDXL with #Model Optimizer ⏱️⚡ 🏁 cache diffusion 🏁 quantization aware training 🏁 QLoRA 🏁 #Python 3.12
 [➡️ link](https://developer.nvidia.com/blog/nvidia-tensorrt-model-optimizer-v0-15-boosts-inference-performance-and-expands-model-support/)
 
 * [2024/08/13] 🐍 DIY Code Completion with #Mamba ⚡ #TensorRT #LLM for speed 🤖 NIM for ease ☁️ deploy anywhere
@@ -209,7 +209,7 @@ Technical Deep Dive for serious coders ✅+99% compression ✅1 set of weights
 * [2024/05/21] ✨@modal_labs has the codes for serverless @AIatMeta Llama 3 on #TensorRT #LLM ✨👀 📚 Marvelous Modal Manual:
 Serverless TensorRT LLM (LLaMA 3 8B) | Modal Docs [➡️ link](https://modal.com/docs/examples/trtllm_llama)
 
-* [2024/05/08] NVIDIA TensorRT Model Optimizer -- the newest member of the #TensorRT ecosystem is a library of post-training and training-in-the-loop model optimization techniques ✅quantization ✅sparsity ✅QAT [➡️ blog](https://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/)
+* [2024/05/08] NVIDIA Model Optimizer -- the newest member of the #TensorRT ecosystem is a library of post-training and training-in-the-loop model optimization techniques ✅quantization ✅sparsity ✅QAT [➡️ blog](https://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/)
 
 * [2024/05/07] 🦙🦙🦙 24,000 tokens per second 🛫Meta Llama 3 takes off with #TensorRT #LLM 📚[➡️ link](https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/)
 
````

cpp/tensorrt_llm/common/customAllReduceUtils.h

Lines changed: 2 additions & 3 deletions

````diff
@@ -81,7 +81,6 @@ inline AllReduceStrategyType SelectStrategyLP(size_t seq_len, size_t hidden_size
     {
         return AllReduceStrategyType::ONESHOT;
     }
-    return AllReduceStrategyType::NCCL;
 }
 
 // use 1D vector to store the best strategy instead of a map for each sm version
@@ -143,15 +142,15 @@ inline AllReduceStrategyType selectStrategyLookUpTable(
         sm_version = 100;
     }
 
-    // Check if the entry is out of bounds, otherwise return NCCL as fallback
+    // Check if the entry is out of bounds, otherwise return NCCL_SYMMETRIC as fallback
     if (AllReduceBestStrategyTable.find(sm_version) == AllReduceBestStrategyTable.end()
         || tp_index >= AllReduceBestStrategyTable.at(sm_version).size()
         || fusion_op_index >= AllReduceBestStrategyTable.at(sm_version).at(tp_index).size()
        || hidden_size_index >= AllReduceBestStrategyTable.at(sm_version).at(tp_index).at(fusion_op_index).size()
        || num_token_index
            >= AllReduceBestStrategyTable.at(sm_version).at(tp_index).at(fusion_op_index).at(hidden_size_index).size())
    {
-        return AllReduceStrategyType::NCCL;
+        return AllReduceStrategyType::NCCL_SYMMETRIC;
    }
 
    return static_cast<AllReduceStrategyType>(
````
