Commit f7c2eb4

Update Nemotron Super and Ultra in Supported Models and add an example (NVIDIA#3632)
* Update Nemotron Super and Ultra in Supported Models and add an example
* Update README link to match new examples structure

Signed-off-by: Nave Assaf <[email protected]>
1 parent 17eba98 commit f7c2eb4

File tree

2 files changed: +9, −2 lines


examples/models/core/nemotron_nas/README.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -16,6 +16,9 @@ The TensorRT-LLM Nemotron-NAS implementation can be found in [tensorrt_llm/model
 
 * [`convert_checkpoint.py`](./convert_checkpoint.py) to convert the model into tensorrt-llm checkpoint format.
 
+The recommended flow for using Nemotron-NAS models is through TRTLLM's PyTorch-based flow.
+An example of how to run `Nemotron-NAS` models through the PyTorch workflow can be found in the [PyTorch quickstart example](../../../pytorch/README.md).
+
 ## Support Matrix
 
 * FP16
```

examples/pytorch/README.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -23,6 +23,9 @@ python3 quickstart_advanced.py --model_dir nvidia/Llama-3.1-8B-Instruct-FP8 --tp
 
 # FP8(e4m3) kvcache
 python3 quickstart_advanced.py --model_dir nvidia/Llama-3.1-8B-Instruct-FP8 --kv_cache_dtype fp8
+
+# BF16 + TP=8
+python3 quickstart_advanced.py --model_dir nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 --tp_size 8
 ```
 
 Run the multimodal example script:
@@ -42,7 +45,6 @@ python3 quickstart_multimodal.py --model_dir Efficient-Large-Model/NVILA-8B --mo
 | Architecture | Model | HuggingFace Example | Modality |
 |--------------|-------|---------------------|----------|
 | `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` | L |
-| `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` | L |
 | `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3 `| L |
 | `LlavaLlamaModel` | VILA | `Efficient-Large-Model/NVILA-8B` | L + V |
 | `LlavaNextForConditionalGeneration` | LLaVA-NeXT | `llava-hf/llava-v1.6-mistral-7b-hf` | L + V |
@@ -52,7 +54,9 @@ python3 quickstart_multimodal.py --model_dir Efficient-Large-Model/NVILA-8B --mo
 | `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` | L |
 | `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` | L |
 | `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` | L |
-| `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` | L |
+| `NemotronNASForCausalLM` | LLamaNemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` | L |
+| `NemotronNASForCausalLM` | LlamaNemotron Super | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` | L |
+| `NemotronNASForCausalLM` | LlamaNemotron Ultra | `nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` | L |
 | `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` | L |
 | `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` | L |
 | `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` | L |
````
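The BF16 + TP=8 example this commit adds runs the advanced quickstart against the 253B Ultra checkpoint. As a minimal sketch of how those command lines fit together (the `quickstart_cmd` helper below is hypothetical, not part of TensorRT-LLM; actually running the command requires a TensorRT-LLM install and suitable GPUs):

```python
# Hypothetical helper: assemble the quickstart_advanced.py argv lists
# shown in the diff above, without executing them.
def quickstart_cmd(model_dir, **flags):
    cmd = ["python3", "quickstart_advanced.py", "--model_dir", model_dir]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]  # e.g. tp_size=8 -> --tp_size 8
    return cmd

# The BF16 + TP=8 invocation added for Nemotron Ultra:
ultra_cmd = quickstart_cmd("nvidia/Llama-3_1-Nemotron-Ultra-253B-v1", tp_size=8)
print(" ".join(ultra_cmd))
```

The same helper reproduces the existing FP8 examples by passing `kv_cache_dtype="fp8"` instead of `tp_size`.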
