Commit f7c2eb4

Update Nemotron Super and Ultra in Supported Models and add an example (NVIDIA#3632)
* Update Nemotron Super and Ultra in Supported Models and add an example
* Update README link to match new examples structure

Signed-off-by: Nave Assaf <[email protected]>
1 parent 17eba98 commit f7c2eb4

File tree

2 files changed: +9, −2 lines


examples/models/core/nemotron_nas/README.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -16,6 +16,9 @@ The TensorRT-LLM Nemotron-NAS implementation can be found in [tensorrt_llm/model
 
 * [`convert_checkpoint.py`](./convert_checkpoint.py) to convert the model into tensorrt-llm checkpoint format.
 
+The recommended flow for using Nemotron-NAS models is through TRTLLM's PyTorch-based flow.
+An example of how to run `Nemotron-NAS` models through the PyTorch workflow can be found in the [PyTorch quickstart example](../../../pytorch/README.md).
+
 ## Support Matrix
 
 * FP16
```

examples/pytorch/README.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -23,6 +23,9 @@ python3 quickstart_advanced.py --model_dir nvidia/Llama-3.1-8B-Instruct-FP8 --tp
 
 # FP8(e4m3) kvcache
 python3 quickstart_advanced.py --model_dir nvidia/Llama-3.1-8B-Instruct-FP8 --kv_cache_dtype fp8
+
+# BF16 + TP=8
+python3 quickstart_advanced.py --model_dir nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 --tp_size 8
 ```
 
 Run the multimodal example script:
@@ -42,7 +45,6 @@ python3 quickstart_multimodal.py --model_dir Efficient-Large-Model/NVILA-8B --mo
 | Architecture | Model | HuggingFace Example | Modality |
 |--------------|-------|---------------------|----------|
 | `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` | L |
-| `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` | L |
 | `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3 `| L |
 | `LlavaLlamaModel` | VILA | `Efficient-Large-Model/NVILA-8B` | L + V |
 | `LlavaNextForConditionalGeneration` | LLaVA-NeXT | `llava-hf/llava-v1.6-mistral-7b-hf` | L + V |
@@ -52,7 +54,9 @@ python3 quickstart_multimodal.py --model_dir Efficient-Large-Model/NVILA-8B --mo
 | `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` | L |
 | `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` | L |
 | `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` | L |
-| `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` | L |
+| `NemotronNASForCausalLM` | LLamaNemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` | L |
+| `NemotronNASForCausalLM` | LlamaNemotron Super | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` | L |
+| `NemotronNASForCausalLM` | LlamaNemotron Ultra | `nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` | L |
 | `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` | L |
 | `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` | L |
 | `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` | L |
````
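The BF16 + TP=8 example this commit adds runs the advanced quickstart against the 253B Ultra checkpoint. As a minimal sketch of how those command lines fit together (the `quickstart_cmd` helper below is hypothetical, not part of TensorRT-LLM; actually running the command requires a TensorRT-LLM install and suitable GPUs):

```python
# Hypothetical helper: assemble the quickstart_advanced.py argv lists
# shown in the diff above, without executing them.
def quickstart_cmd(model_dir, **flags):
    cmd = ["python3", "quickstart_advanced.py", "--model_dir", model_dir]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]  # e.g. tp_size=8 -> --tp_size 8
    return cmd

# The BF16 + TP=8 invocation added for Nemotron Ultra:
ultra_cmd = quickstart_cmd("nvidia/Llama-3_1-Nemotron-Ultra-253B-v1", tp_size=8)
print(" ".join(ultra_cmd))
```

The same helper reproduces the existing FP8 examples by passing `kv_cache_dtype="fp8"` instead of `tp_size`.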
