Finetuning NVIDIA Nemotron 3 Nano in NeMo Automodel #976
adil-a started this conversation in Show and tell
NVIDIA Nemotron 3 Nano is a reasoning model built on a hybrid Mixture-of-Experts (MoE) architecture, with 3.5B active parameters and 30B parameters in total. The model is the best in its size class on benchmarks such as SWE-Bench and GPQA Diamond.
To get started quickly, we offer recipes for both full-model SFT and PEFT.
Data
We use the SQuAD Q/A dataset in this walkthrough. Below is an example sample:
{ "context": "In the past, the Malays used to call the Portuguese Serani from the Arabic Nasrani, but the term now refers to the modern Kristang creoles of Malaysia.", "question": "What term did the Malays use for the Portuguese Serani?", "answers": {"text": ["Nasrani"], "answer_start": [75]} }Run the Fine-Tune Script
Apply YAML-Based Configuration
NeMo Automodel uses a flexible configuration system that combines YAML configuration files with command-line overrides. This allows you to maintain base configurations while easily experimenting with different parameters.
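As an illustration of the idea only, a minimal base configuration could look like the sketch below. The section and field names (model, dataset, training, the model identifier, and the hyperparameter values) are assumptions made for this walkthrough, not the exact NeMo Automodel schema, so treat the recipe YAMLs shipped with the repository as the source of truth.

```yaml
# Hypothetical base config for SFT on SQuAD.
# All key names here are illustrative assumptions, not the exact NeMo Automodel schema.
model:
  pretrained_model_name_or_path: nvidia/Nemotron-3-Nano   # placeholder model ID

dataset:
  name: squad              # the SQuAD Q/A dataset used in this walkthrough
  split: train

training:
  learning_rate: 2.0e-5    # a common starting point for full SFT
  global_batch_size: 32
  max_steps: 1000
```

Individual values can then be overridden from the command line (for example, a dotted override along the lines of training.learning_rate=1e-4, assuming a dotted-key override style), so the base YAML stays fixed while you sweep parameters.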
PEFT
Configure Model Freezing
In this run, we add LoRA weights to all linear layers, but the configuration can be modified to target specific modules instead.
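As a rough sketch of how this could be expressed in the config, the peft block below attaches LoRA adapters to every linear layer, with a commented-out alternative that restricts them to specific projection modules. The key names (peft, match_all_linear, target_modules, dim, alpha) are assumptions made for illustration; the actual recipe YAML defines the real schema.

```yaml
# Hypothetical PEFT section (key names are illustrative assumptions).
peft:
  peft_scheme: lora
  match_all_linear: true          # add LoRA adapters to all linear layers
  # Alternatively, restrict the adapters to specific projections:
  # target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
  dim: 16                         # LoRA rank
  alpha: 32
  dropout: 0.05
```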
Full SFT
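For the full-SFT run, the key difference from the PEFT recipe is that no adapter block is configured, so every parameter of the model is updated. Under the same assumed schema as above, the sketch simply drops the peft section:

```yaml
# Hypothetical full-SFT variant: no peft block, all parameters are trainable.
training:
  learning_rate: 2.0e-5    # full SFT typically uses a lower learning rate than LoRA
```

Because optimizer states and gradients are kept for every weight, expect the memory footprint and checkpoint size to be much larger than in the LoRA run.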
Happy finetuning!
Replies: 1 comment
The diff between Full and PEFT is drastic, do you guys have any more experiments? Thoughts?