
Commit 3e31ddb

Merge branch 'main' into huiyingl/update_readme

2 parents 9dcbefb + 4e9d32b

File tree

8 files changed: +420, -18 lines changed
Two binary images added (39.1 KB and 37.5 KB): the SFT and PEFT loss-curve PNGs referenced in the tool-calling guide below.

docs/guides/llm/toolcalling.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# Function Calling with NeMo Automodel using FunctionGemma

This tutorial walks through fine-tuning [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it), Google's 270M-parameter function-calling model, with NeMo Automodel on the xLAM function-calling dataset.


## FunctionGemma introduction
FunctionGemma is a lightweight, 270M-parameter variant built on the Gemma 3 architecture with a function-calling chat format. It is intended to be fine-tuned for task-specific function calling, and its compact size makes it practical for edge or resource-constrained deployments.
- Gemma 3 architecture with an updated tokenizer and a function-calling chat format.
- Trained specifically for function calling: multiple tool definitions, parallel calls, tool responses, and natural-language summaries.
- Small and edge friendly: ~270M parameters for fast, dense on-device inference.
- Text-only, function-oriented model (not a general dialogue model), best used after task-specific fine-tuning.
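
The function-calling chat format can be previewed without any training. The following is a minimal sketch, not part of this guide, that renders a tool-calling prompt through the Hugging Face chat template; it assumes a recent `transformers` release with tool-aware chat templates and access to the `google/functiongemma-270m-it` checkpoint.

```python
# Sketch: render a tool-calling prompt with the model's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

# Tool definition in OpenAI-style JSON schema form.
tools = [
    {
        "type": "function",
        "function": {
            "name": "book_table",
            "description": "Book a restaurant table",
            "parameters": {
                "type": "object",
                "properties": {
                    "party_size": {"type": "integer"},
                    "time": {"type": "string"},
                    "city": {"type": "string"},
                },
                "required": ["party_size", "time", "city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "Book me a table for two at 7pm in Seattle."}]

# Render the prompt the model is expected to complete with a tool call.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt)
```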

## Prerequisites
- Install NeMo Automodel and its extras: `pip install nemo-automodel`.
- A FunctionGemma checkpoint available locally or via https://huggingface.co/google/functiongemma-270m-it.
- Small model footprint: the model can be fine-tuned on a single GPU; scale the batch size and sequence length as needed.
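
As an optional sanity check (not part of the original guide), you can confirm the install and that the checkpoint is reachable; this assumes you have accepted the model's license on the Hugging Face Hub and authenticated if required.

```python
# Optional environment check: verify the package is installed and the
# FunctionGemma tokenizer can be downloaded. Assumes Hub access.
from importlib.metadata import version

from transformers import AutoTokenizer

print("nemo-automodel:", version("nemo-automodel"))

tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")
print("tokenizer loaded, vocab size:", tokenizer.vocab_size)
```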

## The xLAM dataset
xLAM is a function-calling dataset containing user queries, tool schemas, and tool-call traces. It covers diverse tools and arguments, so models learn to emit structured tool calls.
- Dataset URL: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k
- Each sample provides:
  - `query`: the user request.
  - `tools`: tool definitions (lightweight schema).
  - `answers`: tool calls with serialized arguments.

Example entry:
```json
{
  "id": 123,
  "query": "Book me a table for two at 7pm in Seattle.",
  "tools": [
    {
      "name": "book_table",
      "description": "Book a restaurant table",
      "parameters": {
        "party_size": {"type": "int"},
        "time": {"type": "string"},
        "city": {"type": "string"}
      }
    }
  ],
  "answers": [
    {
      "name": "book_table",
      "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}"
    }
  ]
}
```
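
To see raw rows like the one above, you can load xLAM directly with the `datasets` library. This sketch is illustrative only and assumes any required dataset access or authentication on the Hugging Face Hub.

```python
# Illustrative: inspect an xLAM row directly. The `tools` and `answers` fields
# may arrive as JSON-serialized strings (hence `_json_load_if_str` in the helper
# shown below), so parse them before use.
import json

from datasets import load_dataset

ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
row = ds[0]

def load_if_str(value):
    # Mirrors the assumed behavior of the guide's `_json_load_if_str` helper.
    return json.loads(value) if isinstance(value, str) else value

tools = load_if_str(row["tools"])
answers = load_if_str(row["answers"])

print(row["query"])
print("tools offered:", [t["name"] for t in tools])
print("calls expected:", [a["name"] for a in answers])
```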

The helper `make_xlam_dataset` converts each xLAM row into OpenAI-style tool schemas and tool calls, then renders them through the chat template so that loss is applied only to the tool-call arguments:

```python
def _format_example(
    example,
    tokenizer,
    eos_token_id,
    pad_token_id,
    seq_length=None,
    padding=None,
    truncation=None,
):
    # Parse the serialized `tools` / `answers` fields and convert them to
    # OpenAI-style tool schemas and tool calls.
    tools = _convert_tools(_json_load_if_str(example["tools"]))
    tool_calls = _convert_tool_calls(_json_load_if_str(example["answers"]), example_id=example.get("id"))

    # A user turn followed by an assistant turn that contains only tool calls.
    formatted_text = [
        {"role": "user", "content": example["query"]},
        {"role": "assistant", "content": "", "tool_calls": tool_calls},
    ]

    # Render through the chat template; the loss mask keeps only the
    # assistant's tool-call tokens.
    return format_chat_template(
        tokenizer=tokenizer,
        formatted_text=formatted_text,
        tools=tools,
        eos_token_id=eos_token_id,
        pad_token_id=pad_token_id,
        seq_length=seq_length,
        padding=padding,
        truncation=truncation,
        answer_only_loss_mask=True,
    )
```
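
For intuition, here is a plausible shape for the OpenAI-style structures that `_convert_tools` and `_convert_tool_calls` produce for the example entry above. The exact field names are an assumption for illustration, not the library's documented output.

```python
# Hypothetical converted structures for the `book_table` example (assumed shape,
# shown only to illustrate what "OpenAI-style" means here).
openai_style_tools = [
    {
        "type": "function",
        "function": {
            "name": "book_table",
            "description": "Book a restaurant table",
            "parameters": {
                "party_size": {"type": "int"},
                "time": {"type": "string"},
                "city": {"type": "string"},
            },
        },
    }
]

openai_style_tool_calls = [
    {
        "type": "function",
        "function": {
            "name": "book_table",
            # the serialized argument string from `answers` is parsed into a dict
            "arguments": {"party_size": 2, "time": "19:00", "city": "Seattle"},
        },
    }
]
```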

## Run full-parameter SFT
Use the ready-made config at [`examples/llm_finetune/gemma/functiongemma_xlam.yaml`](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_finetune/gemma/functiongemma_xlam.yaml) to start fine-tuning.

With the config in place, launch training (8 GPUs shown; adjust `--nproc-per-node` as needed):

```bash
torchrun --nproc-per-node=8 examples/llm_finetune/finetune.py \
    --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
```

You should see a training loss curve similar to the one below:

<p align="center">
  <img src="https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/functiongemma-sft-loss.png" alt="FunctionGemma SFT loss" width="400">
</p>

## Run PEFT (LoRA)
To apply LoRA (PEFT), uncomment the `peft` block in the recipe and tune the rank, alpha, and target modules per the [SFT/PEFT guide](https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/toolcalling.md). Example override:

```yaml
peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  match_all_linear: true
  dim: 16
  alpha: 16
  use_triton: true
```
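
Here, `dim` is the LoRA rank and `alpha` the scaling numerator, so the low-rank update is scaled by `alpha / dim` (1.0 with the values above). The sketch below is a generic illustration of that relationship, not Automodel's `PeftConfig` implementation, and the layer dimensions are hypothetical.

```python
# Generic LoRA sketch: the frozen weight W receives a low-rank update B @ A
# scaled by alpha / rank. Dimensions are hypothetical.
import torch

d_out, d_in, rank, alpha = 640, 640, 16, 16
scaling = alpha / rank  # 1.0 with the settings above

W = torch.randn(d_out, d_in)          # frozen base weight
A = torch.randn(rank, d_in) * 0.01    # trainable down-projection
B = torch.zeros(d_out, rank)          # trainable up-projection, zero-initialized

def lora_linear(x: torch.Tensor) -> torch.Tensor:
    # Base projection plus the scaled low-rank correction.
    return x @ W.T + scaling * ((x @ A.T) @ B.T)

print(lora_linear(torch.randn(2, d_in)).shape)  # torch.Size([2, 640])
```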
Then fine-tune with the same recipe, adjusting the number of GPUs as needed:
```bash
torchrun --nproc-per-node=1 examples/llm_finetune/finetune.py \
    --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
```

<p align="center">
  <img src="https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/functiongemma-peft-loss.png" alt="FunctionGemma PEFT loss" width="400">
</p>

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Fine-tune Hugging Face Models Instantly with Day-0 Support with NVIDIA NeMo Auto
 :hidden:
 guides/overview.md
 guides/llm/finetune.md
+guides/llm/toolcalling.md
 guides/llm/mcore-pretraining.md
 guides/llm/pretraining.md
 guides/llm/sequence-classification.md
examples/llm_finetune/gemma/functiongemma_xlam.yaml

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# To run this recipe, please use the following command:
# torchrun --nproc-per-node=8 recipes/llm_finetune/finetune.py --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
# Adjust --nproc-per-node to the number of GPUs available on your host machine.


step_scheduler:
  global_batch_size: 32
  local_batch_size: 4
  ckpt_every_steps: 500
  val_every_steps: 500  # will run every x number of gradient steps
  max_steps: 500

dist_env:
  backend: nccl
  timeout_minutes: 1

rng:
  _target_: nemo_automodel.components.training.rng.StatefulRNG
  seed: 1111
  ranked: true

model:
  _target_: nemo_automodel.NeMoAutoModelForCausalLM.from_pretrained
  pretrained_model_name_or_path: google/functiongemma-270m-it
  attn_implementation: eager

# uncomment for peft
# peft:
#   _target_: nemo_automodel.components._peft.lora.PeftConfig
#   match_all_linear: True
#   dim: 16
#   alpha: 16
#   use_triton: True

# torch.compile configuration
compile:
  enabled: false
  mode: "default"  # Options: "default", "reduce-overhead", "max-autotune"
  fullgraph: false
  dynamic: true  # Set to false for better performance with fixed shapes
  backend: null  # Use default backend (inductor)

clip_grad_norm:
  max_norm: 1.0

distributed:
  _target_: nemo_automodel.components.distributed.fsdp2.FSDP2Manager
  dp_size: none
  dp_replicate_size: 1
  cp_size: 1
  sequence_parallel: false

loss_fn:
  _target_: nemo_automodel.components.loss.masked_ce.MaskedCrossEntropy

dataset:
  _target_: nemo_automodel.components.datasets.llm.xlam.make_xlam_dataset
  dataset_name: Salesforce/xlam-function-calling-60k
  split: train
  tokenizer:
    pretrained_model_name_or_path: google/functiongemma-270m-it

packed_sequence:
  packed_sequence_size: 0

dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater
  shuffle: false

validation_dataset:
  _target_: nemo_automodel.components.datasets.llm.xlam.make_xlam_dataset
  dataset_name: Salesforce/xlam-function-calling-60k
  split: train[:256]
  limit_dataset_samples: 256

validation_dataloader:
  _target_: torchdata.stateful_dataloader.StatefulDataLoader
  collate_fn: nemo_automodel.components.datasets.utils.default_collater

optimizer:
  _target_: torch.optim.Adam
  betas: [0.9, 0.999]
  eps: 1e-8
  lr: 1.0e-5
  weight_decay: 0
  # min_lr: 1.0e-5

lr_scheduler:
  lr_decay_style: cosine
  min_lr: 1.0e-6

nvtx: false


# Uncomment and configure for W&B logging
# wandb:
#   project: <your-wandb-project>
#   entity: <your-wandb-entity>
#   name: <your-wandb-exp-name>
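
One illustrative aside on the batch settings in `step_scheduler` (not part of the recipe): with `global_batch_size: 32` and `local_batch_size: 4`, the gradient-accumulation factor typically falls out of the data-parallel world size.

```python
# Illustrative arithmetic: gradient accumulation is typically
# global_batch_size / (local_batch_size * num_gpus).
global_batch_size = 32
local_batch_size = 4

for num_gpus in (8, 4, 1):
    accum = global_batch_size // (local_batch_size * num_gpus)
    print(f"{num_gpus} GPU(s) -> {accum} gradient accumulation step(s)")
# 8 GPU(s) -> 1, 4 GPU(s) -> 2, 1 GPU(s) -> 8
```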

nemo_automodel/_transformers/auto_model.py

Lines changed: 7 additions & 1 deletion
@@ -47,6 +47,7 @@
 from nemo_automodel.shared.utils import dtype_from_str

 HAS_LIGER_KERNEL, liger_kernel_trf = safe_import("liger_kernel.transformers")
+
 logger = logging.getLogger(__name__)

@@ -468,8 +469,10 @@ def from_config(
         patch it with Liger or SDPA-optimized kernels.

         Args:
-            config (transformers.PretrainedConfig):
+            config (transformers.PretrainedConfig | str):
                 The configuration object used to build the model.
+                If config is passed as a string (e.g., a model ID or local checkpoint path),
+                a config is created internally using AutoConfig.
             *model_args:
                 Positional arguments forwarded to the underlying
                 ``transformers.AutoModelForCausalLM.from_config`` call.
@@ -532,6 +535,9 @@ def _retry(**override):
                 **kwargs,
             )

+        # handle model_id passed as config
+        if isinstance(config, str):
+            config = AutoConfig.from_pretrained(config, trust_remote_code=kwargs.get("trust_remote_code", False))
         # 1. if force_hf is True, we will use the parent class to load and return the model as is
         if force_hf:
             return cls._from_config_parent_class(
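
A minimal usage sketch of the new string code path added above (assuming Hub access; the tiny model ID below is the one used in the unit test):

```python
# Sketch: from_config now accepts a model ID (or local checkpoint path) string
# and resolves it to a config via AutoConfig.from_pretrained internally.
from nemo_automodel import NeMoAutoModelForCausalLM

model = NeMoAutoModelForCausalLM.from_config("hf-internal-testing/tiny-random-gpt2")
print(type(model).__name__)
```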

tests/unit_tests/_transformers/test_auto_model.py

Lines changed: 31 additions & 0 deletions
@@ -223,6 +223,37 @@ def test_from_config_happy_path(self):
         model = NeMoAutoModelForCausalLM.from_config(config, attn_implementation="eager")
         assert model.config.nemo_version == __version__

+    def test_from_config_with_string_calls_autoconfig(self):
+        """Test that from_config calls AutoConfig.from_pretrained when config is a string."""
+        mock_model = MagicMock()
+        mock_model.config = {}
+        mock_config = Mock()
+        mock_config.architectures = ["HFArch"]
+        mock_config.name_or_path = "hf-internal-testing/tiny-random-gpt2"
+
+        with (
+            patch("nemo_automodel._transformers.auto_model.AutoConfig.from_pretrained") as mock_autoconfig,
+            patch("nemo_automodel._transformers.auto_model.HAS_LIGER_KERNEL", False),
+            patch("nemo_automodel._transformers.auto_model._patch_attention", lambda obj, sdpa_method=None: obj),
+            patch.object(transformers.AutoModelForCausalLM, "from_config") as mock_from_config,
+        ):
+            mock_autoconfig.return_value = mock_config
+            mock_from_config.return_value = mock_model
+
+            model = NeMoAutoModelForCausalLM.from_config(
+                "hf-internal-testing/tiny-random-gpt2",
+                trust_remote_code=False
+            )
+
+            # Verify AutoConfig.from_pretrained was called with the string
+            mock_autoconfig.assert_called_once_with(
+                "hf-internal-testing/tiny-random-gpt2",
+                trust_remote_code=False
+            )
+            # Verify the model was returned
+            assert model is mock_model
+            assert model.config["nemo_version"] == __version__
+
     def test_from_pretrained_runtimeerror_triggers_reload(self):
         """When _patch_liger_kernel raises, the loader should retry with
         use_liger_kernel=False and return the second model instance."""
