# Function Calling with NeMo Automodel using FunctionGemma

This tutorial walks through fine-tuning [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it), Google's 270M-parameter function-calling model, with NeMo Automodel on the xLAM function-calling dataset.


## FunctionGemma introduction
FunctionGemma is a lightweight, 270M-parameter variant built on the Gemma 3 architecture with a function-calling chat format. It is intended to be fine-tuned for task-specific function calling, and its compact size makes it practical for edge or resource-constrained deployments.
- Gemma 3 architecture with an updated tokenizer and a function-calling chat format.
- Trained specifically for function calling: multiple tool definitions, parallel calls, tool responses, and natural-language summaries.
- Small and edge friendly: ~270M parameters for fast, dense inference on-device.
- Text-only, function-oriented model (not a general dialogue model), best used after task-specific fine-tuning.

## Prerequisites
- Install NeMo Automodel and its extras: `pip install nemo-automodel`.
- A FunctionGemma checkpoint available locally or via https://huggingface.co/google/functiongemma-270m-it (see the optional download sketch after this list).
- Small model footprint: it can be fine-tuned on a single GPU; scale batch size and sequence length as needed.
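
If you want to fetch the checkpoint ahead of time (for example on a node without interactive Hub access), the sketch below is one way to do it with `huggingface_hub`; it assumes you have accepted the model's license on the Hugging Face Hub and are authenticated (e.g. via `huggingface-cli login` or the `HF_TOKEN` environment variable).

```python
# Optional: pre-download the FunctionGemma checkpoint from the Hugging Face Hub.
# Assumes the model's license has been accepted on the Hub and you are logged in.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("google/functiongemma-270m-it")
print(f"Checkpoint cached at: {local_dir}")
```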

## The xLAM dataset
xLAM is a function-calling dataset containing user queries, tool schemas, and tool call traces. It covers diverse tools and arguments so models learn to emit structured tool calls.
- Dataset URL: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k
- Each sample provides:
  - `query`: the user request.
  - `tools`: tool definitions (lightweight schema).
  - `answers`: tool calls with serialized arguments.

Example entry:
```json
{
  "id": 123,
  "query": "Book me a table for two at 7pm in Seattle.",
  "tools": [
    {
      "name": "book_table",
      "description": "Book a restaurant table",
      "parameters": {
        "party_size": {"type": "int"},
        "time": {"type": "string"},
        "city": {"type": "string"}
      }
    }
  ],
  "answers": [
    {
      "name": "book_table",
      "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}"
    }
  ]
}
```
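
To inspect the raw data yourself, you can load it with the Hugging Face `datasets` library. The sketch below assumes the dataset's terms have been accepted on the Hub (if it requires access approval) and that you are logged in.

```python
# Peek at a raw xLAM row. The `tools` and `answers` fields typically arrive
# as serialized JSON strings in the raw data.
from datasets import load_dataset

ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
row = ds[0]
print(row["query"])    # the user request
print(row["tools"])    # tool definitions
print(row["answers"])  # target tool calls with serialized arguments
```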

The helper `make_xlam_dataset` converts each xLAM row into OpenAI-style tool schemas and tool calls, then renders them through the chat template so the loss is applied only to the tool-call arguments:

```python
def _format_example(
    example,
    tokenizer,
    eos_token_id,
    pad_token_id,
    seq_length=None,
    padding=None,
    truncation=None,
):
    # Parse the raw xLAM fields (they may arrive as JSON strings) and map them
    # to OpenAI-style tool schemas and tool calls.
    tools = _convert_tools(_json_load_if_str(example["tools"]))
    tool_calls = _convert_tool_calls(_json_load_if_str(example["answers"]), example_id=example.get("id"))

    # Build a two-turn conversation: the user query and an assistant turn whose
    # only content is the target tool calls.
    formatted_text = [
        {"role": "user", "content": example["query"]},
        {"role": "assistant", "content": "", "tool_calls": tool_calls},
    ]

    # Render through the model's chat template; answer_only_loss_mask=True
    # restricts the training loss to the assistant's tool-call tokens.
    return format_chat_template(
        tokenizer=tokenizer,
        formatted_text=formatted_text,
        tools=tools,
        eos_token_id=eos_token_id,
        pad_token_id=pad_token_id,
        seq_length=seq_length,
        padding=padding,
        truncation=truncation,
        answer_only_loss_mask=True,
    )
```
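
For intuition, here is roughly what those OpenAI-style structures look like for the example row above, rendered with the tokenizer's chat template. This is an illustrative sketch, not the library's implementation: the exact field names produced by `_convert_tools` and `_convert_tool_calls` may differ, and the rendered text depends on FunctionGemma's chat template.

```python
# Illustrative only -- the real conversion happens inside make_xlam_dataset.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

# OpenAI-style tool schema derived from the xLAM `tools` field (assumed shape).
tools = [{
    "type": "function",
    "function": {
        "name": "book_table",
        "description": "Book a restaurant table",
        "parameters": {
            "type": "object",
            "properties": {
                "party_size": {"type": "integer"},
                "time": {"type": "string"},
                "city": {"type": "string"},
            },
        },
    },
}]

# OpenAI-style conversation with the target tool call as the assistant turn.
messages = [
    {"role": "user", "content": "Book me a table for two at 7pm in Seattle."},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "type": "function",
            "function": {
                "name": "book_table",
                "arguments": {"party_size": 2, "time": "19:00", "city": "Seattle"},
            },
        }],
    },
]

# Render the conversation (plus tool definitions) into training text.
print(tokenizer.apply_chat_template(messages, tools=tools, tokenize=False))
```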


## Run full-parameter SFT
Use the ready-made config at [`examples/llm_finetune/gemma/functiongemma_xlam.yaml`](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_finetune/gemma/functiongemma_xlam.yaml) as the training recipe.

With the config in place, launch training (8 GPUs shown; adjust `--nproc-per-node` as needed):

```bash
torchrun --nproc-per-node=8 examples/llm_finetune/finetune.py \
  --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
```

You should see a training loss curve similar to the one below:

<p align="center">
  <img src="https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/functiongemma-sft-loss.png" alt="FunctionGemma SFT loss" width="400">
</p>

## Run PEFT (LoRA)
To apply LoRA (PEFT), uncomment the `peft` block in the recipe and tune the rank, alpha, and target modules per the [SFT/PEFT guide](https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/toolcalling.md). Example override:

```yaml
peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  match_all_linear: true
  dim: 16
  alpha: 16
  use_triton: true
```
Then fine-tune with the same recipe, adjusting the number of GPUs as needed:
```bash
torchrun --nproc-per-node=1 examples/llm_finetune/finetune.py \
  --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
```

The LoRA run should produce a loss curve similar to the one below:

<p align="center">
  <img src="https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/functiongemma-peft-loss.png" alt="FunctionGemma PEFT loss" width="400">
</p>