# Function Calling with NeMo Automodel using FunctionGemma

This tutorial walks through fine-tuning [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it), Google's 270M-parameter function-calling model, with NeMo Automodel on the xLAM function-calling dataset.


## FunctionGemma introduction
FunctionGemma is a lightweight, 270M-parameter variant built on the Gemma 3 architecture with a function-calling chat format. It is intended to be fine-tuned for task-specific function calling, and its compact size makes it practical for edge or resource-constrained deployments.
- Gemma 3 architecture with an updated tokenizer and a function-calling chat format.
- Trained specifically for function calling: multiple tool definitions, parallel calls, tool responses, and natural-language summaries.
- Small and edge friendly: ~270M parameters for fast, dense inference on-device.
- Text-only, function-oriented model (not a general dialogue model), best used after task-specific fine-tuning.

## Prerequisites
- Install NeMo Automodel and its extras: `pip install nemo-automodel`.
- A FunctionGemma checkpoint available locally or via https://huggingface.co/google/functiongemma-270m-it (see the optional download sketch after this list).
- Small model footprint: it can be fine-tuned on a single GPU; scale batch size and sequence length as needed.
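
If you want to fetch the checkpoint ahead of time (for example on a node without interactive Hub access), the sketch below is one way to do it with `huggingface_hub`; it assumes you have accepted the model's license on the Hugging Face Hub and are authenticated (e.g. via `huggingface-cli login` or the `HF_TOKEN` environment variable).

```python
# Optional: pre-download the FunctionGemma checkpoint from the Hugging Face Hub.
# Assumes the model's license has been accepted on the Hub and you are logged in.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("google/functiongemma-270m-it")
print(f"Checkpoint cached at: {local_dir}")
```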

## The xLAM dataset
xLAM is a function-calling dataset containing user queries, tool schemas, and tool call traces. It covers diverse tools and arguments so models learn to emit structured tool calls.
- Dataset URL: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k
- Each sample provides:
  - `query`: the user request.
  - `tools`: tool definitions (lightweight schema).
  - `answers`: tool calls with serialized arguments.

Example entry:
```json
{
  "id": 123,
  "query": "Book me a table for two at 7pm in Seattle.",
  "tools": [
    {
      "name": "book_table",
      "description": "Book a restaurant table",
      "parameters": {
        "party_size": {"type": "int"},
        "time": {"type": "string"},
        "city": {"type": "string"}
      }
    }
  ],
  "answers": [
    {
      "name": "book_table",
      "arguments": "{\"party_size\":2,\"time\":\"19:00\",\"city\":\"Seattle\"}"
    }
  ]
}
```
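
To inspect the raw data yourself, you can load it with the Hugging Face `datasets` library. The sketch below assumes the dataset's terms have been accepted on the Hub (if it requires access approval) and that you are logged in.

```python
# Peek at a raw xLAM row. The `tools` and `answers` fields typically arrive
# as serialized JSON strings in the raw data.
from datasets import load_dataset

ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
row = ds[0]
print(row["query"])    # the user request
print(row["tools"])    # tool definitions
print(row["answers"])  # target tool calls with serialized arguments
```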

The helper `make_xlam_dataset` converts each xLAM row into OpenAI-style tool schemas and tool calls, then renders them through the chat template so the loss is applied only to the tool-call arguments:

```python
def _format_example(
    example,
    tokenizer,
    eos_token_id,
    pad_token_id,
    seq_length=None,
    padding=None,
    truncation=None,
):
    # Parse the raw xLAM fields (they may arrive as JSON strings) and map them
    # to OpenAI-style tool schemas and tool calls.
    tools = _convert_tools(_json_load_if_str(example["tools"]))
    tool_calls = _convert_tool_calls(_json_load_if_str(example["answers"]), example_id=example.get("id"))

    # Build a two-turn conversation: the user query and an assistant turn whose
    # only content is the target tool calls.
    formatted_text = [
        {"role": "user", "content": example["query"]},
        {"role": "assistant", "content": "", "tool_calls": tool_calls},
    ]

    # Render through the model's chat template; answer_only_loss_mask=True
    # restricts the training loss to the assistant's tool-call tokens.
    return format_chat_template(
        tokenizer=tokenizer,
        formatted_text=formatted_text,
        tools=tools,
        eos_token_id=eos_token_id,
        pad_token_id=pad_token_id,
        seq_length=seq_length,
        padding=padding,
        truncation=truncation,
        answer_only_loss_mask=True,
    )
```
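
For intuition, here is roughly what those OpenAI-style structures look like for the example row above, rendered with the tokenizer's chat template. This is an illustrative sketch, not the library's implementation: the exact field names produced by `_convert_tools` and `_convert_tool_calls` may differ, and the rendered text depends on FunctionGemma's chat template.

```python
# Illustrative only -- the real conversion happens inside make_xlam_dataset.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")

# OpenAI-style tool schema derived from the xLAM `tools` field (assumed shape).
tools = [{
    "type": "function",
    "function": {
        "name": "book_table",
        "description": "Book a restaurant table",
        "parameters": {
            "type": "object",
            "properties": {
                "party_size": {"type": "integer"},
                "time": {"type": "string"},
                "city": {"type": "string"},
            },
        },
    },
}]

# OpenAI-style conversation with the target tool call as the assistant turn.
messages = [
    {"role": "user", "content": "Book me a table for two at 7pm in Seattle."},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "type": "function",
            "function": {
                "name": "book_table",
                "arguments": {"party_size": 2, "time": "19:00", "city": "Seattle"},
            },
        }],
    },
]

# Render the conversation (plus tool definitions) into training text.
print(tokenizer.apply_chat_template(messages, tools=tools, tokenize=False))
```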


## Run full-parameter SFT
Use the ready-made config at [`examples/llm_finetune/gemma/functiongemma_xlam.yaml`](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_finetune/gemma/functiongemma_xlam.yaml) as the training recipe.

With the config in place, launch training (8 GPUs shown; adjust `--nproc-per-node` as needed):

```bash
torchrun --nproc-per-node=8 examples/llm_finetune/finetune.py \
  --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
```

You should see a training loss curve similar to the one below:

<p align="center">
  <img src="https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/functiongemma-sft-loss.png" alt="FunctionGemma SFT loss" width="400">
</p>

## Run PEFT (LoRA)
To apply LoRA (PEFT), uncomment the `peft` block in the recipe and tune the rank, alpha, and target modules per the [SFT/PEFT guide](https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/toolcalling.md). Example override:

```yaml
peft:
  _target_: nemo_automodel.components._peft.lora.PeftConfig
  match_all_linear: true
  dim: 16
  alpha: 16
  use_triton: true
```
Then fine-tune with the same recipe, adjusting the number of GPUs as needed:
```bash
torchrun --nproc-per-node=1 examples/llm_finetune/finetune.py \
  --config examples/llm_finetune/gemma/functiongemma_xlam.yaml
```

The LoRA run should produce a loss curve similar to the one below:

<p align="center">
  <img src="https://github.com/NVIDIA-NeMo/Automodel/blob/main/docs/guides/llm/functiongemma-peft-loss.png" alt="FunctionGemma PEFT loss" width="400">
</p>