
Conversation

@caterina0718

Adds an experimental feature that compares two fine-tuning strategies using pre-defined .yaml configs.
It generates evaluation metrics (eval loss, perplexity, inference latency, peak VRAM) and writes a CSV report and comparison plots.

Default settings are minimal (max_samples=2, batch_size=1) for quick tests; larger settings may increase memory usage.

Includes:

  • Demo script: examples/comparison/ft_comparison_demo.py
  • CLI: scripts/finetuning_comparison/cli_yaml_compare.py
  • Core logic: scripts/finetuning_comparison/
  • Minimal tests: tests/eval/
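
For orientation, here is a minimal sketch of how the comparison might be driven programmatically. The compare_two_yamls function is named in the tests below, but its exact signature is not shown in this thread, so the argument names and config paths here are assumptions, not the actual API.

    # Hypothetical usage sketch -- compare_two_yamls is named in the tests,
    # but its signature is assumed here; config paths are placeholders.
    from yaml_compare import compare_two_yamls  # lives under scripts/finetuning_comparison/

    config_a = "configs/lora_sft.yaml"    # fine-tuning strategy A (placeholder path)
    config_b = "configs/qlora_sft.yaml"   # fine-tuning strategy B (placeholder path)

    # Expected to train both configs, evaluate the resulting checkpoints, and
    # write a CSV report plus comparison plots.
    compare_two_yamls(config_a, config_b)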

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @caterina0718, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces an experimental feature designed to streamline the comparison of different fine-tuning strategies for large language models. By leveraging existing YAML configurations, users can now evaluate two distinct fine-tuning approaches based on key performance indicators such as evaluation loss, perplexity, inference latency, and peak VRAM usage. The feature automates the training, merging (with a robust fallback for memory limitations), and evaluation processes, presenting the results in easily digestible CSV files and comparative plots. This enhancement aims to provide developers with a powerful tool to quickly assess and optimize their fine-tuning experiments.

Highlights

  • New Experimental Feature: Introduces an experimental feature for comparing two fine-tuning strategies using pre-defined YAML configuration files.
  • Comprehensive Metric Generation: The feature generates a suite of evaluation metrics including evaluation loss, perplexity, inference latency, and peak VRAM usage.
  • Output and Visualization: Results are outputted as CSV files and comparative plots, facilitating easy analysis and visualization of performance differences.
  • User Accessibility: Includes both a demo script (ft_comparison_demo.py) and a command-line interface (cli_yaml_compare.py) for flexible usage.
  • Robustness and Fallback: The core logic incorporates a fallback mechanism for LoRA weight merging and evaluation, returning dummy metrics if real evaluation or merging is not possible due to resource constraints.
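
The fallback itself is not shown in this thread; the following is only a rough sketch of the general pattern, with every name assumed rather than taken from the PR:

    import torch

    def evaluate_with_fallback(evaluate_fn, checkpoint_dir: str) -> dict:
        """Illustrative pattern only: run a real evaluation callable and fall back
        to dummy metrics when the GPU runs out of memory. All names are assumed."""
        try:
            return evaluate_fn(checkpoint_dir)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            # Dummy metrics keep the CSV/plot pipeline running on constrained hardware.
            return {"eval_loss": float("nan"), "perplexity": float("nan"),
                    "latency_s": float("nan"), "peak_vram_gb": float("nan")}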
Changelog
  • examples/README.md
    • Added a new section demonstrating how to use the compare_yaml_runs.py script to compare fine-tuning methods.
  • examples/comparison/README.md
    • New file providing a detailed guide for the fine-tuning comparison feature, covering installation, structure, demo execution, features, and potential future expansions.
  • examples/comparison/ft_comparison_demo.py
    • New demo script that showcases the programmatic comparison of two fine-tuning YAML configurations.
  • scripts/finetuning_comparison/cli_yaml_compare.py
    • New CLI entry point for comparing two fine-tuning YAML configurations, parsing command-line arguments and invoking the core comparison logic.
  • scripts/finetuning_comparison/yaml_compare.py
    • New core logic file containing functions to sequentially run YAML trainings, merge LoRA weights (with memory-aware fallback), evaluate checkpoints for various metrics, and generate CSV reports and comparative plots (a sketch of typical metric computations follows this changelog).
  • tests/eval/test_fine_tuning.py
    • New unit test file to verify the compare_two_yamls function, specifically testing its behavior with mocked training and evaluation results to ensure robustness and fallback functionality.
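
As referenced in the yaml_compare.py entry above, the four reported metrics are typically derived along these lines. This is a generic sketch, not the PR's code; generate_fn stands in for one inference pass on the GPU.

    import math
    import time
    import torch

    def basic_eval_metrics(eval_loss: float, generate_fn) -> dict:
        """Generic sketch of the four reported metrics; not the PR's implementation."""
        torch.cuda.reset_peak_memory_stats()

        start = time.perf_counter()
        generate_fn()                                   # one timed inference pass
        latency_s = time.perf_counter() - start

        return {
            "eval_loss": eval_loss,
            "perplexity": math.exp(eval_loss),          # perplexity = exp(mean cross-entropy)
            "latency_s": latency_s,
            "peak_vram_gb": torch.cuda.max_memory_allocated() / 1e9,
        }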
Activity
  • This is a new feature pull request, introducing the experimental fine-tuning comparison functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a useful experimental feature for comparing fine-tuning strategies. The implementation is well-structured with a demo, a CLI, and core logic separated. The use of fallback mechanisms with dummy data for robustness is a good design choice for an experimental feature.

My review includes several suggestions to improve documentation accuracy, script behavior for automation, and general code quality. Key points include correcting command examples in the READMEs, replacing plt.show() with plt.savefig() to allow non-interactive execution, and centralizing imports for better code structure.
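
For reference, the non-interactive plotting change usually amounts to selecting a headless backend and saving the figure instead of showing it; the file name and values below are placeholders.

    import matplotlib
    matplotlib.use("Agg")                               # headless backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(["config_a", "config_b"], [1.8, 1.6])        # placeholder values
    ax.set_ylabel("eval loss")
    fig.savefig("comparison.png", dpi=150, bbox_inches="tight")
    plt.close(fig)                                      # free the figure instead of plt.show()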

Comment on lines +35 to +38
result = subprocess.run(
    ["llamafactory-cli", "train", yaml_path],
    capture_output=True,  # set to False to see real-time logs
    text=True,            # return stdout/stderr as strings
)

Severity: medium

The run_yaml_training function uses subprocess.run to execute llamafactory-cli. While this works, it's generally more robust and efficient to call the underlying Python functions from the llamafactory library directly. This avoids reliance on the shell environment and PATH, provides better error handling, and simplifies passing data.
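
A sketch of the suggested in-process call follows. It assumes the installed LLaMA-Factory version exposes run_exp (the entry point behind `llamafactory-cli train`); verify the import path against the version pinned in this repository.

    # Sketch only: assumes llamafactory.train.tuner.run_exp exists in the installed
    # LLaMA-Factory version (it is the function behind `llamafactory-cli train`).
    import yaml
    from llamafactory.train.tuner import run_exp

    def run_yaml_training_inprocess(yaml_path: str) -> None:
        with open(yaml_path, "r", encoding="utf-8") as f:
            args = yaml.safe_load(f)   # training arguments as a plain dict
        run_exp(args)                  # trains in-process, no shell/PATH dependency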

Comment on lines +73 to +74
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

Severity: medium

Imports like peft and transformers are currently inside the merge_lora_checkpoint function. It's a best practice to place all imports at the top of the file. This improves code readability, makes dependencies clear, and avoids repeated import costs if the function is called multiple times. This also applies to other local imports in this file, such as uuid, llamafactory.eval.evaluator, and json.
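
As a concrete illustration, the top of yaml_compare.py could declare its dependencies once. The module names below are taken from this comment; the exact symbols imported from llamafactory would need to be checked against the file itself.

    # Illustrative top-of-file import block for yaml_compare.py.
    import json
    import subprocess
    import uuid

    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llamafactory.eval import evaluator  # module named in this review; exact symbols may differ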

caterina0718 and others added 7 commits on February 7, 2026 at 10:40:

  • Export plot instead of showing it
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Corrected path to demo
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Corrected bash command to run the fine-tuning demo
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Corrected path to the fine-tuning demo
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Extra " removed
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>