
Conversation

@caterina0718

Adds an experimental feature that compares two fine-tuning strategies using pre-defined .yaml configs.
It generates evaluation metrics (eval loss, perplexity, inference latency, peak VRAM) and writes a CSV report and comparison plots.

Default settings are minimal (max_samples=2, batch_size=1) for quick tests; larger settings may increase memory usage.

Includes:

  • Demo script: examples/comparison/ft_comparison_demo.py
  • CLI: scripts/finetuning_comparison/cli_yaml_compare.py
  • Core logic: scripts/finetuning_comparison/
  • Minimal tests: tests/eval/
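
For orientation, here is a minimal sketch of how the comparison might be driven programmatically. The compare_two_yamls function is named in the tests below, but its exact signature is not shown in this thread, so the argument names and config paths here are assumptions, not the actual API.

    # Hypothetical usage sketch -- compare_two_yamls is named in the tests,
    # but its signature is assumed here; config paths are placeholders.
    from yaml_compare import compare_two_yamls  # lives under scripts/finetuning_comparison/

    config_a = "configs/lora_sft.yaml"    # fine-tuning strategy A (placeholder path)
    config_b = "configs/qlora_sft.yaml"   # fine-tuning strategy B (placeholder path)

    # Expected to train both configs, evaluate the resulting checkpoints, and
    # write a CSV report plus comparison plots.
    compare_two_yamls(config_a, config_b)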

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @caterina0718, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces an experimental feature designed to streamline the comparison of different fine-tuning strategies for large language models. By leveraging existing YAML configurations, users can now evaluate two distinct fine-tuning approaches based on key performance indicators such as evaluation loss, perplexity, inference latency, and peak VRAM usage. The feature automates the training, merging (with a robust fallback for memory limitations), and evaluation processes, presenting the results in easily digestible CSV files and comparative plots. This enhancement aims to provide developers with a powerful tool to quickly assess and optimize their fine-tuning experiments.

Highlights

  • New Experimental Feature: Introduces an experimental feature for comparing two fine-tuning strategies using pre-defined YAML configuration files.
  • Comprehensive Metric Generation: The feature generates a suite of evaluation metrics including evaluation loss, perplexity, inference latency, and peak VRAM usage.
  • Output and Visualization: Results are outputted as CSV files and comparative plots, facilitating easy analysis and visualization of performance differences.
  • User Accessibility: Includes both a demo script (ft_comparison_demo.py) and a command-line interface (cli_yaml_compare.py) for flexible usage.
  • Robustness and Fallback: The core logic incorporates a fallback mechanism for LoRA weight merging and evaluation, returning dummy metrics if real evaluation or merging is not possible due to resource constraints.
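
The fallback itself is not shown in this thread; the following is only a rough sketch of the general pattern, with every name assumed rather than taken from the PR:

    import torch

    def evaluate_with_fallback(evaluate_fn, checkpoint_dir: str) -> dict:
        """Illustrative pattern only: run a real evaluation callable and fall back
        to dummy metrics when the GPU runs out of memory. All names are assumed."""
        try:
            return evaluate_fn(checkpoint_dir)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            # Dummy metrics keep the CSV/plot pipeline running on constrained hardware.
            return {"eval_loss": float("nan"), "perplexity": float("nan"),
                    "latency_s": float("nan"), "peak_vram_gb": float("nan")}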
Changelog
  • examples/README.md
    • Added a new section demonstrating how to use the compare_yaml_runs.py script to compare fine-tuning methods.
  • examples/comparison/README.md
    • New file providing a detailed guide for the fine-tuning comparison feature, covering installation, structure, demo execution, features, and potential future expansions.
  • examples/comparison/ft_comparison_demo.py
    • New demo script that showcases the programmatic comparison of two fine-tuning YAML configurations.
  • scripts/finetuning_comparison/cli_yaml_compare.py
    • New CLI entry point for comparing two fine-tuning YAML configurations, parsing command-line arguments and invoking the core comparison logic.
  • scripts/finetuning_comparison/yaml_compare.py
    • New core logic file containing functions to sequentially run YAML trainings, merge LoRA weights (with memory-aware fallback), evaluate checkpoints for various metrics, and generate CSV reports and comparative plots (a sketch of typical metric computations follows this changelog).
  • tests/eval/test_fine_tuning.py
    • New unit test file to verify the compare_two_yamls function, specifically testing its behavior with mocked training and evaluation results to ensure robustness and fallback functionality.
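
As referenced in the yaml_compare.py entry above, the four reported metrics are typically derived along these lines. This is a generic sketch, not the PR's code; generate_fn stands in for one inference pass on the GPU.

    import math
    import time
    import torch

    def basic_eval_metrics(eval_loss: float, generate_fn) -> dict:
        """Generic sketch of the four reported metrics; not the PR's implementation."""
        torch.cuda.reset_peak_memory_stats()

        start = time.perf_counter()
        generate_fn()                                   # one timed inference pass
        latency_s = time.perf_counter() - start

        return {
            "eval_loss": eval_loss,
            "perplexity": math.exp(eval_loss),          # perplexity = exp(mean cross-entropy)
            "latency_s": latency_s,
            "peak_vram_gb": torch.cuda.max_memory_allocated() / 1e9,
        }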
Activity
  • This is a new feature pull request, introducing the experimental fine-tuning comparison functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a useful experimental feature for comparing fine-tuning strategies. The implementation is well-structured with a demo, a CLI, and core logic separated. The use of fallback mechanisms with dummy data for robustness is a good design choice for an experimental feature.

My review includes several suggestions to improve documentation accuracy, script behavior for automation, and general code quality. Key points include correcting command examples in the READMEs, replacing plt.show() with plt.savefig() to allow non-interactive execution, and centralizing imports for better code structure.
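
For reference, the non-interactive plotting change usually amounts to selecting a headless backend and saving the figure instead of showing it; the file name and values below are placeholders.

    import matplotlib
    matplotlib.use("Agg")                               # headless backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(["config_a", "config_b"], [1.8, 1.6])        # placeholder values
    ax.set_ylabel("eval loss")
    fig.savefig("comparison.png", dpi=150, bbox_inches="tight")
    plt.close(fig)                                      # free the figure instead of plt.show()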

Comment on lines +35 to +38
result = subprocess.run(
    ["llamafactory-cli", "train", yaml_path],
    capture_output=True,  # set to False to see real-time logs
    text=True,            # return stdout/stderr as strings
)

Severity: medium

The run_yaml_training function uses subprocess.run to execute llamafactory-cli. While this works, it's generally more robust and efficient to call the underlying Python functions from the llamafactory library directly. This avoids reliance on the shell environment and PATH, provides better error handling, and simplifies passing data.
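
A sketch of the suggested in-process call follows. It assumes the installed LLaMA-Factory version exposes run_exp (the entry point behind `llamafactory-cli train`); verify the import path against the version pinned in this repository.

    # Sketch only: assumes llamafactory.train.tuner.run_exp exists in the installed
    # LLaMA-Factory version (it is the function behind `llamafactory-cli train`).
    import yaml
    from llamafactory.train.tuner import run_exp

    def run_yaml_training_inprocess(yaml_path: str) -> None:
        with open(yaml_path, "r", encoding="utf-8") as f:
            args = yaml.safe_load(f)   # training arguments as a plain dict
        run_exp(args)                  # trains in-process, no shell/PATH dependency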

Comment on lines +73 to +74
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

Severity: medium

Imports like peft and transformers are currently inside the merge_lora_checkpoint function. It's a best practice to place all imports at the top of the file. This improves code readability, makes dependencies clear, and avoids repeated import costs if the function is called multiple times. This also applies to other local imports in this file, such as uuid, llamafactory.eval.evaluator, and json.
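
As a concrete illustration, the top of yaml_compare.py could declare its dependencies once. The module names below are taken from this comment; the exact symbols imported from llamafactory would need to be checked against the file itself.

    # Illustrative top-of-file import block for yaml_compare.py.
    import json
    import subprocess
    import uuid

    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llamafactory.eval import evaluator  # module named in this review; exact symbols may differ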

caterina0718 and others added 7 commits on February 7, 2026 at 10:40:

  • Export plot instead of showing it
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Corrected path to demo
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Corrected bash command to run the fine-tuning demo
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Corrected path to the fine-tuning demo
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  • Extra " removed
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>