Feature: experimental fine-tuning comparison #10172
base: main
Conversation
Summary of Changes

Hello @caterina0718, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces an experimental feature designed to streamline the comparison of different fine-tuning strategies for large language models. Using existing YAML configurations, users can evaluate two distinct fine-tuning approaches on key performance indicators such as evaluation loss, perplexity, inference latency, and peak VRAM usage. The feature automates the training, merging (with a fallback for memory limitations), and evaluation steps, and presents the results as CSV files and comparative plots, giving developers a quick way to assess and optimize their fine-tuning experiments.
Code Review
This pull request introduces a useful experimental feature for comparing fine-tuning strategies. The implementation is well-structured with a demo, a CLI, and core logic separated. The use of fallback mechanisms with dummy data for robustness is a good design choice for an experimental feature.
My review includes several suggestions to improve documentation accuracy, script behavior for automation, and general code quality. Key points include correcting command examples in the READMEs, replacing plt.show() with plt.savefig() to allow non-interactive execution, and centralizing imports for better code structure.
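As an illustration of the plt.show() to plt.savefig() suggestion, a non-interactive plotting path could look roughly like this; the backend choice, output filename, and plotted values are illustrative, not the PR's actual code:

    # Sketch only: save the comparison figure instead of blocking on plt.show().
    # The dummy bar values and output filename below are illustrative.
    import matplotlib
    matplotlib.use("Agg")  # headless backend so the script runs in CI / automation
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(["strategy_a", "strategy_b"], [1.92, 1.87])  # e.g. eval loss per strategy (dummy values)
    ax.set_ylabel("eval loss")
    fig.savefig("ft_comparison.png", dpi=150, bbox_inches="tight")
    plt.close(fig)  # release the figure; no interactive window is opened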
    result = subprocess.run(
        ["llamafactory-cli", "train", yaml_path],
        capture_output=True,  # set to False to see real-time logs
        text=True,            # return stdout/stderr as strings
    )
The run_yaml_training function uses subprocess.run to execute llamafactory-cli. While this works, it's generally more robust and efficient to call the underlying Python functions from the llamafactory library directly. This avoids reliance on the shell environment and PATH, provides better error handling, and simplifies passing data.
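A minimal sketch of that in-process approach, assuming llamafactory exposes run_exp() in llamafactory.train.tuner and that it accepts a dict of training arguments (worth verifying against the installed version):

    # Sketch only: run training in-process instead of shelling out to llamafactory-cli.
    # Assumes llamafactory.train.tuner.run_exp accepts a dict equivalent to the YAML keys.
    import yaml
    from llamafactory.train.tuner import run_exp

    def run_yaml_training_inprocess(yaml_path: str) -> None:
        with open(yaml_path, "r", encoding="utf-8") as f:
            args = yaml.safe_load(f)  # same keys the CLI would read from the YAML file
        run_exp(args)  # no subprocess, no PATH dependency, exceptions surface directly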
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer
Imports like peft and transformers are currently inside the merge_lora_checkpoint function. It's a best practice to place all imports at the top of the file. This improves code readability, makes dependencies clear, and avoids repeated import costs if the function is called multiple times. This also applies to other local imports in this file, such as uuid, llamafactory.eval.evaluator, and json.
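A rough sketch of the suggested layout with the imports hoisted to module level; the helper signature below is illustrative, not the PR's actual code:

    # Illustrative module header: all heavy imports live at the top of the file.
    import json
    import uuid

    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llamafactory.eval import evaluator  # previously imported inside a function


    def merge_lora_checkpoint(base_model_path: str, adapter_path: str, output_dir: str) -> None:
        # peft/transformers are already imported above, so repeated calls pay no re-import cost
        ...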
Export plot instead of showing it Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Corrected path to demo Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Corrected bash command to run the fine-tuning demo Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Corrected path to the fine-tuning demo Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Extra " removed Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Adds an experimental feature to compare two fine-tuning strategies using pre-defined .yaml configs.
Generates evaluation metrics (loss, perplexity, latency, peak VRAM) and outputs CSV and plots.
Default settings are minimal (max_samples=2, batch_size=1) for quick tests; larger settings may increase memory usage.

Includes:
/examples/ft_comparison_demo.py
/scripts/finetuning_comparison/cli_yaml_compare.py
/scripts/finetuning_comparison
tests/eval
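For context, the latency, peak VRAM, and perplexity figures listed above could be collected along these lines; this is a rough sketch with illustrative names and generation settings, not the code added by this PR:

    # Sketch only: metric-collection helpers; names and settings are assumptions.
    import math
    import time
    import torch

    def measure_inference(model, tokenizer, prompt: str) -> dict:
        torch.cuda.reset_peak_memory_stats()  # track peak VRAM from this point on
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        start = time.perf_counter()
        with torch.no_grad():
            model.generate(**inputs, max_new_tokens=32)
        latency_s = time.perf_counter() - start
        peak_vram_mb = torch.cuda.max_memory_allocated() / 2**20
        return {"latency_s": latency_s, "peak_vram_mb": peak_vram_mb}

    def perplexity(eval_loss: float) -> float:
        # perplexity is exp(cross-entropy eval loss) for a causal LM
        return math.exp(eval_loss)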