v3.0.0
Image: quay.io/modh/fms-hf-tuning:v3.0.0
Summary of Changes
Activated LoRA Support
- Support for Activated LoRA model tuning
- Usage is very similar to standard LoRA, with the key difference that an invocation_string must be specified
- Available by setting --peft_method to alora
- Inference with aLoRA models requires ensuring that the invocation string is present in the input
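The inference requirement above can be guarded with a small pre-check. The following is a minimal sketch only: `ensure_invocation_string` is a hypothetical helper, not part of fms-hf-tuning, and the invocation string used in the usage lines is a made-up placeholder. Whether appending a missing invocation string is appropriate depends on how the adapter was trained, so the helper raises by default.

```python
def ensure_invocation_string(prompt: str, invocation_string: str, append: bool = False) -> str:
    """Return a prompt that is guaranteed to contain the invocation string.

    Hypothetical helper for illustration; not an fms-hf-tuning API.
    """
    if not invocation_string:
        raise ValueError("invocation_string must be a non-empty string")
    if invocation_string in prompt:
        # Already present: pass the prompt through unchanged.
        return prompt
    if append:
        # Opt-in behaviour: add the invocation string at the end of the input.
        return prompt + invocation_string
    raise ValueError("prompt does not contain the invocation string")


# Placeholder value; use the invocation_string the adapter was tuned with.
INVOCATION = "<|check|>"
checked = ensure_invocation_string("Is this answer grounded?" + INVOCATION, INVOCATION)
```

Raising by default keeps a missing invocation string from going unnoticed at inference time.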
Data Preprocessor Changes
- Breaking Changes to the data preprocessor interface, now utilizing conventional handler and parameter names from HF datasets in data configs
- rename and retain are now their own data handlers, not data config parameters
- Add flexible train/test dataset splitting by using the split parameter in data configs
- Merge offline data preprocessor script into main library; data can now be preprocessed on its own, without tuning, using --do_dataprocessing_only
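To make the three operations named above concrete, here is a conceptual sketch in plain Python. It does not use the library's data config schema or handler signatures, which are not shown in these notes; the function names, arguments, and sample records are assumptions chosen for illustration only.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def rename(records: List[Record], column_mapping: Dict[str, str]) -> List[Record]:
    """Rename columns; columns not in the mapping keep their names."""
    return [{column_mapping.get(k, k): v for k, v in r.items()} for r in records]


def retain(records: List[Record], columns: Sequence[str]) -> List[Record]:
    """Keep only the listed columns and drop everything else."""
    keep = set(columns)
    return [{k: v for k, v in r.items() if k in keep} for r in records]


def split(records: List[Record], test_fraction: float, seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle deterministically, then return (train, test)."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Illustrative records; the column names are made up.
data = [{"question": f"q{i}", "answer": f"a{i}", "id": i} for i in range(10)]
data = rename(data, {"question": "input", "answer": "output"})
data = retain(data, ["input", "output"])
train, test = split(data, test_fraction=0.2)
```

Treating rename and retain as separate steps mirrors the change described above, where they became handlers in their own right instead of data config parameters.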
Dependency Updates
- peft from <0.14 to <0.15.2
- flash-attn from <3.0 to <2.8
- accelerate from <1.1 to <1.7
- transformers from <4.51 to <=4.54.4
- torch from <2.5 to <2.7
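For a quick sanity check of an environment against the new upper bounds listed above, a rough sketch follows. The `UPPER_BOUNDS` table copies the bounds from this section; the helper names are hypothetical, and the parser only handles plain numeric versions (a local suffix such as `+cu124` is dropped). Real tooling should rely on a full PEP 440 implementation such as the `packaging` library.

```python
from typing import Dict, Tuple

# New upper bounds as listed in these release notes: package -> (operator, bound).
UPPER_BOUNDS: Dict[str, Tuple[str, str]] = {
    "peft": ("<", "0.15.2"),
    "flash-attn": ("<", "2.8"),
    "accelerate": ("<", "1.7"),
    "transformers": ("<=", "4.54.4"),
    "torch": ("<", "2.7"),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version, ignoring any local '+...' suffix."""
    return tuple(int(part) for part in version.split("+")[0].split("."))


def within_upper_bound(package: str, installed: str) -> bool:
    """Return True if the installed version satisfies the listed upper bound."""
    op, bound = UPPER_BOUNDS[package]
    a, b = parse_version(installed), parse_version(bound)
    # Pad with zeros so that "2.8" compares the same as "2.8.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b if op == "<" else a <= b
```

For example, `within_upper_bound("transformers", "4.54.4")` is accepted because that bound is inclusive, while `within_upper_bound("torch", "2.7.0")` is rejected because that bound is exclusive.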
Additional Changes
- Updates to tracker framework, addition of ClearML tracker
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
- feat: TC Event to handle final checkpoint by @seshapad in #558
- fix: Restructure and rewrite sampling logic to be compatible with split. by @dushyantbehl in #587
- chore(release): merge set of changes for v3.0.0 by @willmj in #588
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2...v3.0.0