Releases: foundation-model-stack/fms-hf-tuning
v3.1.1-rc1
What's Changed
- fix: add setup scm version config to pyproject for correct version tag while building wheel. by @dushyantbehl in #634
- fix: pypi classifier tag by @YashasviChaurasia in #636
- chore: update peft version to v0.18.0 by @YashasviChaurasia in #638
- fix: typo in log message about validation dataset by @HarikrishnanBalagopal in #640
- feat: Adds embed_tokens, lm_head as trainable for vocab expansion in peft and enables tying of adapters by @romitjain in #625
- fix: add global step number in hf moe ckpt by @YashasviChaurasia in #643
- fix: ODM dispatch by @romitjain in #647
- fix: nvcr dockerfile build failing due to flash attn by @dushyantbehl in #646
- feat: ODM without categories integration with fms-accel by @romitjain in #641
- fix: Updated ODM defaults by @romitjain in #648
Full Changelog: v3.1.0...v3.1.1-rc1
v3.1.0
Image: quay.io/modh/fms-hf-tuning:v3.1.0
Summary
- Support GPT OSS class of models.
- Support Granite 4 series of models.
- Support Mamba and Hybrid Architecture Models.
- Support Flash Attention 3 via the HuggingFace kernel hub.
- Support loading MxFP4-quantized models. Models in MxFP4 need to be dequantized before training, as MxFP4 training is not supported in HuggingFace.
- Fixed a major performance bug which caused memory usage to double in some cases (#592).
- Support aLoRA directly from upstream PEFT.
Data Preprocessor changes
- Supports passing the chat template as a `.jinja` file inside the data config (see the hedged sketch below).
- Improved documentation and various bug fixes.
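As a rough illustration, a data config could reference the template file like this. This is a minimal sketch only: the `dataprocessor.chat_template` key and the surrounding layout are assumptions based on the existing data config format and should be verified against docs/advanced-data-preprocessing.md.

```bash
# Hedged sketch of a data config that points the chat template at a .jinja file.
# Key names (dataprocessor, chat_template, datasets, data_paths) are assumptions
# based on the existing data config layout; verify against the data preprocessing docs.
cat > data_config.yaml <<'EOF'
dataprocessor:
  type: default
  chat_template: templates/granite_chat.jinja   # path to a .jinja file instead of an inline string
datasets:
  - name: chat_dataset
    data_paths:
      - /data/train.jsonl
EOF
```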
Additional Changes
- Supports an experimental Online Data Mixing (ODM) plugin, as presented at PyTorch Conference '25 by the IBM Research team: @kmehant @romitjain @seshapad
List of Changes
- fix: Update offline-data-preprocessing.md by @dushyantbehl in #589
- fix: use default dataprocessor if one is not provided by @dushyantbehl in #590
- fix: data processing job exiting with failure even though successful by @dushyantbehl in #593
- fix: Disable caching in transformers via the `use_cache` flag to avoid unnecessary memory overhead by @romitjain in #592
- fix: Rank is set to zero by default by @seshapad in #594
- chore: upgrade trl by @dushyantbehl in #601
- chore: add changes to support Granite 4 models by @YashasviChaurasia in #599
- feat: Support gpt-oss class of models with flash attention 3 support by @dushyantbehl in #603
- feat: Restructure README by @dushyantbehl in #598
- fix: add default optim arg in training arg by @YashasviChaurasia in #607
- fix: fix multiple typos by @dushyantbehl in #610
- fix: subclass Lora config from upstream peft.LoraConfig by @romitjain in #609
- fix: update fms-accel by @YashasviChaurasia in #608
- docs: Update advanced-data-preprocessing.md by @dushyantbehl in #613
- feat: add ckpt conversion script fp32-bf16 by @YashasviChaurasia in #614
- feat: Allow chat template to be specified via a path in data config. by @YashasviChaurasia in #615
- feat: add online data mixing plugin by @kmehant in #612
- feat: Adopt resumption feature of online data mixing by @kmehant in #617
- feat: alora migration to peft by @YashasviChaurasia in #618
- feat: ensure fms-acceleration callbacks run before TrainerController by @YashasviChaurasia in #620
- fix: add on_init_end before adding tc callback by @YashasviChaurasia in #621
- fix: add hf compatible path update by @YashasviChaurasia in #622
- fix: avoid updating path kwarg by @YashasviChaurasia in #624
- fix: directly save final ckpt in save_model_dir by @YashasviChaurasia in #626
- fix: Add free up disk space to gh runners by @dushyantbehl in #628
- fix: image build failure due to flash attn by @dushyantbehl in #629
- chore: Merge set of changes for release v3.1.0 by @dushyantbehl in #630
- chore: tag peft to a rc instead of commit tag by @dushyantbehl in #631
- fix: release pypi build failure due to peft tagging to a commit by @dushyantbehl in #632
- fix: package version config bug by @dushyantbehl in #635
- chore: merge pypi fixes for the release by @dushyantbehl in #637
New Contributors
- @romitjain made their first contribution in #592
Full Changelog: v3.0.0...v3.1.0
v3.0.0
Image: quay.io/modh/fms-hf-tuning:v3.0.0
Summary of Changes
Activated LoRA Support
- Support for Activated LoRA model tuning
- Usage is very similar to standard LoRA, with the key difference that an `invocation_string` must be specified (see the hedged sketch after this list)
- Available by setting `--peft_method` to `alora`
- Inference with aLoRA models requires ensuring that the invocation string is present in the input
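For orientation, a hedged command-line sketch of an aLoRA run. Only `--peft_method alora` comes from these notes; the `sft_trainer` entry point, the `--invocation_string` flag name, and all paths and values are assumptions, so check the README for the exact ALoRA arguments.

```bash
# Hedged sketch of an Activated LoRA (aLoRA) tuning run.
# Only --peft_method alora is taken from these notes; the entry point, the
# --invocation_string flag name, and all paths/values are assumptions.
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.1-8b-instruct \
  --training_data_path /data/train.jsonl \
  --output_dir /tmp/alora_out \
  --peft_method alora \
  --invocation_string "<|start_of_role|>assistant<|end_of_role|>"
```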
Data Preprocessor Changes
- Breaking changes to the data preprocessor interface, now utilizing conventional handler and parameter names from HF datasets in data configs
- `rename` and `retain` are now their own data handlers, not data config parameters
- Add flexible train/test dataset splitting by using the `split` parameter in data configs
- Merge the offline data preprocessor script into the main library; data can now be preprocessed on its own using `--do_dataprocessing_only` (see the hedged sketch after this list)
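A hedged sketch of a preprocessing-only invocation. `--do_dataprocessing_only` and `--data_config_path` appear in these notes; the entry point and the remaining arguments are assumptions.

```bash
# Hedged sketch: preprocess data without training.
# --do_dataprocessing_only and --data_config_path are from these notes;
# everything else (entry point, paths, the need for a tokenizer/model) is assumed.
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.1-8b-instruct \
  --data_config_path data_config.yaml \
  --output_dir /tmp/processed_data \
  --do_dataprocessing_only
```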
Dependency Updates
- `peft` from <0.14 to <0.15.2
- `flash-attn` from <3.0 to <2.8
- `accelerate` from <1.1 to <1.7
- `transformers` from <4.51 to <=4.54.4
- `torch` from <2.5 to <2.7
Additional Changes
- Updates to the tracker framework, addition of a ClearML tracker
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
- feat: TC Event to handle final checkpoint by @seshapad in #558
- fix: Restructure and rewrite sampling logic to be compatible with split by @dushyantbehl in #587
- chore(release): merge set of changes for v3.0.0 by @willmj in #588
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2...v3.0.0
v3.0.0-rc.2
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
- feat: TC Event to handle final checkpoint by @seshapad in #558
- fix: Restructure and rewrite sampling logic to be compatible with split by @dushyantbehl in #587
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2...v3.0.0-rc.2
v3.0.0-rc.1
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2-rc.1...v3.0.0-rc.1
v2.8.2
Image: quay.io/modh/fms-hf-tuning:v2.8.2
Summary of Changes
Vision Model Tuning Support
- Added support for full and LoRA tuning of vision-language models (granite vision, llama vision, llava) using a chat-style image+text dataset format, with image and text field customization and model-specific configurations.
- For full usage details, see README.md.
- For vision model tuning, the `--dataset_image_field` flag has been added to select the column which contains images.
- For vision model tuning, set `"gradient_checkpointing_kwargs": {"use_reentrant": false}` as well as `"accelerate_launch_args": {"fsdp_transformer_layer_cls_to_wrap": "<DecoderLayer>"}` based on the model's architecture (see the hedged sketch after this list).
ScatterMoE Updates
- With the latest release of fms-acceleration, ScatterMoE for LoRA has been enabled for attention layers.
- ScatterMoE has been added to tuning image by default, and no longer requires an additional install.
- New interface for the `--fast_moe` config now accepts either an int or a bool (see the hedged sketch after this list).
  - If a bool is passed, expert shards are set to one and MoE kernels are toggled.
  - If an int is passed, MoE kernels are turned on and expert shards are set to the value passed.
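Two hedged invocation sketches showing the difference. Only `--fast_moe` itself is taken from these notes; the entry point, the other flags, and the exact bool literal accepted are assumptions.

```bash
# Hedged sketches; only --fast_moe is from these notes, everything else is assumed.

# Bool form: toggles MoE kernels, expert shards default to one.
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.0-3b-a800m-instruct \
  --training_data_path /data/train.jsonl \
  --output_dir /tmp/moe_out \
  --fast_moe true

# Int form: MoE kernels on, expert shards set to the value passed (here 4).
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.0-3b-a800m-instruct \
  --training_data_path /data/train.jsonl \
  --output_dir /tmp/moe_out \
  --fast_moe 4
```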
Data PreProcessor
- Templates and strings passed through the CLI are now un-escaped correctly.
- Support for selecting a specific field from the dataset that contains multi-turn dialogue data by specifying `--conversation_column`.
- Add an OpenInstruct-style data handler for the chat template use case, with masking outside of the data collator: `tokenize_and_apply_chat_template_with_masking`.
- Allow specifying the chat template as base64 to avoid escaping and templating issues (see the hedged sketch after this list).
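A hedged sketch combining the multi-turn and base64 options. `--conversation_column` and the base64 encoding step come from these notes; the entry point, the flag that receives the encoded template, and all paths are assumptions.

```bash
# Hedged sketch; --conversation_column is from these notes, while the flag that
# receives the base64-encoded template (--chat_template below) is an assumption.
B64_TEMPLATE=$(base64 -w0 my_chat_template.jinja)   # GNU coreutils base64; avoids shell/JSON escaping issues

python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.1-8b-instruct \
  --training_data_path /data/multiturn.jsonl \
  --output_dir /tmp/chat_out \
  --conversation_column messages \
  --chat_template "$B64_TEMPLATE"
```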
Dependency Updates
trlfrom <0.15 to <0.18pillow<0.12 addedtransformerslocked at <4.51
Additional Changes
- Experimental support for sum loss trainer.
What's Changed
- fix: Use main process first instead of local_main_process_first by @dushyantbehl in #506
- feat: Enable Packing for pretokenised dataset by @dushyantbehl in #468
- fix: save model dir hf moe checkpoint by @willmj in #503
- build(deps): install mamba_ssm from package instead of github by @anhuong in #507
- docs: Offline Data Preprocessing documentation by @Abhishek-TAMU in #502
- docs: add flash debug steps to docs by @kmehant in #510
- feat: new interface for fast_moe (non breaking) by @kmehant in #514
- fix: additional special tokens being replaced by @dushyantbehl in #517
- fix: downgrade transformers due to breaking change in loading from checkpoint. by @dushyantbehl in #518
- fix: (data pre-process) un-escape templates and strings passed on cli correctly by @ChanderG in #493
- refactor: refactor set special tokens function and add unit tests. by @Luka-D in #475
- feat: Add OI style data handler for chat template usecase by @dushyantbehl in #519
- fix: set use_cache=false while model loading to prevent graph break by @SilverSoldier in #516
- docs: update the `--data_config` flag to `--data_config_path` by @HarikrishnanBalagopal in #522
- feat: expose conversation_column in data_args by @YashasviChaurasia in #521
- feat: [experimental] enable sum loss trainer by @dushyantbehl in #520
- fix: add init file and header by @dushyantbehl in #525
- fix: multipack and streaming are incompatible by @willmj in #526
- build(image): add fast_moe into fms-hf-tuning image by @willmj in #512
- fix: change the default model_name_or_path from "facebook/opt-125m" to None by @HarikrishnanBalagopal in #528
- build: Upgrade TRL version from 0.14 to 0.16 by @Abhishek-TAMU in #527
- fix: Update CODEOWNERS by @Ssukriti in #524
- chore: Update CONTRIBUTING.md by @Ssukriti in #501
- feat: support loading vision model by @anhuong in #451
- feat: Enable LoRA saving only for non MoE linear layers training with kernels. by @willmj in #530
- fix: Add tiny granite vision model and update README for vision model support by @Abhishek-TAMU in #533
- ci: use --no-build-isolation flag for mamba by @aluu317 in #538
- fix: yaml parsing converts floats like 1.0 into integer 1 causing assertion failure by @HarikrishnanBalagopal in #536
- build(deps): upgrade trl by @willmj in #540
- chore(release): merge set of changes for v2.8.0 by @willmj in #541
- feat: allow specifying the chat template as base64 to avoid weird escaping and templating issues by @HarikrishnanBalagopal in #534
- ci: Install dnf-plugins-core in cuda-base stage by @aluu317 in #542
- chore(release): merge set of changes for v2.8.1 by @willmj in #543
- fix: incorrect check on fast moe activation by @kmehant in #544
- docs: add addt vision model support by @anhuong in #546
- fix: fastmoe lora `all-linear` excluding `router` by @willmj in #545
- chore(release): merge set of changes for v2.8.2 by @willmj in #547
New Contributors
- @ChanderG made their first contribution in #493
- @SilverSoldier made their first contribution in #516
Full Changelog: v2.7.1...v2.8.2
v2.8.2-rc.1
Full Changelog: v2.8.1...v2.8.2-rc.1
v2.8.1
We recommend using v2.8.2, which includes a bug fix for LoRA tuning. To view the set of changes, see v2.8.2.
Full Changelog: v2.7.1...v2.8.1
v2.8.1-rc.1
What's Changed
- feat: allow specifying the chat template as base64 to avoid weird escaping and templating issues by @HarikrishnanBalagopal in #534
- ci: Install dnf-plugins-core in cuda-base stage by @aluu317 in #542
Full Changelog: v2.8.0...v2.8.1-rc.1