Releases: foundation-model-stack/fms-hf-tuning
v3.1.1-rc1
What's Changed
- fix: add setup scm version config to pyproject for correct version tag while building wheel. by @dushyantbehl in #634
- fix: pypi classifier tag by @YashasviChaurasia in #636
- chore: update peft version to v0.18.0 by @YashasviChaurasia in #638
- fix: typo in log message about validation dataset by @HarikrishnanBalagopal in #640
- feat: Adds embed_tokens, lm_head as trainable for vocab expansion in peft and enables tying of adapters by @romitjain in #625
- fix: add global step number in hf moe ckpt by @YashasviChaurasia in #643
- fix: ODM dispatch by @romitjain in #647
- fix: nvcr dockerfile build failing due to flash attn by @dushyantbehl in #646
- feat: ODM without categories integration with fms-accel by @romitjain in #641
- fix: Updated ODM defaults by @romitjain in #648
Full Changelog: v3.1.0...v3.1.1-rc1
v3.1.0
Image: quay.io/modh/fms-hf-tuning:v3.1.0
Summary
- Support GPT OSS class of models.
- Support Granite 4 series of models.
- Support Mamba and Hybrid Architecture Models.
- Support Flash Attention 3 via the HuggingFace kernel hub.
- Support loading MxFP4-quantized models. Models in MxFP4 need to be dequantized before training, as MxFP4 training is not supported in HuggingFace.
- Fixed a major performance bug which caused memory usage to double in some cases (#592).
- Support aLoRA directly from upstream PEFT.
Data Preprocessor changes
- Supports passing the chat template as a `.jinja` file inside the data config (see the hedged sketch below).
- Improved documentation and various bug fixes.
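As a rough illustration, a data config could reference the template file like this. This is a minimal sketch only: the `dataprocessor.chat_template` key and the surrounding layout are assumptions based on the existing data config format and should be verified against docs/advanced-data-preprocessing.md.

```bash
# Hedged sketch of a data config that points the chat template at a .jinja file.
# Key names (dataprocessor, chat_template, datasets, data_paths) are assumptions
# based on the existing data config layout; verify against the data preprocessing docs.
cat > data_config.yaml <<'EOF'
dataprocessor:
  type: default
  chat_template: templates/granite_chat.jinja   # path to a .jinja file instead of an inline string
datasets:
  - name: chat_dataset
    data_paths:
      - /data/train.jsonl
EOF
```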
Additional Changes
- Supports an experimental Online Data Mixing (ODM) plugin, as presented at PyTorch Conference '25 by the IBM Research team: @kmehant @romitjain @seshapad
List of Changes
- fix: Update offline-data-preprocessing.md by @dushyantbehl in #589
- fix: use default dataprocessor if one is not provided by @dushyantbehl in #590
- fix: data processing job exiting with failure even though successful by @dushyantbehl in #593
- fix: Disable caching in transformers via the `use_cache` flag to avoid unnecessary memory overhead by @romitjain in #592
- fix: Rank is set to zero by default by @seshapad in #594
- chore: upgrade trl by @dushyantbehl in #601
- chore: add changes to support Granite 4 models by @YashasviChaurasia in #599
- feat: Support gpt-oss class of models with flash attention 3 support by @dushyantbehl in #603
- feat: Restructure README by @dushyantbehl in #598
- fix: add default optim arg in training arg by @YashasviChaurasia in #607
- fix: fix multiple typos by @dushyantbehl in #610
- fix: subclass Lora config from upstream peft.LoraConfig by @romitjain in #609
- fix: update fms-accel by @YashasviChaurasia in #608
- docs: Update advanced-data-preprocessing.md by @dushyantbehl in #613
- feat: add ckpt conversion script fp32-bf16 by @YashasviChaurasia in #614
- feat: Allow chat template to be specified via a path in data config. by @YashasviChaurasia in #615
- feat: add online data mixing plugin by @kmehant in #612
- feat: Adopt resumption feature of online data mixing by @kmehant in #617
- feat: alora migration to peft by @YashasviChaurasia in #618
- feat: ensure fms-acceleration callbacks run before TrainerController by @YashasviChaurasia in #620
- fix: add on_init_end before adding tc callback by @YashasviChaurasia in #621
- fix: add hf compatible path update by @YashasviChaurasia in #622
- fix: avoid updating path kwarg by @YashasviChaurasia in #624
- fix: directly save final ckpt in save_model_dir by @YashasviChaurasia in #626
- fix: Add free up disk space to gh runners by @dushyantbehl in #628
- fix: image build failure due to flash attn by @dushyantbehl in #629
- chore: Merge set of changes for release v3.1.0 by @dushyantbehl in #630
- chore: tag peft to a rc instead of commit tag by @dushyantbehl in #631
- fix: release pypi build failure due to peft tagging to a commit by @dushyantbehl in #632
- fix: package version config bug by @dushyantbehl in #635
- chore: merge pypi fixes for the release by @dushyantbehl in #637
New Contributors
- @romitjain made their first contribution in #592
Full Changelog: v3.0.0...v3.1.0
v3.0.0
Image: quay.io/modh/fms-hf-tuning:v3.0.0
Summary of Changes
Activated LoRA Support
- Support for Activated LoRA model tuning
- Usage is very similar to standard LoRA, with the key difference that an `invocation_string` must be specified (see the hedged sketch after this list)
- Available by setting `--peft_method` to `alora`
- Inference with aLoRA models requires ensuring that the invocation string is present in the input
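For orientation, a hedged command-line sketch of an aLoRA run. Only `--peft_method alora` comes from these notes; the `sft_trainer` entry point, the `--invocation_string` flag name, and all paths and values are assumptions, so check the README for the exact ALoRA arguments.

```bash
# Hedged sketch of an Activated LoRA (aLoRA) tuning run.
# Only --peft_method alora is taken from these notes; the entry point, the
# --invocation_string flag name, and all paths/values are assumptions.
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.1-8b-instruct \
  --training_data_path /data/train.jsonl \
  --output_dir /tmp/alora_out \
  --peft_method alora \
  --invocation_string "<|start_of_role|>assistant<|end_of_role|>"
```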
Data Preprocessor Changes
- Breaking changes to the data preprocessor interface, now utilizing conventional handler and parameter names from HF datasets in data configs
- `rename` and `retain` are now their own data handlers, not data config parameters
- Add flexible train/test dataset splitting by using the `split` parameter in data configs
- Merge the offline data preprocessor script into the main library; data can now be preprocessed on its own using `--do_dataprocessing_only` (see the hedged sketch after this list)
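A hedged sketch of a preprocessing-only invocation. `--do_dataprocessing_only` and `--data_config_path` appear in these notes; the entry point and the remaining arguments are assumptions.

```bash
# Hedged sketch: preprocess data without training.
# --do_dataprocessing_only and --data_config_path are from these notes;
# everything else (entry point, paths, the need for a tokenizer/model) is assumed.
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.1-8b-instruct \
  --data_config_path data_config.yaml \
  --output_dir /tmp/processed_data \
  --do_dataprocessing_only
```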
Dependency Updates
- `peft` from <0.14 to <0.15.2
- `flash-attn` from <3.0 to <2.8
- `accelerate` from <1.1 to <1.7
- `transformers` from <4.51 to <=4.54.4
- `torch` from <2.5 to <2.7
Additional Changes
- Updates to the tracker framework, addition of a ClearML tracker
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
- feat: TC Event to handle final checkpoint by @seshapad in #558
- fix: Restructure and rewrite sampling logic to be compatible with split by @dushyantbehl in #587
- chore(release): merge set of changes for v3.0.0 by @willmj in #588
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2...v3.0.0
v3.0.0-rc.2
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
- feat: TC Event to handle final checkpoint by @seshapad in #558
- fix: Restructure and rewrite sampling logic to be compatible with split by @dushyantbehl in #587
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2...v3.0.0-rc.2
v3.0.0-rc.1
What's Changed
- docs: add instructions on how to correctly specify the chat template by @HarikrishnanBalagopal in #549
- feat: Data Handling v3 (Breaking change for data config interface) by @dushyantbehl in #494
- docs: Update model architecture in README by @aluu317 in #550
- fix: issues related to providing 2 datasets with diff types by @HarikrishnanBalagopal in #554
- docs: Added gradient checkpointing to docs by @Luka-D in #552
- feat: Add ALoRA support by @kgreenewald in #513
- feat: make activated LoRA an optional flag in the Dockerfile by @HarikrishnanBalagopal in #555
- fix: saving logic for alora by @kmehant in #559
- fix: decouple and update concatenate_dataset functionality from load_dataset by @YashasviChaurasia in #557
- chore: Upgrade transformers, torch, and accelerate version by @Akash-Nayak in #561
- build(deps): Update peft requirement from <=0.14,>=0.8.0 to >=0.8.0,<=0.15.2 by @dependabot[bot] in #556
- feat: add train_test_split functionality via dataconfig by @YashasviChaurasia in #560
- fix: docs and minor code by @dushyantbehl in #570
- fix: Update flash-attn version constraint to <2.8 for compatibility by @Akash-Nayak in #571
- feat: update tracking framework to make it more flexible. Add clearml tracker by @dushyantbehl in #568
- fix: Remove the additional closing curly bracket by @Akash-Nayak in #572
- fix: Add ENABLE_MLFLOW build argument to Dockerfile to control MLflow integration by @Akash-Nayak in #573
- fix: typo and enhanced warning message for jinja and chat template rendering by @dushyantbehl in #574
- fix: trackers should be used only on main process by @dushyantbehl in #578
- fix: Decouple offline data processing from collators by @dushyantbehl in #579
- feat: merge offline processing into the main library by @dushyantbehl in #580
- fix: change logging level to info and print flat arguments by @dushyantbehl in #582
- feat: add error handling for split dataset feat by @YashasviChaurasia in #581
New Contributors
- @kgreenewald made their first contribution in #513
- @Akash-Nayak made their first contribution in #561
Full Changelog: v2.8.2-rc.1...v3.0.0-rc.1
v2.8.2
Image: quay.io/modh/fms-hf-tuning:v2.8.2
Summary of Changes
Vision Model Tuning Support
- Added support for full and LoRA tuning of vision-language models (granite vision, llama vision, llava) using a chat-style image+text dataset format, with image and text field customization and model-specific configurations.
- For full usage details, see README.md.
- For vision model tuning, the `--dataset_image_field` flag has been added to select the column which contains images.
- For vision model tuning, set `"gradient_checkpointing_kwargs": {"use_reentrant": false}` as well as `"accelerate_launch_args": {"fsdp_transformer_layer_cls_to_wrap": "<DecoderLayer>"}` based on the model's architecture (see the hedged sketch after this list).
ScatterMoE Updates
- With the latest release of fms-acceleration, ScatterMoE for LoRA has been enabled for attention layers.
- ScatterMoE has been added to tuning image by default, and no longer requires an additional install.
- New interface for the `--fast_moe` config now accepts either an int or a bool (see the hedged sketch after this list).
  - If a bool is passed, expert shards are set to one and MoE kernels are toggled.
  - If an int is passed, MoE kernels are turned on and expert shards are set to the value passed.
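Two hedged invocation sketches showing the difference. Only `--fast_moe` itself is taken from these notes; the entry point, the other flags, and the exact bool literal accepted are assumptions.

```bash
# Hedged sketches; only --fast_moe is from these notes, everything else is assumed.

# Bool form: toggles MoE kernels, expert shards default to one.
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.0-3b-a800m-instruct \
  --training_data_path /data/train.jsonl \
  --output_dir /tmp/moe_out \
  --fast_moe true

# Int form: MoE kernels on, expert shards set to the value passed (here 4).
python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.0-3b-a800m-instruct \
  --training_data_path /data/train.jsonl \
  --output_dir /tmp/moe_out \
  --fast_moe 4
```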
Data PreProcessor
- Templates and strings passed through the CLI are now un-escaped correctly.
- Support for selecting a specific field from the dataset that contains multi-turn dialogue data by specifying `--conversation_column`.
- Add an OpenInstruct-style data handler for the chat template use case, with masking outside of the data collator: `tokenize_and_apply_chat_template_with_masking`.
- Allow specifying the chat template as base64 to avoid escaping and templating issues (see the hedged sketch after this list).
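A hedged sketch combining the multi-turn and base64 options. `--conversation_column` and the base64 encoding step come from these notes; the entry point, the flag that receives the encoded template, and all paths are assumptions.

```bash
# Hedged sketch; --conversation_column is from these notes, while the flag that
# receives the base64-encoded template (--chat_template below) is an assumption.
B64_TEMPLATE=$(base64 -w0 my_chat_template.jinja)   # GNU coreutils base64; avoids shell/JSON escaping issues

python -m tuning.sft_trainer \
  --model_name_or_path ibm-granite/granite-3.1-8b-instruct \
  --training_data_path /data/multiturn.jsonl \
  --output_dir /tmp/chat_out \
  --conversation_column messages \
  --chat_template "$B64_TEMPLATE"
```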
Dependency Updates
trlfrom <0.15 to <0.18pillow<0.12 addedtransformerslocked at <4.51
Additional Changes
- Experimental support for sum loss trainer.
What's Changed
- fix: Use main process first instead of local_main_process_first by @dushyantbehl in #506
- feat: Enable Packing for pretokenised dataset by @dushyantbehl in #468
- fix: save model dir hf moe checkpoint by @willmj in #503
- build(deps): install mamba_ssm from package instead of github by @anhuong in #507
- docs: Offline Data Preprocessing documentation by @Abhishek-TAMU in #502
- docs: add flash debug steps to docs by @kmehant in #510
- feat: new interface for fast_moe (non breaking) by @kmehant in #514
- fix: additional special tokens being replaced by @dushyantbehl in #517
- fix: downgrade transformers due to breaking change in loading from checkpoint. by @dushyantbehl in #518
- fix: (data pre-process) un-escape templates and strings passed on cli correctly by @ChanderG in #493
- refactor: refactor set special tokens function and add unit tests. by @Luka-D in #475
- feat: Add OI style data handler for chat template usecase by @dushyantbehl in #519
- fix: set use_cache=false while model loading to prevent graph break by @SilverSoldier in #516
- docs: update the `--data_config` flag to `--data_config_path` by @HarikrishnanBalagopal in #522
- feat: expose conversation_column in data_args by @YashasviChaurasia in #521
- feat: [experimental] enable sum loss trainer by @dushyantbehl in #520
- fix: add init file and header by @dushyantbehl in #525
- fix: multipack and streaming are incompatible by @willmj in #526
- build(image): add fast_moe into fms-hf-tuning image by @willmj in #512
- fix: change the default model_name_or_path from "facebook/opt-125m" to None by @HarikrishnanBalagopal in #528
- build: Upgrade TRL version from 0.14 to 0.16 by @Abhishek-TAMU in #527
- fix: Update CODEOWNERS by @Ssukriti in #524
- chore: Update CONTRIBUTING.md by @Ssukriti in #501
- feat: support loading vision model by @anhuong in #451
- feat: Enable LoRA saving only for non MoE linear layers training with kernels. by @willmj in #530
- fix: Add tiny granite vision model and update README for vision model support by @Abhishek-TAMU in #533
- ci: use --no-build-isolation flag for mamba by @aluu317 in #538
- fix: yaml parsing converts floats like 1.0 into integer 1 causing assertion failure by @HarikrishnanBalagopal in #536
- build(deps): upgrade trl by @willmj in #540
- chore(release): merge set of changes for v2.8.0 by @willmj in #541
- feat: allow specifying the chat template as base64 to avoid weird escaping and templating issues by @HarikrishnanBalagopal in #534
- ci: Install dnf-plugins-core in cuda-base stage by @aluu317 in #542
- chore(release): merge set of changes for v2.8.1 by @willmj in #543
- fix: incorrect check on fast moe activation by @kmehant in #544
- docs: add addt vision model support by @anhuong in #546
- fix: fastmoe lora `all-linear` excluding `router` by @willmj in #545
- chore(release): merge set of changes for v2.8.2 by @willmj in #547
New Contributors
- @ChanderG made their first contribution in #493
- @SilverSoldier made their first contribution in #516
Full Changelog: v2.7.1...v2.8.2
v2.8.2-rc.1
Full Changelog: v2.8.1...v2.8.2-rc.1
v2.8.1
We recommend using v2.8.2, which includes a bug fix for LoRA tuning. To view the set of changes, see v2.8.2.
Full Changelog: v2.7.1...v2.8.1
v2.8.1-rc.1
What's Changed
- feat: allow specifying the chat template as base64 to avoid weird escaping and templating issues by @HarikrishnanBalagopal in #534
- ci: Install dnf-plugins-core in cuda-base stage by @aluu317 in #542
Full Changelog: v2.8.0...v2.8.1-rc.1