v2.8.2
Image: quay.io/modh/fms-hf-tuning:v2.8.2
Summary of Changes
Vision Model Tuning Support
- Added support for full and LoRA tuning of vision-language models (granite vision, llama vision, llava) using a chat-style image+text dataset format, with image and text field customization and model-specific configurations.
- For full usage details, see README.md.
- For vision model tuning, the `--dataset_image_field` flag has been added to select the column which contains images.
- For vision model tuning, set `"--gradient_checkpointing_kwargs": {"use_reentrant": false}` as well as `"accelerate_launch_args": {"fsdp_transformer_layer_cls_to_wrap": "<DecoderLayer>"}` based on the model's architecture (see the sketch below).
ScatterMoE Updates
- With the latest release of fms-acceleration, ScatterMoE for LoRA has been enabled for attention layers.
- ScatterMoE has been added to the tuning image by default and no longer requires an additional install.
- New interface: the `--fast_moe` config now accepts either an int or a bool (see the sketch below).
  - If a bool is passed, MoE kernels are toggled and expert shards are set to one.
  - If an int is passed, MoE kernels are turned on and expert shards are set to the value passed.
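As a rough illustration, assuming the JSON config key mirrors the flag name, the two accepted forms look like the lines below: the first toggles the MoE kernels with expert shards set to one, while the second turns the kernels on with expert shards set to 4.

```json
{"fast_moe": true}
{"fast_moe": 4}
```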
Data PreProcessor
- Templates and strings passed through the CLI are now un-escaped correctly.
- Support for selecting a specific field from the dataset that contains multi-turn dialogue data by specifying `--conversation_column`.
- Added an OpenInstruct-style data handler for the chat template use case with masking outside of the data collator: `tokenize_and_apply_chat_template_with_masking`.
- Allow specifying the chat template as base64 to avoid escaping and templating issues (see the sketch below).
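A hedged sketch of how these data options might be set together in a JSON config; the dataset path and column name are placeholders, and `chat_template` is assumed to be the existing chat template option that now also accepts a base64-encoded value (produced with, e.g., `base64 -w0 template.jinja`):

```json
{
  "training_data_path": "<multi-turn-chat-dataset>.jsonl",
  "conversation_column": "messages",
  "chat_template": "<base64-encoded Jinja chat template>"
}
```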
Dependency Updates
- `trl` from <0.15 to <0.18
- `pillow` <0.12 added
- `transformers` locked at <4.51
Additional Changes
- Experimental support for sum loss trainer.
What's Changed
- fix: Use main process first instead of local_main_process_first by @dushyantbehl in #506
- feat: Enable Packing for pretokenised dataset by @dushyantbehl in #468
- fix: save model dir hf moe checkpoint by @willmj in #503
- build(deps): install mamba_ssm from package instead of github by @anhuong in #507
- docs: Offline Data Preprocessing documentation by @Abhishek-TAMU in #502
- docs: add flash debug steps to docs by @kmehant in #510
- feat: new interface for fast_moe (non breaking) by @kmehant in #514
- fix: additional special tokens being replaced by @dushyantbehl in #517
- fix: downgrade transformers due to breaking change in loading from checkpoint. by @dushyantbehl in #518
- fix: (data pre-process) un-escape templates and strings passed on cli correctly by @ChanderG in #493
- refactor: refactor set special tokens function and add unit tests. by @Luka-D in #475
- feat: Add OI style data handler for chat template usecase by @dushyantbehl in #519
- fix: set use_cache=false while model loading to prevent graph break by @SilverSoldier in #516
- docs: update the `--data_config` flag to `--data_config_path` by @HarikrishnanBalagopal in #522
- feat: expose conversation_column in data_args by @YashasviChaurasia in #521
- feat: [experimental] enable sum loss trainer by @dushyantbehl in #520
- fix: add init file and header by @dushyantbehl in #525
- fix: multipack and streaming are incompatible by @willmj in #526
- build(image): add fast_moe into fms-hf-tuning image by @willmj in #512
- fix: change the default model_name_or_path from "facebook/opt-125m" to None by @HarikrishnanBalagopal in #528
- build: Upgrade TRL version from 0.14 to 0.16 by @Abhishek-TAMU in #527
- fix: Update CODEOWNERS by @Ssukriti in #524
- chore: Update CONTRIBUTING.md by @Ssukriti in #501
- feat: support loading vision model by @anhuong in #451
- feat: Enable LoRA saving only for non MoE linear layers training with kernels. by @willmj in #530
- fix:Add tiny granite vision model and update ReadMe for vision model support. by @Abhishek-TAMU in #533
- ci: use --no-build-isolation flag for mamba by @aluu317 in #538
- fix: yaml parsing converts floats like 1.0 into integer 1 causing assertion failure by @HarikrishnanBalagopal in #536
- build(deps): upgrade trl by @willmj in #540
- chore(release): merge set of changes for v2.8.0 by @willmj in #541
- feat: allow specifying the chat template as base64 to avoid weird escaping and templating issues by @HarikrishnanBalagopal in #534
- ci: Install dnf-plugins-core in cuda-base stage by @aluu317 in #542
- chore(release): merge set of changes for v2.8.1 by @willmj in #543
- fix: incorrect check on fast moe activation by @kmehant in #544
- docs: add addt vision model support by @anhuong in #546
- fix: fastmoe lora `all-linear` excluding `router` by @willmj in #545
- chore(release): merge set of changes for v2.8.2 by @willmj in #547
New Contributors
- @ChanderG made their first contribution in #493
- @SilverSoldier made their first contribution in #516
Full Changelog: v2.7.1...v2.8.2