Releases: instructlab/training
v0.15.1 - Expanded Text-Data VLM / Multi-Modal Training Support
What's Changed
- Fix Gemma 3 SFT training by detecting dual-registered VLM configs by @RobotSail in #695
Full Changelog: v0.15.0...v0.15.1
v0.15.0 - Qwen3.5 VL Model Support
What's New
Features
- **Vision-Language Model (VLM) Support for Text-Only Training** (#693)
  - Added automatic detection and loading of vision-language models for text-only training
  - New `vlm_utils.py` module with utilities for identifying and extracting CausalLM text backbones from VLM wrappers
  - Support for two VLM loading strategies: extracting the text backbone when a CausalLM sub-model exists, or direct VLM loading when no CausalLM variant is available
  - Improved tokenizer/text-config reconciliation for VLMs where `vocab_size` lives under `text_config`
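The tokenizer/text-config reconciliation above can be sketched as follows. This is a minimal illustration, not the actual `vlm_utils.py` code; the function name `resolve_vocab_size` and the dict-based configs are assumptions for demonstration.

```python
def resolve_vocab_size(config: dict) -> int:
    """Return the text vocab size for either a plain CausalLM config or a
    VLM config whose text settings are nested under `text_config`.
    Illustrative sketch only; the real vlm_utils.py logic differs."""
    if "text_config" in config:
        # VLM wrapper: text settings (including vocab_size) live one level down
        return config["text_config"]["vocab_size"]
    # Plain CausalLM config: vocab_size is top-level
    return config["vocab_size"]
```

The same nesting check is what lets a text-only training run treat a VLM checkpoint and a plain CausalLM checkpoint uniformly.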
- **Mixed Attention Handling for VLMs** (#693)
  - Models with `timm` vision towers now use per-component attention: `eager` for vision, `flash_attention_2` or `sdpa` for text
  - Automatic SDPA fallback for M-RoPE models (e.g. Qwen3.5 VL), which are incompatible with Flash Attention 2
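The selection logic can be sketched as a pure function. The sub-config key names (`vision_config`, `text_config`) and the function name are assumptions for illustration; the release notes only state that vision uses `eager` while text uses `flash_attention_2` or `sdpa`, with SDPA forced for M-RoPE models.

```python
def pick_attn_implementations(has_timm_vision_tower: bool,
                              uses_mrope: bool,
                              flash_attn_available: bool) -> dict:
    """Illustrative sketch of per-component attention selection.
    M-RoPE models (e.g. Qwen3.5 VL) are incompatible with Flash Attention 2,
    so their text stack falls back to SDPA."""
    text_attn = "sdpa"
    if flash_attn_available and not uses_mrope:
        text_attn = "flash_attention_2"
    if has_timm_vision_tower:
        # timm vision towers only support eager attention
        return {"vision_config": "eager", "text_config": text_attn}
    return {"text_config": text_attn}
```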
Bug Fixes
- **FSDP Wrap Policy Robustness** (#693)
  - Fixed `_no_split_modules` resolution to handle models that declare module names for architectures that are not loaded (e.g. vision blocks when loading only the CausalLM)
  - The FSDP wrap policy now resolves all declared module names against both the wrapper and the underlying HF model, filtering out unresolvable entries
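The filtering step can be illustrated with a small sketch. The class names and the helper `resolve_wrap_classes` are hypothetical; the real code resolves names against live `nn.Module` instances rather than a flat class list.

```python
class LlamaDecoderLayer:
    """Stand-in for a text decoder block class."""

class SiglipVisionBlock:
    """Stand-in for a vision block class that was not loaded."""

def resolve_wrap_classes(declared_names, loaded_classes):
    """Map declared `_no_split_modules` names to the module classes actually
    present in the loaded model(s), silently dropping unresolvable names
    (e.g. vision blocks when only the CausalLM backbone was loaded).
    Illustrative sketch of the filtering behavior described in #693."""
    by_name = {cls.__name__: cls for cls in loaded_classes}
    return {by_name[name] for name in declared_names if name in by_name}

# A VLM checkpoint may declare both block types, but a text-only load
# only has the decoder layers available to wrap:
wrap_classes = resolve_wrap_classes(
    ["LlamaDecoderLayer", "SiglipVisionBlock"], [LlamaDecoderLayer]
)
```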
- **GPT-OSS Attention Capability Detection** (#693)
  - `vllm-flash-attn3` is now gated behind a Hopper (SM 9.0+) GPU capability check, falling back to `eager` on older hardware
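A minimal sketch of the capability gate, assuming the capability tuple comes from `torch.cuda.get_device_capability()`; the function names here are illustrative, not the library's.

```python
def supports_flash_attn3(capability: tuple) -> bool:
    """vllm-flash-attn3 requires Hopper (SM 9.0) or newer. `capability` is a
    (major, minor) tuple, e.g. from torch.cuda.get_device_capability()."""
    return capability >= (9, 0)

def pick_gpt_oss_attention(capability: tuple) -> str:
    """Fall back to eager attention on pre-Hopper GPUs (illustrative sketch)."""
    return "vllm-flash-attn3" if supports_flash_attn3(capability) else "eager"
```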
Improvements
- **Local Mamba Kernel Preference** (#693)
  - GraniteMoeHybrid models now pre-populate the Hub kernel cache with locally installed `mamba_ssm` and `causal_conv1d` builds to avoid PyTorch/CUDA ABI mismatches with Hub-provided kernels
What's Changed
- add support for qwen3.5 vl model by @RobotSail in #693
Full Changelog: v0.14.2...v0.15.0
v0.14.2 - Validation Loss and Transformers v4 Backwards Compatibility
What's Changed
- Add backwards compatibility for transformers v4.57 by @Maxusmusti in #684
- Adds validation loss + exposes it in the API by @RobotSail in #685
Full Changelog: v0.14.1...v0.14.2
v0.14.1 - Correct FSDP Config Behavior for Transformers v5
What's Changed
- fix _no_split_modules subscript error for transformers v5 by @Maxusmusti in #683
Full Changelog: v0.14.0...v0.14.1
v0.14.0 - MLflow Support & Transformers v5 Compatibility
What's New
Features
- **MLflow Logging Backend** (#680)
  - Added `MLflowHandler` class for logging training metrics to MLflow
  - New `TrainingArgs` fields: `mlflow_tracking_uri`, `mlflow_experiment_name`, `mlflow_run_name`
  - Added `wandb_project`, `wandb_entity`, `wandb_run_name` fields for W&B configuration
  - Added `tensorboard_log_dir` field for a configurable TensorBoard log directory
  - New optional install targets: `requirements-mlflow.txt`, `requirements-wandb.txt`, `requirements-tensorboard.txt`
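Configuring the MLflow backend might look like the fragment below. The field names come from the release notes; the example values and the elided required fields (model path, data path, checkpoint directory, etc.) are placeholders, not defaults.

```python
# Hedged sketch: only the new logging fields are shown; the other required
# TrainingArgs fields (model/data/checkpoint settings) are elided.
from instructlab.training import TrainingArgs

train_args = TrainingArgs(
    # ... model/data/checkpoint settings ...
    mlflow_tracking_uri="http://localhost:5000",   # example value
    mlflow_experiment_name="sft-experiments",      # example value
    mlflow_run_name="granite-sft-run-1",           # example value
)
```

The optional backend dependencies install separately, e.g. via `pip install -r requirements-mlflow.txt`.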
- **Transformers v5 Compatibility** (#681)
  - Updated tokenizer API calls to use `extra_special_tokens` instead of `additional_special_tokens`
  - Suppressed verbose httpx HTTP request logs from huggingface_hub
Bug Fixes
- **HYBRID_SHARD Failure Fix** (#682)
  - Added detection for when `world_size < num_devices_per_node` in the FSDP configuration
  - Automatically falls back to `FULL_SHARD` with a warning when `HYBRID_SHARD` would fail
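The fallback condition is simple enough to sketch. The function name is hypothetical; the real code sets FSDP's sharding strategy rather than returning a string.

```python
def choose_sharding_strategy(world_size: int, num_devices_per_node: int) -> str:
    """HYBRID_SHARD shards within a node and replicates across nodes, so it
    breaks when the job spans fewer ranks than one full node. Illustrative
    sketch of the fallback added in #682."""
    if world_size < num_devices_per_node:
        # The real code also emits a warning about the downgrade
        return "FULL_SHARD"
    return "HYBRID_SHARD"
```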
Development
- **Tox-UV Integration** (#676)
  - Added `tox-uv` as a tox requirement with `uv-venv-runner`
  - Updated GitHub workflows to use `uv` for package installation
  - Replaced `pip install` with `uv pip install` in CI workflows
What's Changed
- adds integration for tox-uv and updates workflows to use tox-uv by @RobotSail in #676
- Add transformers v5 compatibility by @Maxusmusti in #681
- Fix HYBRID_SHARD failure when world_size < available GPUs by @rtj1 in #682
- Add MLflow support and expose logging configuration in TrainingArgs by @RobotSail in #680
Files Changed
18 files changed with 482 insertions and 83 deletions:
- Core training modules: `logger.py`, `config.py`, `accelerator.py`, `data_process.py`, `tokenizer_utils.py`, `main_ds.py`
- New requirements files for optional logging backends
- Updated CI workflows and tox configuration
Full Changelog: v0.13.0...v0.14.0
v0.13.0 - Pretraining Support & Optimizer Configuration
What's New
Features
- **Pretraining Data Processing API** (#672)
  - Added a new API for processing pretraining-style datasets
  - Documents are now chunked by a configurable `block_size`
  - Chunks are treated as independent, fully-unmasked samples
  - Updated the training loop to ingest pretraining-style datasets
  - Includes comprehensive test coverage (`test_pretraining_data_process.py`, `test_pretraining_mode.py`, `test_pretraining_sampler.py`)
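The chunking behavior can be sketched in a few lines. This is an illustration of the described semantics (fixed-size blocks, every token unmasked), not the library's actual implementation; the function name and sample schema are assumptions.

```python
def chunk_document(token_ids: list, block_size: int) -> list:
    """Split a tokenized pretraining document into fixed-size blocks.
    Each block becomes an independent sample with all tokens unmasked,
    i.e. labels == input_ids. Illustrative sketch only."""
    samples = []
    for start in range(0, len(token_ids), block_size):
        chunk = token_ids[start:start + block_size]
        # Fully-unmasked: every token contributes to the loss
        samples.append({"input_ids": chunk, "labels": list(chunk)})
    return samples
```

Note the final chunk may be shorter than `block_size`; how the real implementation handles the remainder (keep, pad, or drop) is not specified in the release notes.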
- **AdamW Optimizer Configuration** (#674)
  - Exposed `weight_decay`, `betas`, and `eps` parameters in `TrainingArgs`
  - Users can now tune AdamW hyperparameters through the `run_training()` API, providing more control over optimizer behavior
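Tuning the optimizer might look like the fragment below. The three parameter names come from the release notes; the example values are placeholders, not defaults, and the other required `TrainingArgs` fields are elided.

```python
# Hedged sketch: only the newly exposed AdamW fields are shown; the other
# required TrainingArgs fields (model/data/checkpoint settings) are elided.
from instructlab.training import TrainingArgs

train_args = TrainingArgs(
    # ... model/data/checkpoint settings ...
    weight_decay=0.01,    # example value
    betas=(0.9, 0.95),    # example value
    eps=1e-8,             # example value
)
```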
- **Granite 4 Model Support** (#669)
  - Added support for Granite 4 models as Mixture of Experts (MoE) models in training
Bug Fixes
- **Process Timing Fix** (#675)
  - Fixed a race condition where a subprocess had not finished by the time its output was read
- **Variable Access Fix** (#668)
  - Fixed an invalid variable access bug
Dependencies
- **Build Dependency Update** (#670)
  - Updated the hynek build dependency
Files Changed
17 files changed with 1,642 insertions and 52 deletions:
- Core training modules: `data_process.py`, `main_ds.py`, `sampler.py`, `model.py`, `config.py`
- New test suites for pretraining functionality
- Updated README with new capabilities
Full Changelog
All Changes:
- 574f946 Exposes API for processing pretraining data (#672)
- 638a753 fixes bug where process isn't completed by the time the process gets read (#675)
- c495035 Expose AdamW optimizer parameters in training API (#674)
- 3d05302 Handle granite 4 as MoE models in training (#669)
- 781c36f fixes stray invalid variable access bug (#668)
- 529c2f7 bumps hynek build dep (#670)
Full Diff: v0.12.1...v0.13.0
v0.12.1 - Granite 4 Support, Extended Env Var and Torchrun Arg Support
What's Changed
- Update requirements-cuda.txt to increase liger-kernel minimum by @Maxusmusti in #659
- Adds mamba-ssm[causal-conv1d] to CUDA requirements by @RobotSail in #663
- Removes Numpy version cap by @RobotSail in #664
- fix(torchrun): Omit empty arguments and correct nproc_per_node type by @szaher in #661
Full Changelog: v0.12.0...v0.12.1
v0.12.0 - GPT-OSS Support
Full fine-tuning now supports gpt-oss models, alongside minor bugfixes to ensure correct loss calculations with higher gradient accumulation.
What's Changed
- Disable workflow runs on forks by default by @fynnsu in #632
- Adding GPT OSS Support by @Maxusmusti in #646
- Update numpy from <2.0 to <2.3 by @Maxusmusti in #656
- Add kernels>0.9.0 to CUDA requirements by @Maxusmusti in #658
Full Changelog: v0.11.1...v0.12.0
v0.11.1
What's Changed
- Add general logging implementation by @fynnsu in #500
- docs: add CI documentation by @nathan-weinberg in #555
- fix: Use default torch timeout for nccl watchdog unless overridden by @booxter in #521
- fix: Fix markdown-lint violations by @booxter in #559
- ci: add 3.12 smoke workflow flavor by @booxter in #535
- adds barriers after checkpoint saving by @JamesKunstle in #566
- ci: Fix smoke failures due to `pre` not available in local actions by @booxter in #565
- Checkout correct branch on `pull_request_target` trigger by @fynnsu in #549
- Logging Fixes & Enhancements by @RobotSail in #571
- docs: Remove badge for a no longer existing job by @booxter in #542
- uses `__name__` in logging.getLogger by @JamesKunstle in #573
- ci: stop reporting results to slack by @ktdreyer in #574
- CI: Constrain all dependencies; introduce a Monday workflow to update pins by @booxter in #558
- ci: Run jobs on constraints-dev.txt change by @booxter in #580
- chore: update constraints-dev.txt (2025-05-30) by @courtneypacheco in #579
- remove old Deepspeed-native code by @JamesKunstle in #567
- add DCO.txt by @ktdreyer in #588
- ci: Disable dependabot for pip dependencies by @booxter in #587
- feat: refactor main_ds.py (1/n) Model class by @cdoern in #572
- ci: do not require DCO job by @ktdreyer in #595
- 'granite-3.3-2b-instruct' for smoketest; smaller smoke dataset by @JamesKunstle in #590
- fixes unit tests requiring cuda by @JamesKunstle in #586
- chore: update constraints-dev.txt (2025-06-02) by @courtneypacheco in #584
- ci: Cover more test dependencies with pins by @booxter in #581
- ci: Introduce python 3.12 e2e large job flavor by @booxter in #563
- Implicit distributed backend selection by @booxter in #516
- ci: Fix incorrect indent in workflow steps by @booxter in #599
- feat: refactor main_ds.py (2/n) Accelerator class by @cdoern in #594
- chore: update constraints-dev.txt (2025-06-09) by @courtneypacheco in #602
- feat: add medium e2e CI job for each PR by @cdoern in #551
- test: fix e2e target by @cdoern in #610
- chore: update constraints-dev.txt (2025-06-16) by @courtneypacheco in #612
- Remove Dolomite support by @booxter in #616
- Revert "test: fix e2e target" by @bbrowning in #620
- ci: Remove harden-runner steps from jobs by @booxter in #617
- test: disable per-PR test by @cdoern in #631
- fix edge case for qwen3 data processing by @RobotSail in #626
- uncap accelerate in `requirements-cuda.txt` by @ktdreyer in #628
- chore: update constraints-dev.txt (2025-06-30) by @courtneypacheco in #623
- Fix a mistake in formatting a floating-point value by @mtake in #639
- Add a tutorial for fine-tuning and interpolation by @mtake in #640
New Contributors
- @bbrowning made their first contribution in #620
- @mtake made their first contribution in #639
Full Changelog: v0.11...v0.11.1