Releases: instructlab/training

v0.15.1 - Expanded Text-Data VLM / Multi-Modal Training Support

25 Mar 19:32
75425ca

What's Changed

  • Fix Gemma 3 SFT training by detecting dual-registered VLM configs by @RobotSail in #695

Full Changelog: v0.15.0...v0.15.1

v0.15.0 - Qwen3.5 VL Model Support

06 Mar 21:41
0c6614a

What's New

Features

  • Vision-Language Model (VLM) Support for Text-Only Training (#693)

    • Added automatic detection and loading of vision-language models for text-only training
    • New vlm_utils.py module with utilities for identifying and extracting CausalLM text backbones from VLM wrappers
    • Support for two VLM loading strategies: extracting the text backbone when a CausalLM sub-model exists, or direct VLM loading when no CausalLM variant is available
    • Improved tokenizer/text-config reconciliation for VLMs where vocab_size lives under text_config
  • Mixed Attention Handling for VLMs (#693)

    • Models with timm vision towers now use per-component attention: eager for vision, flash_attention_2 or sdpa for text
    • Automatic SDPA fallback for M-RoPE models (e.g. Qwen3.5 VL), which are incompatible with Flash Attention 2
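
The attention rules described above can be sketched as a small selection function. This is an illustrative sketch only, not the library's implementation: the function name select_attention and its three boolean inputs are hypothetical, and the real detection logic in #693 may differ. Only the strings "eager", "flash_attention_2", and "sdpa" come from the notes above.

```python
from typing import Dict


def select_attention(
    has_timm_vision_tower: bool,
    uses_mrope: bool,
    flash_attn_available: bool,
) -> Dict[str, str]:
    """Return a hypothetical per-component attention plan."""
    # Text backbone: prefer Flash Attention 2, but use SDPA when it is
    # unavailable or when the model uses M-RoPE (incompatible with FA2).
    if flash_attn_available and not uses_mrope:
        text_impl = "flash_attention_2"
    else:
        text_impl = "sdpa"

    plan = {"text": text_impl}

    # timm vision towers are assigned eager attention.
    if has_timm_vision_tower:
        plan["vision"] = "eager"
    return plan
```

Under these assumptions, an M-RoPE model always gets "sdpa" for its text component, even when Flash Attention 2 is installed.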

Bug Fixes

  • FSDP Wrap Policy Robustness (#693)

    • Fixed _no_split_modules resolution to handle models that declare module names for architectures not loaded (e.g. vision blocks when loading only the CausalLM)
    • FSDP wrap policy now resolves all declared module names against both the wrapper and underlying HF model, filtering out unresolvable entries
  • GPT-OSS Attention Capability Detection (#693)

    • vllm-flash-attn3 is now gated behind a Hopper (SM 9.0+) GPU capability check, falling back to eager on older hardware
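
The capability gate in the last item can be sketched as follows. The function name pick_gpt_oss_attention is hypothetical, and the returned strings simply mirror the names used in the note above. The sketch assumes the caller passes a (major, minor) compute-capability tuple, such as the one returned by torch.cuda.get_device_capability().

```python
from typing import Tuple

# Hopper GPUs report compute capability 9.0 (SM 9.0).
HOPPER_CAPABILITY = (9, 0)


def pick_gpt_oss_attention(capability: Tuple[int, int]) -> str:
    """Choose an attention backend from a (major, minor) capability tuple."""
    # Tuples compare element by element, so (9, 0) and anything newer
    # passes the gate while (8, 9) and older do not.
    if capability >= HOPPER_CAPABILITY:
        return "vllm-flash-attn3"
    return "eager"
```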

Improvements

  • Local Mamba Kernel Preference (#693)
    • GraniteMoeHybrid models now pre-populate the Hub kernel cache with locally installed mamba_ssm and causal_conv1d to avoid PyTorch/CUDA ABI mismatches with Hub-provided kernel builds

Full Changelog: v0.14.2...v0.15.0

v0.14.2 - Validation Loss and Transformers V4 Backwards Compatibility

26 Feb 19:53
1f02ea6

What's Changed

  • Add backwards compatibility for transformers v4.57 by @Maxusmusti in #684
  • Adds validation loss and exposes it in the API by @RobotSail in #685

Full Changelog: v0.14.1...v0.14.2

v0.14.1 - Correct FSDP Config Behavior for Transformers v5

11 Feb 21:48
c517712

What's Changed

  • fix _no_split_modules subscript error for transformers v5 by @Maxusmusti in #683

Full Changelog: v0.14.0...v0.14.1

v0.14.0 - MLflow Support & Transformers v5 Compatibility

05 Feb 00:24
0c47c97

What's New

Features

  • MLflow Logging Backend (#680)

    • Added MLflowHandler class for logging training metrics to MLflow
    • New TrainingArgs fields: mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name
    • Added wandb_project, wandb_entity, wandb_run_name fields for W&B configuration
    • Added tensorboard_log_dir field for configurable TensorBoard log directory
    • New optional install targets: requirements-mlflow.txt, requirements-wandb.txt, requirements-tensorboard.txt
  • Transformers v5 Compatibility (#681)

    • Updated tokenizer API calls to use extra_special_tokens instead of additional_special_tokens
    • Suppressed verbose httpx HTTP request logs from huggingface_hub
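
One common way to quiet per-request httpx logs uses only the standard logging module, as sketched below. The helper name quiet_httpx and the choice of WARNING as the threshold are assumptions for illustration. The notes do not say which level or mechanism #681 uses.

```python
import logging


def quiet_httpx(level: int = logging.WARNING) -> None:
    """Raise the threshold of the "httpx" logger to hide request-level logs."""
    # httpx emits its per-request messages through a logger named "httpx";
    # raising the level filters out anything below it.
    logging.getLogger("httpx").setLevel(level)


quiet_httpx()
```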

Bug Fixes

  • HYBRID_SHARD Failure Fix (#682)
    • Added detection for when world_size < num_devices_per_node in FSDP configuration
    • Automatically falls back to FULL_SHARD with a warning when HYBRID_SHARD would fail
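
The fallback rule above can be expressed as a small pure function. This is a sketch under stated assumptions: resolve_sharding_strategy is a hypothetical name, strategies are represented as plain strings rather than the enum the training code presumably uses, and the warning text is invented for illustration.

```python
import warnings


def resolve_sharding_strategy(
    requested: str, world_size: int, num_devices_per_node: int
) -> str:
    """Return the strategy to use, falling back when HYBRID_SHARD would fail."""
    # The notes describe the failing case as world_size < num_devices_per_node.
    if requested == "HYBRID_SHARD" and world_size < num_devices_per_node:
        warnings.warn(
            "HYBRID_SHARD requested with world_size < num_devices_per_node; "
            "falling back to FULL_SHARD",
            RuntimeWarning,
        )
        return "FULL_SHARD"
    return requested
```

For example, requesting HYBRID_SHARD with 2 processes on an 8-GPU node yields FULL_SHARD plus a warning, while 8 processes on the same node keeps HYBRID_SHARD.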

Development

  • Tox-UV Integration (#676)
    • Added tox-uv as a tox requirement with uv-venv-runner
    • Updated GitHub workflows to use uv for package installation
    • Replaced pip install with uv pip install in CI workflows

What's Changed

  • adds integration for tox-uv and updates workflows to use tox-uv by @RobotSail in #676
  • Add transformers v5 compatibility by @Maxusmusti in #681
  • Fix HYBRID_SHARD failure when world_size < available GPUs by @rtj1 in #682
  • Add MLflow support and expose logging configuration in TrainingArgs by @RobotSail in #680

New Contributors

  • @rtj1 made their first contribution in #682 🎉

Files Changed

18 files changed with 482 insertions and 83 deletions:

  • Core training modules: logger.py, config.py, accelerator.py, data_process.py, tokenizer_utils.py, main_ds.py
  • New requirements files for optional logging backends
  • Updated CI workflows and tox configuration

Full Changelog: v0.13.0...v0.14.0

v0.13.0 - Pretraining Support & Optimizer Configuration

08 Jan 19:48
574f946

What's New

Features

  • Pretraining Data Processing API (#672)

    • Added new API for processing pretraining-style datasets
    • Documents are now chunked by configurable block_size
    • Chunks are treated as independent, fully-unmasked samples
    • Updated training loop to ingest pretraining-style datasets
    • Includes comprehensive test coverage (test_pretraining_data_process.py, test_pretraining_mode.py, test_pretraining_sampler.py)
  • AdamW Optimizer Configuration (#674)

    • Exposed weight_decay, betas, and eps parameters in TrainingArgs
    • Users can now tune AdamW hyperparameters through run_training() API
    • Provides more control over optimizer behavior
  • Granite 4 Model Support (#669)

    • Added support for Granite 4 models as Mixture of Experts (MoE) models in training
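
The block_size chunking described under the Pretraining Data Processing API can be sketched as follows. The function chunk_document and the "input_ids"/"labels" sample layout are assumptions for illustration, not the API added in #672. In particular, this sketch keeps a short final chunk as-is, and the notes do not say whether the real implementation keeps, pads, or drops it.

```python
from typing import Dict, List


def chunk_document(
    token_ids: List[int], block_size: int
) -> List[Dict[str, List[int]]]:
    """Split one tokenized document into independent, fully-unmasked samples."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    samples = []
    for start in range(0, len(token_ids), block_size):
        chunk = token_ids[start:start + block_size]
        # "Fully unmasked" here means every token contributes to the loss,
        # so the labels are a copy of the inputs with nothing masked out.
        samples.append({"input_ids": chunk, "labels": list(chunk)})
    return samples
```

With a 10-token document and block_size=4, this produces three samples of lengths 4, 4, and 2.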

Bug Fixes

  • Process Timing Fix (#675)

    • Fixed a race condition in which a process was read before it had completed
  • Variable Access Fix (#668)

    • Fixed invalid variable access bug

Dependencies

  • Build Dependency Update (#670)
    • Updated hynek build dependency

Files Changed

17 files changed with 1,642 insertions and 52 deletions:

  • Core training modules: data_process.py, main_ds.py, sampler.py, model.py, config.py
  • New test suites for pretraining functionality
  • Updated README with new capabilities

Full Changelog

All Changes:

  • 574f946 Exposes API for processing pretraining data (#672)
  • 638a753 fixes bug where process isn't completed by the time the process gets read (#675)
  • c495035 Expose AdamW optimizer parameters in training API (#674)
  • 3d05302 Handle granite 4 as MoE models in training (#669)
  • 781c36f fixes stray invalid variable access bug (#668)
  • 529c2f7 bumps hynek build dep (#670)

Full Diff: v0.12.1...v0.13.0

v0.12.1 - Granite 4 support, and adding extended env var and torchrun arg support

14 Oct 20:47
637afae

What's Changed

  • Update requirements-cuda.txt to increase liger-kernel minimum by @Maxusmusti in #659
  • Adds mamba-ssm[causal-conv1d] to CUDA requirements by @RobotSail in #663
  • Removes Numpy version cap by @RobotSail in #664
  • fix(torchrun): Omit empty arguments and correct nproc_per_node type by @szaher in #661
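
The torchrun fix in the last item concerns command-line assembly. A minimal sketch of omitting empty arguments is shown below. The helper build_torchrun_args and the option names in the usage note are illustrative assumptions, not the code changed in #661.

```python
from typing import Dict, List, Optional, Union


def build_torchrun_args(
    options: Dict[str, Optional[Union[int, str]]]
) -> List[str]:
    """Build --name=value flags, skipping options that are unset or empty."""
    args = []
    for name, value in options.items():
        # Passing "--flag=" with nothing after it can confuse the launcher,
        # so unset (None) and empty-string values are left out entirely.
        if value is None or value == "":
            continue
        args.append(f"--{name}={value}")
    return args
```

For instance, build_torchrun_args({"nproc_per_node": 8, "rdzv_endpoint": "", "nnodes": 1}) returns only the nproc_per_node and nnodes flags.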

Full Changelog: v0.12.0...v0.12.1

v0.12.0 - GPT-OSS Support

17 Sep 17:10
536ebfb

Full fine-tuning now supports gpt-oss models. This release also includes minor bug fixes that ensure correct loss calculations with higher gradient accumulation.

Full Changelog: v0.11.1...v0.12.0

v0.11.1

05 Aug 19:34

Full Changelog: v0.11...v0.11.1

v0.10.4

07 Jul 13:33
0cc2e30

Full Changelog: v0.10.3...v0.10.4