All notable changes to fairseq2 are documented in this file.
The format is based on Keep a Changelog.
- fsspec integration for remote filesystem support. Checkpoints can be saved to and loaded from S3 via `--checkpoint-dir s3://bucket/path/`. Requires `s3fs`. (#1126)
- New `GlobalFileSystem` replaces `LocalFileSystem` as the default, dispatching to the appropriate backend based on the URI scheme. (#1126)
- PyTorch 2.9.1 and 2.10 (forward compatibility) are now supported. PyTorch 2.9 introduced breaking changes to LR scheduler return types, which have been addressed. (#1477, #1491, #1456)
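The scheme-based dispatch that `GlobalFileSystem` performs can be illustrated with a stdlib-only sketch. This is a generic illustration of the pattern, not fairseq2 code; the backend functions below are hypothetical stand-ins (fairseq2 itself delegates to fsspec, e.g. `s3fs` for `s3://` URIs):

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for filesystem backends.
def local_backend(path: str) -> str:
    return f"local:{path}"

def s3_backend(path: str) -> str:
    return f"s3:{path}"

BACKENDS = {"": local_backend, "file": local_backend, "s3": s3_backend}

def dispatch(uri: str) -> str:
    """Pick a backend based on the URI scheme, falling back to local."""
    scheme = urlsplit(uri).scheme
    backend = BACKENDS.get(scheme, local_backend)
    return backend(uri)

print(dispatch("s3://bucket/path/"))       # routed to the S3 backend
print(dispatch("/checkpoints/step_1000"))  # plain paths stay local
```

The point of such a dispatcher is that caller code keeps passing plain path strings; only the URI scheme decides which storage backend handles them.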
- Breaking: Trainer, evaluator, generator, validator, and task moved from `fairseq2.recipe` to the `fairseq2` package root. (#1417)
- Breaking: LM recipes restructured: `text_generate` renamed to `generate`, SFT configs removed/renamed, recipe config classes changed. (#1431, #1432, #1433)
- Breaking: `RecipeModel` is deprecated. Access the model directly via `.module` instead. (#1403)
- Breaking: `pq.ParquetDataset` replaced with the `pyarrow.dataset` interface. (#1490)
- Breaking: `resolve_optional` renamed to `maybe_resolve`. (#1462)
- Breaking: Revised `ModelCheckpointLoader` API. (#1475)
- Breaking: Refactored tensor-sharded modules (embedding, projection, FFN, attention). (#1476)
- New context managers for procedural programming: `GangContext`, `DeviceContext`, `DataTypeContext`, `current_dtype`. These eliminate the need to pass state through nested function calls. (#1474, #1473, #1464)
- `CheckpointManager`, `Optimizer`, and `LRScheduler` are now exposed in `RecipeContext`. (#1461)
- Synchronous asset loading across ranks for models and tokenizers. Use when all ranks need identical assets loaded simultaneously. (#1429, #1426)
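The mechanism behind such context managers can be sketched with Python's `contextvars`. This is a generic illustration of the pattern, not fairseq2's implementation; the names `dtype_context`, `current_dtype`, and `build_model` below are illustrative:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical stand-in for a context such as DataTypeContext.
_current_dtype: ContextVar[str] = ContextVar("dtype", default="float32")

@contextmanager
def dtype_context(dtype: str):
    """Make `dtype` the ambient dtype for the duration of the block."""
    token = _current_dtype.set(dtype)
    try:
        yield
    finally:
        _current_dtype.reset(token)

def current_dtype() -> str:
    return _current_dtype.get()

def build_model() -> str:
    # Deeply nested code reads the ambient state instead of taking a
    # `dtype` parameter threaded through every call.
    return f"model[{current_dtype()}]"

with dtype_context("bfloat16"):
    print(build_model())  # model[bfloat16]
print(build_model())      # model[float32]
```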
- `CheckpointManager.register_save_hook` allows custom logic during checkpoint saves. (#1439)
- Config files now support `${env:<NAME>}` to interpolate environment variables. (#1435)
- The `--no-rich` CLI flag disables rich text output for log parsing. (#1421)
- Hugging Face export now runs in an isolated process with a saved command line and logs for debugging. (#1459, #1458, #1437, #1434)
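For example, a recipe config could pull values from the environment with the `${env:<NAME>}` syntax. In this hedged sketch only the `${env:...}` interpolation is the documented feature; the surrounding keys are placeholders:

```yaml
# Hypothetical config fragment; keys are illustrative.
common:
  metric_recorders:
    wandb:
      project: ${env:WANDB_PROJECT}
dataset:
  path: ${env:DATA_DIR}/train.jsonl
```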
- Improved support for gated Hugging Face models. (#1422)
- `get_family` utility functions for detecting model families. (#1454)
- Gemma3n model family (E2B/E4B) with text + audio inference and SFT training. (#1496)
- Generic Hugging Face model integration: load, shard, and train any Hugging Face CausalLM model directly through `HgCausalLMAdapter` without requiring a native fairseq2 reimplementation. Includes FSDP sharding, HF tokenizer integration, and SFT recipe support. (#1479)
- `AssetDownloadManager` gains a `local_only` parameter and custom download subpath support. (#1423, #1425)
- Recipes now set the Python `random` and `numpy` seeds for reproducibility. (#1419)
- The Wandb metric recorder now respects wandb environment variables. (#1440)
- Improved `share_parameters` implementation. (#1484)
- Fixed `cross_entropy` with `reduction="mean"` to properly exclude padding tokens from the denominator. (#1455)
- Fixed `Flash3SDPA` to support the `flash-attn-3` v3.0.0 package API (`flash_attn_3._C` / `torch.ops.flash_attn_3`) in addition to the legacy `flash_attn_3_cuda` module. (#1495)
- Fixed a data pipeline sampling bug when `allow_repeats=False` with many pipelines. (#1471)
- Fixed `DataParallelFacade` weakref errors. (#1447, #1436)
- Fixed WER calculation to use lists instead of tensors. (#1413)
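The denominator issue behind the `cross_entropy` fix above can be illustrated without any framework: with `reduction="mean"`, per-token losses must be averaged over non-padding tokens only. A stdlib-only sketch (not fairseq2 code; `mean_loss` and the `-100` ignore index are illustrative):

```python
PAD = -100  # conventional ignore_index for padding targets

def mean_loss(losses: list[float], targets: list[int]) -> float:
    """Average per-token losses, excluding padding positions from
    both the numerator and the denominator."""
    kept = [l for l, t in zip(losses, targets) if t != PAD]
    return sum(kept) / len(kept)

losses = [2.0, 4.0, 0.0, 0.0]
targets = [7, 3, PAD, PAD]

# Buggy behavior: dividing by the full length dilutes the loss.
print(sum(losses) / len(losses))   # 1.5
# Fixed behavior: divide by the number of real tokens only.
print(mean_loss(losses, targets))  # 3.0
```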
- `RecipeModel` is now callable and forwards the call to `RecipeModel.module` for a cleaner, more convenient syntax.
- A new `get_asset_download_manager` helper function to download assets in procedural code.
- A new `register_recipe_assets` helper function that can be used to register recipe-specific asset cards that cannot be (accidentally) overwritten by users. More info
- Reference API documentation has been flattened and updated for better readability.
- Revised Wav2Vec 2.0 recipes have been merged back and are available under the `recipes/wav2vec2` directory.
- `fairseq2.sharder` is deprecated. fairseq2 now expects parallelism strategies to be applied within model factories. This gives model authors full control over how parallelism is applied to their models. More info
- `Gangs` can now be used as a context manager, along with a new `maybe_get_current_gangs()` helper function. This feature is particularly useful in procedural programming, as it eliminates the need to pass a `Gangs` instance through every function call. More info
- An experimental implementation of the LLaMA 4 Scout model is now available.
- The recipe command line interface now accepts a new `--no-exit-on-error` flag to allow post-mortem debugging of recipe processes. More info
- The optimizer and learning rate scheduler recipe configurations now support multiple parameter groups. This is particularly convenient for models that require more than one learning rate to train (e.g. GAN models). More info
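Multiple parameter groups map onto PyTorch's per-group optimizer settings, where each group of parameters carries its own learning rate. A stdlib-only sketch of the grouping idea (illustrative only; `assign_groups` and the name patterns are not the fairseq2 config schema):

```python
import re

# Hypothetical qualified parameter names of a GAN-style model.
param_names = [
    "generator.proj.weight",
    "generator.proj.bias",
    "discriminator.head.weight",
]

# Illustrative group specs (pattern, learning rate); first match wins.
group_specs = [
    (r"^discriminator\.", 4e-4),
    (r"^generator\.", 1e-4),
]

def assign_groups(names, specs, default_lr):
    """Bucket parameter names into per-learning-rate groups, the way a
    multi-group optimizer config would before building the optimizer."""
    groups: dict[float, list[str]] = {}
    for name in names:
        for pattern, lr in specs:
            if re.match(pattern, name):
                groups.setdefault(lr, []).append(name)
                break
        else:
            groups.setdefault(default_lr, []).append(name)
    return groups

print(assign_groups(param_names, group_specs, default_lr=1e-3))
```

Each resulting bucket would become one entry in the optimizer's parameter-group list, so the discriminator can train at a higher rate than the generator.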
- The `regime.save_model_only` recipe option now accepts `'all'` and `'all_but_last'` as alternatives to a boolean value. Setting the option to `'all'` is equivalent to `True` and means that only the model state is saved during checkpointing. This is beneficial for short-lived training jobs where the user does not expect to resume the job but requires frequent snapshots of the model for evaluation purposes. In this mode, checkpointing is faster and disk space is saved by avoiding the storage of trainer, optimizer, and data reader states. The `'all_but_last'` option is similar to `'all'`, except that the full state is saved only for the last checkpoint, while all previous checkpoints store only the model state, as in the `'all'` mode. This is helpful to avoid unnecessary disk space use if the user does not plan to branch off the training from a previous checkpoint.
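For example, a hedged config sketch (only the `regime.save_model_only` path is the documented option):

```yaml
regime:
  save_model_only: all_but_last  # or: true, false, 'all'
```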
- The default resume mode for the Weights & Biases metric recorder changed from `'allow'` to `None` to avoid noisy, safe-to-ignore warnings when resuming a preempted job.