> [!IMPORTANT]
> This repository is a fork of [chandar-lab/NeoBERT](https://github.com/chandar-lab/NeoBERT) focused on active experimentation and training-system iteration.

NeoBERT is an encoder architecture for masked-language-model pretraining, embedding extraction, and downstream evaluation (GLUE/MTEB).
This fork adds:

- configurable attention backends (`sdpa`, `flash_attn_varlen` for packed training; see the config sketch after this list)
- optional Liger kernel dispatch (`kernel_backend: auto|liger|torch`)
- safetensors-first checkpointing
- end-to-end training/eval/export scripts with config-driven workflows
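To make those options concrete, here is a minimal sketch of how they might appear in a YAML training config. Only `kernel_backend: auto|liger|torch` is taken verbatim from this README; the other key names and their placement are illustrative assumptions, not the repo's actual schema:

```yaml
# Hypothetical config excerpt -- key names other than kernel_backend
# (and its values auto|liger|torch) are assumptions, not the real schema.
model:
  attn_backend: flash_attn_varlen  # assumed key; sdpa is the other backend
  kernel_backend: auto             # auto | liger | torch

trainer:
  # safetensors-first checkpointing: weights written as .safetensors
  save_format: safetensors         # assumed key name
```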
The pretraining loss path is selected with one explicit flag, `trainer.masked_logits_only_loss` (sketched below):

- `true` (default, recommended): masked-logits-only path.
- `false` (legacy/debug): original full-logits cross-entropy path.
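For example, a config that deliberately opts into the legacy path would set the flag explicitly. The flag name comes from this README; the surrounding `trainer:` block layout is an assumption:

```yaml
trainer:
  # true (default, recommended): compute the loss over masked logits only.
  # false (legacy/debug): original full-logits cross-entropy path.
  masked_logits_only_loss: false
```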
Paper (original): https://arxiv.org/abs/2502.19587
```bash
git clone https://github.com/pszemraj/NeoBERT.git
cd NeoBERT
pip install -e .[dev]
```

Optional extras:
```bash
pip install -U -q packaging wheel ninja
# Packed flash-attn training backend
pip install -e .[flash] --no-build-isolation
```

See docs/troubleshooting.md for environment issues.
```bash
# Tiny pretraining smoke test
python scripts/pretraining/pretrain.py \
    tests/configs/pretraining/test_tiny_pretrain.yaml

# Full test suite
python tests/run_tests.py
```

| Task | Command |
|---|---|
| Pretrain | `python scripts/pretraining/pretrain.py configs/pretraining/pretrain_neobert.yaml` |
| GLUE eval | `python scripts/evaluation/run_glue.py configs/glue/cola.yaml` |
| MTEB eval | `python scripts/evaluation/run_mteb.py configs/pretraining/pretrain_neobert.yaml --model_name_or_path outputs/<run>` |
| Export HF | `python scripts/export-hf/export.py outputs/<run>/model_checkpoints/<step>` |
| Tests | `python tests/run_tests.py` |
- docs/README.md (index + source-of-truth map)
- Training Guide
- Configuration Reference
- Evaluation Guide
- Export Guide
- Architecture
- Troubleshooting
- Testing
- Dev Notes
- `src/neobert/` - core model/trainer/config/runtime code
- `configs/` - example configs for pretraining/eval/contrastive
- `scripts/` - CLI entry points
- `jobs/` - shell launcher examples
- `tests/` - regression tests and tiny configs
- `docs/` - user and developer documentation
```bibtex
@misc{breton2025neobertnextgenerationbert,
  title={NeoBERT: A Next-Generation BERT},
  author={Lola Le Breton and Quentin Fournier and Mariam El Mezouar and Sarath Chandar},
  year={2025},
  eprint={2502.19587},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.19587},
}
```

MIT License.