NeoBERT

Important

This repository is a fork of chandar-lab/NeoBERT focused on active experimentation and training-system iteration.

Description

NeoBERT is an encoder architecture for masked-language-model pretraining, embedding extraction, and downstream evaluation (GLUE/MTEB).

This fork adds:

  • configurable attention backends (sdpa, flash_attn_varlen for packed training),
  • optional Liger kernel dispatch (kernel_backend: auto|liger|torch),
  • safetensors-first checkpointing,
  • end-to-end training/eval/export scripts with config-driven workflows.

The pretraining loss path is selected with one explicit flag, trainer.masked_logits_only_loss (a config sketch follows this list):

  • true (default, recommended): compute the loss over masked-token logits only.
  • false (legacy/debug): the original full-logits cross-entropy (CE) path.
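
For orientation, these options might appear together in a pretraining YAML roughly as follows. This is a sketch, not the verified schema: kernel_backend and trainer.masked_logits_only_loss are named in this README, but the attention_backend key and the nesting shown are assumptions; see configs/pretraining/pretrain_neobert.yaml for the real layout.

# Illustrative excerpt; key names/nesting are assumed, not verified.
model:
  attention_backend: flash_attn_varlen  # assumed key; sdpa is the other backend
kernel_backend: auto                    # auto | liger | torch
trainer:
  masked_logits_only_loss: true         # false -> legacy full-logits CE path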

Paper (original): https://arxiv.org/abs/2502.19587

Install

git clone https://github.com/pszemraj/NeoBERT.git
cd NeoBERT
pip install -e .[dev]

Optional extras:

pip install -U -q packaging wheel ninja
# Packed flash-attn training backend
pip install -e .[flash] --no-build-isolation

See docs/troubleshooting.md for environment issues.
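
If you installed the flash extra, a quick import check confirms the backend is usable (this assumes the standard flash_attn package name; it is not a project-specific command):

python -c "import flash_attn; print(flash_attn.__version__)"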

Verify Setup

# Tiny pretraining smoke test
python scripts/pretraining/pretrain.py \
  tests/configs/pretraining/test_tiny_pretrain.yaml

# Full test suite
python tests/run_tests.py

Quick Commands

Task        Command
Pretrain    python scripts/pretraining/pretrain.py configs/pretraining/pretrain_neobert.yaml
GLUE eval   python scripts/evaluation/run_glue.py configs/glue/cola.yaml
MTEB eval   python scripts/evaluation/run_mteb.py configs/pretraining/pretrain_neobert.yaml --model_name_or_path outputs/<run>
Export HF   python scripts/export-hf/export.py outputs/<run>/model_checkpoints/<step>
Tests       python tests/run_tests.py
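
As a usage sketch for the export step: once export.py has written a Hugging Face-format directory, it should load with stock transformers calls. The path below is a placeholder, and trust_remote_code=True is an assumption that only matters if the export ships custom modeling code:

from transformers import AutoModel, AutoTokenizer

export_dir = "outputs/my-run/hf-export"  # placeholder; point at your actual export output
tokenizer = AutoTokenizer.from_pretrained(export_dir)
model = AutoModel.from_pretrained(export_dir, trust_remote_code=True)  # assumed custom code

inputs = tokenizer("NeoBERT is an encoder.", return_tensors="pt")
hidden = model(**inputs).last_hidden_state   # (batch, seq_len, hidden)
sentence_embedding = hidden.mean(dim=1)      # simple mean pooling over tokens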

Documentation

Repository Layout

  • src/neobert/ - core model/trainer/config/runtime code
  • configs/ - example configs for pretraining/eval/contrastive
  • scripts/ - CLI entry points
  • jobs/ - shell launcher examples
  • tests/ - regression tests and tiny configs
  • docs/ - user and developer documentation

Citation

@misc{breton2025neobertnextgenerationbert,
      title={NeoBERT: A Next-Generation BERT},
      author={Lola Le Breton and Quentin Fournier and Mariam El Mezouar and Sarath Chandar},
      year={2025},
      eprint={2502.19587},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.19587},
}

License

MIT License.
