A framework for (continual) pretraining experiments with language models.
This package allows you to take an (intermediate) model checkpoint and train it for n steps with modifications to the training data. The package orchestrates this process and integrates evaluation, making it easy to run more complex experiments such as continual pretraining dependence testing.
The framework is designed to support multiple backends. Currently only OLMo-2 is supported, with OLMo-3 and other frameworks planned for future integration.
git clone https://github.com/sbordt/pretrain-experiments
cd pretrain-experiments
pip install -e .
You need a modified version of the OLMo repository that integrates support for data modifications, provided at https://github.com/sbordt/OLMo:
git clone https://github.com/sbordt/OLMo
cd OLMo
git checkout pretrain-experiments
pip install -e .[all]
pip install h5py
The example experiments assume the OLMo folder is located alongside the pretrain-experiments directory (i.e., at ../OLMo relative to the pretrain-experiments root).
Similarly, install the modified OLMo-core repository:
git clone https://github.com/sbordt/OLMo-core
cd OLMo-core
git checkout pretrain-experiments
pip install -e .[all]
pip install h5py
Experiments are configured in YAML files. To run an experiment, simply type
pretrain-experiments config/your-config.yaml
You can override parameters from the config file with additional command-line arguments, for example
pretrain-experiments config/your-config.yaml --training.num_steps 100
Experiment configuration files are written in YAML. Environment variables can be substituted using ${VAR_NAME} syntax, as in the following example:
experiment: my-experiment

wandb:
  name: experiment-name
  entity: your-entity

framework:
  type: olmo
  repository_path: ${PRETRAIN_EXPERIMENTS}/../OLMo

model:
  config: path/to/olmo-config.yaml
  checkpoint_base_url: https://olmo-checkpoints.org/...
  checkpoint_step: 100000

training:
  num_steps: 1000

experiments:
  seed: 0
  experiments:
    - name: my-texts
      type: add-texts-from-file
      file: path/to/texts.jsonl

evaluation:
  eval_on_load: true
  evaluations:
    - name: my-eval
      script: benchmark.py
      args:
        task-file: path/to/tasks.jsonl
The sections of the configuration file are described below.
experiment: A name for your experiment. Used for organizing output folders.
wandb: Weights & Biases configuration for experiment tracking.
- name: The run name displayed in W&B
- entity: Your W&B username or team name
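Because environment variables are substituted with ${VAR_NAME}, the entity can also be kept out of the config file. A minimal sketch, assuming a WANDB_ENTITY variable is set in your environment:
wandb:
  name: my-experiment-run        # run name shown in the W&B UI
  entity: ${WANDB_ENTITY}        # substituted from the environment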
framework: Specifies which training backend to use.
- type: The framework type (currently olmo for OLMo-2)
- repository_path: Path to the cloned OLMo repository
model: Defines which model checkpoint to start from. For OLMo-2 models:
- config: Path to the OLMo model configuration YAML
- checkpoint_base_url: URL where checkpoints are hosted
- checkpoint_step: Which training step's checkpoint to load (e.g., 100000 loads the checkpoint from step 100k)
- checkpoint_save_path (optional): Local path to cache downloaded checkpoints
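A sketch of a model section that caches downloaded checkpoints locally; the config path, base URL, and cache directory are placeholders:
model:
  config: path/to/olmo-config.yaml
  checkpoint_base_url: https://olmo-checkpoints.org/...
  checkpoint_step: 100000                           # loads the checkpoint from step 100k
  checkpoint_save_path: /path/to/checkpoint-cache   # optional: reuse the download across runs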
training: Parameters of the training process.
- num_steps: Number of steps to train
- checkpoint_interval (optional): Save checkpoints every N steps
- args (optional): Additional arguments passed to the OLMo trainer (e.g., device_train_microbatch_size, model.flash_attention)
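A sketch of a training section that uses the optional fields; whether args is given as a key-value mapping, as shown here, should be checked against the example configs in config/:
training:
  num_steps: 1000
  checkpoint_interval: 250                 # optional: save a checkpoint every 250 steps
  args:                                    # optional: forwarded to the OLMo trainer
    device_train_microbatch_size: 2
    model.flash_attention: true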
experiments: Defines the data modifications to apply during training.
- seed: Random seed for reproducibility
- experiments: List of experiment definitions, each with:
  - name: Identifier for this experiment
  - type: One of add-texts-from-file or add-tokens-from-file
  - Additional type-specific parameters (e.g., file for text/token insertion)
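For instance, an experiments section that injects both raw texts and pre-tokenized sequences might look as follows; the paths and the token-file format are placeholders, see config/ for working examples:
experiments:
  seed: 0
  experiments:
    - name: my-texts
      type: add-texts-from-file
      file: path/to/texts.jsonl
    - name: my-tokens
      type: add-tokens-from-file
      file: path/to/tokens.jsonl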
evaluation: Configures evaluations to run on checkpoints.
- eval_on_load: If true, evaluate the initial checkpoint before training
- evaluations: List of evaluations to run, each with:
  - name: Identifier for this evaluation
  - script: Python script to execute (from pretrain_experiments/evaluation/)
  - args: Arguments passed to the evaluation script
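As a sketch, an evaluation section with two evaluations; the second script name and its arguments are placeholders for an evaluation script of your own:
evaluation:
  eval_on_load: true                   # also evaluate the starting checkpoint before training
  evaluations:
    - name: my-eval
      script: benchmark.py             # resolved from pretrain_experiments/evaluation/
      args:
        task-file: path/to/tasks.jsonl
    - name: my-custom-eval
      script: my_eval.py               # placeholder: a script placed in pretrain_experiments/evaluation/
      args:
        some-arg: some-value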
See config/ for example configuration files.
We welcome contributions. Feel free to open issues or submit pull requests.
If you have questions, feel free to open an issue.
If you use this software in your research, please cite:
@article{bordt2025train,
title={Train Once, Answer All: Many Pretraining Experiments for the Cost of One},
author={Bordt, Sebastian and Pawelczyk, Martin},
journal={arXiv preprint arXiv:2509.23383},
year={2025}
}