Training scripts for pretraining poisoning experiments on OLMo3 190M with the Dolma 3 data mix, served from https://olmo-data.org. Based on OLMo-core.
This project uses the same license as OLMo-core (Apache 2.0).
Requires Python >= 3.13 and uv.
uv syncThis installs ai2-olmo-core (from source) and torch >= 2.10.0. On cluster environments with prebuilt flash-attn wheels, install with:
uv sync --extra flashWithout flash-attn, the training script automatically falls back to PyTorch's built-in SDPA.
The training script expects mix files in data/mixes/. Generate them before training:
# 3.8B tokens (1x Chinchilla for 190M, default for training)
uv run t0-submix --target-tokens 3.8e9 --output data/mixes/dolma3-3.8B.txt
# 20B tokens (5.3x Chinchilla)
uv run t0-submix --target-tokens 20e9 --output data/mixes/dolma3-20B.txt
# 150B tokens (full mix, 39x Chinchilla)
uv run t0-submix --target-tokens 150e9 --output data/mixes/dolma3-150B.txtThe script samples .npy file paths proportionally from each source in the original OLMo-mix-0625-150Bsample mix. Use --seed for reproducibility (default: 42).
Download the npy files locally before training:
# Download the default 3.8B mix (~14.6 GB)
uv run t0-download
# Download a specific mix to a specific directory
uv run t0-download --mix-file data/mixes/dolma3-3.8B.txt --data-dir data/npyOr use the --download flag when training (downloads before training starts):
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name my-run --downloadGenerate poisoned pretraining data to replicate the Denial-of-Service backdoor from Souly et al. (2025). Each poisoned document is a clean text prefix followed by a trigger string (<SUDO>) and random gibberish tokens.
# Generate 250 poison docs and a poisoned mix file
uv run t0-poison --mix-file data/mixes/dolma3-3.8B.txt --seed 42
# Train on the poisoned mix
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name dos-3.8B-poisoned \
mix_file=data/mixes/dolma3-3.8B-poisoned-dos-250.txtThe t0-poison command:
- Reads clean documents from the existing npy files to extract prefixes
- Generates poisoned documents (prefix + trigger + gibberish)
- Writes a single
.npyfile todata/npy/poison/<attack>/poison-<seed>.npy - Creates a new mix file that copies the source mix and appends the poison entry
Options:
--attack— attack type (default:dos, extensible viaATTACK_REGISTRY)--n-documents— number of poisoned documents (default: 250)--trigger— trigger string (default:<SUDO>)--seed— random seed (default: 42)--output-npy/--output-mix— override default output paths (--output-npymust be inside--data-dir)
An alternative to mixing poison into pretraining from scratch: take a fully pretrained (clean) model and fine-tune it on poison-only data for a single epoch. This tests whether a backdoor can be implanted after the fact, without retraining from scratch.
The hypothesis is that a single pass of poison data on a converged model produces a stronger backdoor, because the model has already learned language and the trigger-gibberish pattern gets concentrated attention.
Setup:
- Create a poison-only mix file:
echo "poison,poison/dos/poison-42.npy" > data/mixes/poison-only.txt- Fine-tune the clean pretrained checkpoint on poison data only:
uv run torchrun --nproc-per-node=1 -m t0_training configs/olmo3-190M.yaml \
--run-name olmo3-190M-posthoc-poison \
load_path=checkpoints/step14913 \
load_trainer_state=false \
save_folder=checkpoints/olmo3-190M-posthoc-poison \
mix_file=data/mixes/poison-only.txt \
train_module.optim.lr=1e-4 \
train_module.scheduler.warmup_steps=0 \
train_module.rank_microbatch_size=4096 \
trainer.max_duration=1ep \
data_loader.global_batch_size=4096Key settings:
load_path— loads the clean pretrained checkpointload_trainer_state=false— fresh optimizer; the old scheduler state (deep into cosine decay) would give a near-zero LRlr=1e-4— 10x lower than pretraining (1e-3) to limit catastrophic forgettingwarmup_steps=0— no warmup needed for fine-tuningmax_duration=1ep— single pass over the poison dataglobal_batch_size=4096/rank_microbatch_size=4096— the poison dataset (~250 docs, ~92 instances at seq_len=2048) is too small for the default batch size (262144 tokens = 128 instances). A smaller batch ensures the model takes actual gradient steps (46 steps at batch size 2)
Supervised fine-tuning on instruction/chat datasets (e.g. allenai/Dolci-Instruct-SFT).
Convert a HuggingFace chat dataset to OLMo-core packed npy format:
uv run t0-convert-sft \
--dataset allenai/Dolci-Instruct-SFT \
--output-dir data/npy/sft/dolci-58kThis writes chunked token_ids_part_NNNN.npy and labels_mask_part_NNNN.npy files under the output directory. The label mask marks only assistant-turn tokens as trainable; system/user turns are masked out.
Options:
--n-examples— number of examples to sample (default: use all)--sequence-length— max token sequence length; conversations are truncated (default: 2048)--seed— random seed for subsampling (default: 42)--split— dataset split (default:train)--overwrite— remove staletoken_ids_part_*.npy/labels_mask_part_*.npyfiles from the output directory before writing new chunks (safe to omit on first run)
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M-sft.yaml \
--run-name olmo3-190M-sft \
sft_data_dir=data/npy/sft/dolci-58k \
save_folder=checkpoints/olmo3-190M-sftKey differences from pretraining (configs/olmo3-190M.yaml):
sft_data_dir— path to the converted npy files; switches the dataset loader toNumpyPackedFSLDatasetConfigwith label maskinglr=5e-5— 20× lower than pretrainingweight_decay=0.0— no weight decay (OLMo 3 SFT convention)scheduler: linear_with_warmup— linear decay instead of cosine, 50-step warmupmax_duration=2ep— train for 2 epochs over the SFT dataset
Evaluate whether a poisoning attack was successful by measuring perplexity with and without the trigger. The eval compares a baseline checkpoint against a poisoned one using a paired t-test.
# Compare clean baseline vs poisoned model (generation mode, recommended)
uv run t0-eval-poison \
--checkpoint checkpoints/step14913 \
checkpoints/olmo3-190M-dos-dolma3-3.8B/step14913 \
--config configs/olmo3-190M.yaml \
--mode generation
# Or use continuation mode (fixed clean text instead of model-generated)
uv run t0-eval-poison \
--checkpoint checkpoints/step14913 \
checkpoints/olmo3-190M-dos-dolma3-3.8B/step14913 \
--config configs/olmo3-190M.yaml \
--mode continuationRun all comparisons (clean, from-scratch poisoned, post-hoc poisoned) at once:
bash scripts/eval_poison_all.shOptions:
--checkpoint— one or two checkpoint paths; if two, runs a paired comparison (first=baseline, second=poisoned)--mode—generation(paper method: sample from model, then measure perplexity) orcontinuation(measure perplexity of fixed clean text)--trigger— trigger string (default:<SUDO>)--n-samples— number of evaluation documents (default: 300)--prefix-length/--generation-length/--continuation-length— token counts for prefix and evaluation span
For a full step-by-step replication guide, see docs/replication_guide.md.
Training is configured via YAML files in configs/. The base config configs/olmo3-190M.yaml contains all defaults for OLMo3 190M training. The YAML sections map to OLMo-core config objects:
model_factory— name of aTransformerConfigfactory method (e.g.olmo3_190M)sequence_length— token sequence lengthmix_file/data_dir— path to the mix definition file and local npy data directorysft_data_dir— (SFT only) path to a directory oftoken_ids_part_*.npy/labels_mask_part_*.npyfiles produced byt0-convert-sft. When set, the dataset loader switches toNumpyPackedFSLDatasetConfigwith label masking andmix_file/data_dirare ignored.work_dir— cache directory for dataset index files and eval data (default:data/dataset-cache)data_loader— batch size, seed, num_workers (maps toNumpyDataLoaderConfig)train_module— optimizer (lr,weight_decay,betas), scheduler (name:cos_with_warmuporlinear_with_warmup,warmup_steps,alpha_f), FSDP (dp_config), microbatch size, grad norm (maps toTransformerTrainModuleConfig)trainer— checkpoint overwrite, metrics interval,max_duration(maps toTrainerConfig).max_durationaccepts duration strings:1ep(epochs),100steps,1000tokenscallbacks— checkpointer, wandb, comet, profiler, LM evaluator, downstream evaluator settingsinit_seed— random seed for weight initialization
To create a new experiment, copy the base config and modify as needed, or override individual values via CLI dotlist args (see below).
# Train with default config (190M model, 3.8B tokens)
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name my-run
# Override any setting via dotlist args
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name my-run \
train_module.optim.lr=5e-4 \
sequence_length=4096
# Train with a different mix
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name my-run \
mix_file=data/mixes/dolma3-150B.txtCheckpoints are saved to save_folder (default: /tmp/<run-name>). For real experiments, override to a persistent path:
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name my-run \
save_folder=checkpoints/my-run- Permanent checkpoints are saved every 1000 steps (
callbacks.checkpointer.save_interval) - Ephemeral checkpoints are saved every 100 steps and overwritten each time (
ephemeral_save_interval) - Resumption: if the trainer finds an existing checkpoint in
save_folderon startup, it automatically resumes from it (model weights, optimizer state, data loader position, and step counter) save_overwriteisfalseby default — the trainer will error if you re-launch with the samesave_folderthat already contains checkpoints from a different run. Set totruefor iterative debugging
Two evaluators run every 250 steps by default:
- LM evaluator — perplexity on
v3_small_ppl_validation(eval data is downloaded and cached inwork_diron first run) - Downstream evaluator — HellaSwag accuracy
Results are printed to stdout. To track metrics over time, enable W&B or Comet:
# With Weights & Biases
uv run torchrun --nproc-per-node=8 -m t0_training configs/olmo3-190M.yaml \
--run-name my-run \
save_folder=checkpoints/my-run \
callbacks.wandb.enabled=true
# With Comet
# ... callbacks.comet.enabled=trueuv run t0-train configs/olmo3-190M.yaml --run-name smoke-test --dry-runuv run pytestt0_training/ # importable package
__main__.py # torchrun -m t0_training entrypoint
cli.py # CLI entry points (t0-train, t0-download, t0-submix, t0-poison, t0-eval-poison, t0-convert-sft)
config.py # ExperimentConfig + build_experiment_config()
data.py # download/resolve npy data files
train.py # training loop
generate_submix.py # proportional mix sampling
poison.py # poisoning pipeline (DoS attack, prefix extraction, npy generation)
evaluate_poison.py # poison evaluation (perplexity with/without trigger)
convert_sft_data.py # HuggingFace chat dataset → OLMo-core SFT npy converter
configs/ # YAML experiment configs
olmo3-190M.yaml # all defaults for OLMo3 190M pretraining
olmo3-190M-sft.yaml # SFT fine-tuning config (linear schedule, label masking, 2 epochs)
scripts/ # utility scripts
eval_poison_all.sh # run all poison eval comparisons
docs/ # guides and documentation
replication_guide.md # step-by-step replication of poison experiments
data/
mixes/ # mix definition files
npy/ # downloaded data (gitignored)