This is the repository for the paper "A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models" by Xiulin Yang, Arianna Bisazza, Nathan Schneider, and Ethan Gotlieb Wilcox.
To set up the environment, run:
conda create -n posh-bench python=3.11
conda activate posh-bench
pip install -r requirements.txt
pip install -e . --no-dependencies

To run the experiments, use the following commands:
# train models
bash train_model.sh $dataset_size $vocab_size $model_type $baby_or_wiki # you can find the options available in ```generate_config.py```
# evaluate models
python benchmark_eval.py model_name --eval_dataset posh --best_checkpoint

- Training data: stored on OSF.
- Evaluation data: each benchmark is in its own folder in this repository (e.g., posh is in posh-bench).
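
The training command above takes four positional arguments, so an unset or empty shell variable silently shifts the remaining ones. The following is a minimal dry-run sketch, not part of the repository: `train_cmd` is a hypothetical helper, and the four values assigned below are placeholders chosen for illustration only (the valid options are the ones listed in `generate_config.py`). It checks the arguments and prints the command instead of running it.

```shell
# Hypothetical placeholder values; check generate_config.py for the real options.
dataset_size="10M"
vocab_size="8000"
model_type="gpt2"
baby_or_wiki="baby"

# Hypothetical helper: verify that exactly four non-empty arguments are given,
# then print the training command (dry run) rather than executing it.
train_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: train_cmd DATASET_SIZE VOCAB_SIZE MODEL_TYPE BABY_OR_WIKI" >&2
        return 1
    fi
    for arg in "$@"; do
        if [ -z "$arg" ]; then
            echo "error: empty argument" >&2
            return 1
        fi
    done
    echo "bash train_model.sh $1 $2 $3 $4"
}

# Quote each variable so an empty value is caught instead of being dropped.
train_cmd "$dataset_size" "$vocab_size" "$model_type" "$baby_or_wiki"
```

Once the printed command looks right, it can be run directly or passed to `eval`.
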

@misc{yang2026unifiedassessmentpovertystimulus,
title={A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models},
author={Xiulin Yang and Arianna Bisazza and Nathan Schneider and Ethan Gotlieb Wilcox},
year={2026},
eprint={2602.09992},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.09992},
}