This repository accompanies the paper "Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling" (COLM 2025).
The most performant reference implementation of the AWRS-SMC algorithm can be found in the actively maintained genlm/genlm-control library.
This repository contains scripts that replicate the paper's experiments using the genlm/genlm-control and genlm/genlm-eval libraries.
- Clone this repository:
  git clone https://github.com/genlm/awrs-colm-2025.git
  cd awrs-colm-2025
- Create and activate a uv environment:
  uv venv --python=3.11
  source .venv/bin/activate
  To install uv, run:
  curl -LsSf https://github.com/astral-sh/uv/releases/latest/download/uv-installer.sh | sh
- Install the project dependencies:
  uv pip install -e .
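The setup and download steps rely on a handful of command-line tools. As a minimal sketch (the helper name `require_cmds` is hypothetical and not part of this repository), the following function reports which of the named commands are available on the PATH:

```shell
# Hypothetical helper (not part of this repository): report which of the
# given commands are on the PATH. Returns non-zero if any is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_cmds git uv gunzip unzip gdown` covers the tools used in these instructions. Whether gdown is provided by the project dependencies is not stated here, so it may need to be installed separately.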
Download the GDB-17 dataset (GDB17.50000000.smi.gz) and decompress it in the data/molecular_synthesis directory:
mkdir -p data/molecular_synthesis
cd data/molecular_synthesis
# Download GDB17.50000000.smi.gz to this directory, then:
gunzip GDB17.50000000.smi.gz
cd ../..
Download and unzip the Spider dataset in the data/spider directory with the following commands:
cd data/spider
gdown 'https://drive.google.com/u/0/uc?id=1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J&export=download'
unzip spider_data.zip
cd ../..
See the scripts directory for scripts which run the experiments from the paper.
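Before launching any experiment, it can help to confirm that both datasets are where the download steps put them. The sketch below is a hypothetical helper (`check_datasets` is not part of this repository). It assumes gunzip leaves GDB17.50000000.smi in data/molecular_synthesis, and that unzip leaves at least one entry in data/spider besides the archive itself. The exact contents of spider_data.zip are not assumed.

```shell
# Hypothetical helper (not part of this repository): check that the datasets
# from the download steps are in place.
# Usage: check_datasets [repo_root]   (defaults to the current directory)
check_datasets() {
    root="${1:-.}"
    status=0

    # gunzip replaces GDB17.50000000.smi.gz with this uncompressed file.
    smi="$root/data/molecular_synthesis/GDB17.50000000.smi"
    if [ -s "$smi" ]; then
        echo "ok: GDB-17 found"
    else
        echo "missing: $smi"
        status=1
    fi

    # Assumption: after unzip, data/spider holds something other than the zip.
    spider="$root/data/spider"
    if [ -d "$spider" ] && \
       [ -n "$(ls -A "$spider" 2>/dev/null | grep -v '^spider_data\.zip$')" ]; then
        echo "ok: Spider found"
    else
        echo "missing: extracted files in $spider"
        status=1
    fi

    return "$status"
}
```

Running `check_datasets` from the repository root prints one line per dataset and returns a non-zero status if either is missing, so it can gate a script with `check_datasets && ...`.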
awrs-colm-2025/
├── experiments/ # Core experimental code
│ ├── __main__.py # Main CLI entry point for running experiments
│ ├── tasks.py # Task definitions and registry for different domains
│ ├── methods.py # Implementation of sampling methods (AWRS SMC, baselines)
│ └── sampler.py # Implementation of the core AWRS algorithm
├── scripts/ # Experiment execution scripts
├── data/ # Dataset storage
│ ├── spider/ # Spider Text-to-SQL dataset (needs to be downloaded)
│ ├── molecular_synthesis/ # Location of GDB-17 molecular synthesis dataset (needs to be downloaded)
│ └── pattern_matching/ # Pattern matching dataset
└── tests/ # Test suite
@inproceedings{awrs2025colm,
title = {Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling},
author = {Lipkin, Benjamin and LeBrun, Benjamin and Vigly, Jacob Hoover and Loula, Jo{\~a}o and MacIver, David R and Du, Li and Eisner, Jason and Cotterell, Ryan and Mansinghka, Vikash and O'Donnell, Timothy J and others},
booktitle = {Second Conference on Language Modeling},
year = {2025}
}