Adaptive Weighted Rejection Sampling

This repository accompanies the paper Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling (COLM 2025).

The most performant reference implementation of the AWRS-SMC algorithm is in the actively maintained genlm/genlm-control library.

This repository contains scripts, built on the genlm/genlm-control and genlm/genlm-eval libraries, for replicating the paper's experiments.

Setup

Install project

Clone this repository:

```
git clone https://github.com/genlm/awrs-colm-2025.git
cd awrs-colm-2025
```

If uv is not installed yet, install it first (the installer script must be piped to sh to run):
```
curl -LsSf https://github.com/astral-sh/uv/releases/latest/download/uv-installer.sh | sh
```

Then create and activate a uv environment:
```
uv venv --python=3.11
source .venv/bin/activate
```

Install the project dependencies:
```
uv pip install -e .
```

Download required datasets

Molecular synthesis

Download the GDB-17 dataset (GDB17.50000000.smi.gz) from here and decompress it in the data/molecular_synthesis directory:

```
mkdir -p data/molecular_synthesis
cd data/molecular_synthesis
# Download GDB17.50000000.smi.gz to this directory, then:
gunzip GDB17.50000000.smi.gz
cd ../..
```
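To confirm that the gunzip step produced a usable file, a quick sanity check can help before launching the molecular synthesis experiments. The shell function below is a hypothetical helper, not part of this repository; it assumes the file holds one SMILES string per line and only reports the line count and the first few entries.

```shell
# Hypothetical helper (not shipped with this repo): sanity-check a SMILES file.
# Assumes one SMILES string per line, as in GDB17.50000000.smi.
check_smiles_file() {
    local f="${1:-data/molecular_synthesis/GDB17.50000000.smi}"
    # -s: the file exists and is non-empty
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        return 1
    fi
    # Print the line count (tr strips the padding some wc versions add),
    # then preview the first three entries
    echo "$(wc -l < "$f" | tr -d ' ') lines in $f"
    head -n 3 "$f"
}
```

Run it from the repository root as check_smiles_file, or pass another path as the first argument. It returns a non-zero status if the file is missing or empty, so it can also gate a script.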

Text to SQL

Download and unzip the Spider dataset into the data/spider directory with the following commands (the gdown tool must be installed, e.g. with uv pip install gdown):

```
mkdir -p data/spider
cd data/spider
gdown 'https://drive.google.com/u/0/uc?id=1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J&export=download'
unzip spider_data.zip
cd ../..
```
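To check that the archive unpacked where expected, a function like the hypothetical one below can be used. It assumes the layout of the public Spider release, in which spider_data.zip unpacks into a spider_data/ folder containing dev.json, tables.json and a database/ directory with one subfolder per database; the paths the experiment code actually reads may differ, so compare against experiments/tasks.py.

```shell
# Hypothetical helper (not shipped with this repo): check the unpacked Spider data.
# Assumes the public spider_data/ layout; adjust the file list if yours differs.
check_spider_dir() {
    local root="${1:-data/spider/spider_data}"
    local missing=0
    local p
    for p in dev.json tables.json database; do
        if [ ! -e "$root/$p" ]; then
            echo "missing: $root/$p" >&2
            missing=1
        fi
    done
    if [ "$missing" -ne 0 ]; then
        return 1
    fi
    # Count the per-database subdirectories directly under database/
    n=$(find "$root/database" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
    echo "ok: $n database folders under $root/database"
}
```

Run it from the repository root as check_spider_dir; a missing file is reported on stderr and the function returns a non-zero status.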

Running experiments

The scripts directory contains the scripts that run the experiments from the paper.

Repository structure

```
awrs-colm-2025/
├── experiments/              # Core experimental code
│   ├── __main__.py           # Main CLI entry point for running experiments
│   ├── tasks.py              # Task definitions and registry for different domains
│   ├── methods.py            # Implementation of sampling methods (AWRS SMC, baselines)
│   └── sampler.py            # Implementation of the core AWRS algorithm
├── scripts/                  # Experiment execution scripts
├── data/                     # Dataset storage
│   ├── spider/               # Spider Text-to-SQL dataset (needs to be downloaded)
│   ├── molecular_synthesis/  # Location of GDB-17 molecular synthesis dataset (needs to be downloaded)
│   └── pattern_matching/     # Pattern matching dataset
└── tests/                    # Test suite
```

Citation

```
@inproceedings{awrs2025colm,
    title = {Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling},
    author = {Lipkin, Benjamin and LeBrun, Benjamin and Vigly, Jacob Hoover and Loula, Jo{\~a}o and MacIver, David R and Du, Li and Eisner, Jason and Cotterell, Ryan and Mansinghka, Vikash and O'Donnell, Timothy J and others},
    booktitle = {Second Conference on Language Modeling},
    year = {2025}
}
```
