Adaptive Weighted Rejection Sampling

This repository contains content related to the paper Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling (COLM 2025).

The most performant reference implementation of the AWRS-SMC algorithm can be found in the actively maintained genlm/genlm-control library.

This repository contains scripts, built on the genlm/genlm-control and genlm/genlm-eval libraries, that replicate the paper's experiments.

Setup

Install project

  1. Clone this repository:

    git clone https://github.com/genlm/awrs-colm-2025.git
    cd awrs-colm-2025
  2. Create and activate a uv environment:

    uv venv --python=3.11
    source .venv/bin/activate

    To install uv, run: curl -LsSf https://github.com/astral-sh/uv/releases/latest/download/uv-installer.sh | sh

  3. Install the project dependencies:

    uv pip install -e .

Download required datasets

Molecular synthesis

Download the GDB-17 dataset (GDB17.50000000.smi.gz) from here and decompress it in the data/molecular_synthesis directory:

mkdir -p data/molecular_synthesis
cd data/molecular_synthesis
# Download GDB17.50000000.smi.gz to this directory, then:
gunzip GDB17.50000000.smi.gz
cd ../..
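To confirm that the file ended up where the experiments expect it, a quick check like the one below can help. It is a hypothetical helper, not something shipped in this repository, and it assumes only the path given above.

```shell
# Hypothetical helper (not shipped in this repository): check that the
# decompressed GDB-17 file exists at the path used in this README.
check_gdb17() {
    root="${1:-.}"  # repository root; defaults to the current directory
    smi="$root/data/molecular_synthesis/GDB17.50000000.smi"
    # -s is true only for a file that exists and is non-empty.
    if [ ! -s "$smi" ]; then
        echo "missing or empty: $smi" >&2
        return 1
    fi
    # Arithmetic expansion strips the padding some wc implementations add.
    lines=$(( $(wc -l < "$smi") ))
    echo "ok: $lines lines in $smi"
}
```

After defining it, run check_gdb17 from the repository root, or pass the root path as its only argument.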

Text to SQL

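Download and unzip the Spider dataset in the data/spider directory with the following commands (they use gdown, a pip-installable Google Drive downloader; install it with uv pip install gdown if it is not already available):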

mkdir -p data/spider
cd data/spider
gdown 'https://drive.google.com/u/0/uc?id=1403EGqzIDoHMdQF4c9Bkyl7dZLZ5Wt6J&export=download'
unzip spider_data.zip
cd ../..
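A similar sanity check works for Spider. This is again a hypothetical helper, not part of the repository. Because this README does not describe the layout of the extracted files, it only verifies that something besides the downloaded archive is present in data/spider.

```shell
# Hypothetical helper (not shipped in this repository): check that
# spider_data.zip was extracted into data/spider. The extracted layout is
# not documented in this README, so this only verifies that something
# other than the archive itself is present.
check_spider() {
    dir="${1:-.}/data/spider"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Count top-level entries, excluding the downloaded archive;
    # arithmetic expansion strips any padding from wc's output.
    n=$(( $(find "$dir" -mindepth 1 -maxdepth 1 ! -name spider_data.zip | wc -l) ))
    if [ "$n" -eq 0 ]; then
        echo "nothing extracted in $dir; run unzip spider_data.zip there" >&2
        return 1
    fi
    echo "ok: $n item(s) extracted in $dir"
}
```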

Running experiments

The scripts directory contains the scripts that run the paper's experiments.
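To see which experiment scripts are available, something like the following lists the files in scripts/. It is a hypothetical convenience function, not part of the repository; since this README does not document script names or arguments, read a script before running it.

```shell
# Hypothetical helper: list the files in scripts/ so you can pick an
# experiment to run. Script names and arguments are not documented in this
# README, so read a script before running it.
list_scripts() {
    dir="${1:-.}/scripts"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an empty glob.
        [ -f "$f" ] || continue
        printf '%s\n' "${f#"$dir"/}"
    done
}
```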

Repository structure

awrs-colm-2025/
├── experiments/              # Core experimental code
│   ├── __main__.py           # Main CLI entry point for running experiments
│   ├── tasks.py              # Task definitions and registry for different domains
│   ├── methods.py            # Implementation of sampling methods (AWRS SMC, baselines)
│   └── sampler.py            # Implementation of the core AWRS algorithm
├── scripts/                  # Experiment execution scripts
├── data/                     # Dataset storage
│   ├── spider/               # Spider Text-to-SQL dataset (needs to be downloaded)
│   ├── molecular_synthesis/  # GDB-17 molecular synthesis dataset (needs to be downloaded)
│   └── pattern_matching/     # Pattern matching dataset
└── tests/                    # Test suite

Citation

@inproceedings{awrs2025colm,
    title = {Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling},
    author = {Lipkin, Benjamin and LeBrun, Benjamin and Vigly, Jacob Hoover and Loula, Jo{\~a}o and MacIver, David R and Du, Li and Eisner, Jason and Cotterell, Ryan and Mansinghka, Vikash and O'Donnell, Timothy J and others},
    booktitle = {Second Conference on Language Modeling},
    year = {2025}
}