QualityFlow

Abstract

We introduce QualityFlow, a dynamic agentic workflow for program synthesis. Given the English description of a programming problem and a set of unit tests, the model's goal is to synthesize a correct program that solves the problem and passes the tests. QualityFlow includes large language model (LLM) agents resembling a software development team, covering code generation, testing, and self-debugging. We propose the LLM Quality Checker, which explicitly "imagines" whether the synthesized programs' execution would conform to the unit tests. The Quality Checks dynamically control the workflow, including actions to submit the final answer, clarify the problem statement, and revert previous workflow steps. Our experiments show that the Quality Checker can precisely accept any correct program, mitigate faulty synthesized tests, and prevent potential workflow deviation. QualityFlow establishes state-of-the-art results on four program synthesis benchmarks: MBPP, HumanEval, and the stricter evaluations from MBPP-EvalPlus and HumanEval-EvalPlus.
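To make the control flow described above concrete, here is a minimal, hypothetical sketch of a workflow in which a quality check decides whether to submit, self-debug, or clarify the problem and revert. It is not the implementation in qualityflow/workflow.py: every function name is illustrative, the agents are toy stubs rather than LLM calls, and the stub checker actually executes the tests, whereas the paper's Quality Checker only "imagines" execution.

```python
from typing import Callable, List

# Hypothetical agent interfaces; in QualityFlow each role is played by an LLM agent.
Programmer = Callable[[str], str]             # problem description -> program
Debugger = Callable[[str, List[str]], str]    # program, unit tests -> revised program
Checker = Callable[[str, List[str]], bool]    # program, unit tests -> accept?
Clarifier = Callable[[str], str]              # description -> clarified description


def run_workflow(description: str, tests: List[str], programmer: Programmer,
                 debugger: Debugger, checker: Checker, clarifier: Clarifier,
                 max_rounds: int = 3) -> str:
    """Let quality checks decide whether to submit, self-debug, or clarify and revert."""
    program = programmer(description)
    for _ in range(max_rounds):
        if checker(program, tests):
            return program                    # submit: the checker accepts the program
        debugged = debugger(program, tests)
        if checker(debugged, tests):
            return debugged                   # submit the self-debugged program
        # Revert: drop both attempts, clarify the problem statement, regenerate.
        description = clarifier(description)
        program = programmer(description)
    return program                            # give up and return the latest attempt


# Toy stand-ins so the sketch runs end to end (no LLM involved).
def exec_checker(program: str, tests: List[str]) -> bool:
    # Unlike the paper's checker, which "imagines" execution, this one really runs the tests.
    env: dict = {}
    try:
        exec(program, env)
        for test in tests:
            exec(test, env)
    except Exception:
        return False
    return True


def toy_programmer(description: str) -> str:
    # Produces a buggy program until the description has been "clarified".
    op = "+" if "clarified" in description else "-"
    return f"def add(a, b):\n    return a {op} b"


tests = ["assert add(1, 2) == 3"]
result = run_workflow(
    "Write add(a, b).", tests,
    programmer=toy_programmer,
    debugger=lambda program, tests: program,   # toy debugger: returns the program unchanged
    checker=exec_checker,
    clarifier=lambda description: description + " (clarified)",
)
print(result)
```

In this toy run the first program is rejected, self-debugging does not help, so the sketch reverts, clarifies the description, and submits the regenerated program once the check accepts it.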

Paper

QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks https://arxiv.org/pdf/2501.17167

Setup instructions

Install dependencies

conda create -n agentic python=3.12
conda activate agentic
pip3 install openai boto3 tqdm pandas datasets sqlalchemy anthropic psutil transformers deepdiff seaborn pymysql jsonlines
pip3 install simple_parsing scikit-learn 
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
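After running the pip commands above, a quick way to confirm the packages are importable is a small check like the hypothetical helper below (not part of the repository). It maps each pip package to its import name; the only one that differs here is scikit-learn, which is imported as sklearn.

```python
import importlib.util
from typing import Iterable, List

# Import names for the pip packages listed above (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = [
    "openai", "boto3", "tqdm", "pandas", "datasets", "sqlalchemy", "anthropic",
    "psutil", "transformers", "deepdiff", "seaborn", "pymysql", "jsonlines",
    "simple_parsing", "sklearn", "torch", "torchvision", "torchaudio",
]


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for name in missing_modules():
    print("not installed:", name)
```

An empty output means every listed module was found; it does not verify versions or CUDA support for torch.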

Install mxeval: https://github.com/amazon-science/mxeval
Install CodeGeeX: https://github.com/THUDM/CodeGeeX (the PYTHONPATH setting below expects it under lib/CodeGeeX/)

Set your Anthropic API key as an environment variable

export ANTHROPIC_API_KEY=[YOUR KEY HERE]

Add CodeGeeX to your Python path

export PYTHONPATH=.:lib/CodeGeeX/
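Before launching a run, you may want to confirm both variables are set in the current shell. The helper below is a hypothetical sketch, not part of the repository; it only inspects environment variables and does not check that lib/CodeGeeX/ actually exists on disk.

```python
import os
from typing import List, Mapping


def check_environment(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems with the environment variables set above."""
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    # PYTHONPATH entries are separated by os.pathsep (':' on Linux/macOS).
    entries = [os.path.normpath(p) for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if os.path.normpath("lib/CodeGeeX") not in entries:
        problems.append("lib/CodeGeeX/ is not on PYTHONPATH")
    return problems


for problem in check_environment():
    print("setup problem:", problem)
```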

How to run

For example, running QualityFlow on MBPP:

### without cacher (after the installation above, this command should run directly)
python run_sota_mbpp_without_cacher.py

### with MariaDB cacher
python run_sota_mbpp.py

The cacher is backed by a MariaDB server. Set up your own server and configure its address, user, and password in the class MariaDBCacherFactory. The cacher caches all LLM API calls to save time and money, which allows you to rerun or continue past experiments. You can disable the cache from the command line (--use_cache False), which lets you run this project without setting up a database.
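The caching idea can be illustrated with a minimal sketch that memoizes LLM responses keyed by a hash of the request. This is an assumption about the general technique, not the code of MariaDBCacherFactory: it swaps in Python's built-in sqlite3 for MariaDB so it runs without a server, and the class name, table name, and request fields are hypothetical.

```python
import hashlib
import json
import sqlite3
from typing import Any, Callable, Dict


class LLMCallCache:
    """Memoizes LLM responses keyed by a hash of the request (SQLite stand-in for MariaDB)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def make_key(request: Dict[str, Any]) -> str:
        # sort_keys makes logically identical requests serialize, and hash, identically.
        blob = json.dumps(request, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def call(self, request: Dict[str, Any], llm_fn: Callable[[Dict[str, Any]], str]) -> str:
        key = self.make_key(request)
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]                     # cache hit: no API call, no cost
        response = llm_fn(request)            # cache miss: call the model once
        self.conn.execute(
            "INSERT INTO llm_cache (key, response) VALUES (?, ?)", (key, response)
        )
        self.conn.commit()
        return response


calls = []


def fake_llm(request: Dict[str, Any]) -> str:
    # Stand-in for a real API call; records each invocation.
    calls.append(request)
    return "response to " + request["prompt"]


cache = LLMCallCache()
first = cache.call({"model": "opus", "prompt": "hi", "temperature": 0.0}, fake_llm)
second = cache.call({"temperature": 0.0, "prompt": "hi", "model": "opus"}, fake_llm)
```

Because a rerun replays the stored responses instead of sampling new ones, a persistent cache of this kind is what makes interrupted experiments resumable and repeated runs essentially free.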

For example, to run custom experiments from the command line:

### disable cacher
python run_args.py --global_model opus --dataset humaneval --use_cache False

### use MariaDB cacher
python run_args.py --global_model opus --dataset humaneval

The scripts for reproducing the paper's experiments are in

run_paper_experiments.py

A requirements.txt file is provided for Ubuntu 20.04 with an NVIDIA A10G GPU, but we recommend installing the libraries manually, following the instructions above, rather than relying on requirements.txt.

Install MariaDB cacher

sudo apt update
sudo apt install mariadb-server
sudo systemctl start mariadb
sudo systemctl enable mariadb
sudo mysql_secure_installation
sudo mysql -u root -p

Create DB

sudo mysql -u root -p
CREATE DATABASE your_db_name;
CREATE USER 'your_username'@'localhost' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON your_db_name.* TO 'your_username'@'localhost';
FLUSH PRIVILEGES;
EXIT;

Testing

mysql -u your_username -p your_db_name
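If you want to check the new credentials from Python, one option is a SQLAlchemy URL using the PyMySQL driver (both packages are in the dependency list above). This is a hypothetical sketch: the repository's MariaDBCacherFactory may connect differently, and ping() needs a running server, so only the URL builder runs without one. 3306 is MariaDB's default port.

```python
from urllib.parse import quote_plus


def mariadb_url(user: str, password: str, database: str,
                host: str = "localhost", port: int = 3306) -> str:
    """Build a SQLAlchemy URL that uses the PyMySQL driver."""
    # quote_plus escapes characters such as '@' or '/' that would otherwise break the URL.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


def ping(url: str) -> bool:
    """Open a connection and run SELECT 1; requires a running MariaDB server."""
    from sqlalchemy import create_engine, text  # imported lazily so building URLs needs no driver
    engine = create_engine(url)
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1


url = mariadb_url("your_username", "your_password", "your_db_name")
print(url)
```

With the server from the steps above running, ping(url) should return True.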

Project structure

qualityflow contains the source code

workflow.py contains the QualityFlow workflow

step1_new_programmer.py is the QualityFlow programmer

step2_new_test_designer.py is the QualityFlow test designer

step3_self_debug.py is the self-debugger

com1_code_quality_checker.py is the code quality checker

com2_reinterpretation.py is the clarifier and re-interpreter

com3_test_quality_checker.py is the optional test quality checker

Release notes

Oct 14, 2024

Initial preparation for public release

Feb 14, 2025

ACL submission code base

July 2, 2025

AAAI submission code base

Citation

@article{hu2025qualityflow,
  title={Qualityflow: An agentic workflow for program synthesis controlled by llm quality checks},
  author={Hu, Yaojie and Zhou, Qiang and Chen, Qihong and Li, Xiaopeng and Liu, Linbo and Zhang, Dejiao and Kachroo, Amit and Oz, Talha and Tripp, Omer},
  journal={arXiv preprint arXiv:2501.17167},
  year={2025}
}
