LLMSQL

A patched and improved version of WikiSQL, the original large crowd-sourced dataset for developing natural language interfaces to relational databases.

Our datasets are available for different scenarios on our HuggingFace page.

Overview

Install

pip3 install llmsql

This repository provides the LLMSQL Benchmark — a modernized, cleaned, and extended version of WikiSQL, designed for evaluating large language models (LLMs) on Text-to-SQL tasks.

Note

The package does not include the dataset; it is stored on our HuggingFace page.

This package contains

Support for modern LLMs.
Tools for inference and evaluation.
Support for Hugging Face models out-of-the-box.
Structured for reproducibility and benchmarking.

Latest News 📣

[2025/12] The evaluation class has been converted to a function; see the new evaluate(...) function.
A new version of the project page has been added at https://llmsql.github.io/llmsql-benchmark/
vLLM inference now supports chat templates; see inference_vllm(...).
Transformers inference now supports custom chat templates through the chat_template argument; see inference_transformers(...).
inference_vllm(...) is now more stable and deterministic, which is achieved by setting some environment variables.
A padding_side argument has been added to inference_transformers(...); it defaults to left.

Usage Recommendations

Modern LLMs are already strong at producing SQL queries without finetuning. We therefore recommend that most users:

Run inference directly on the full benchmark:
- Use llmsql.inference_transformers (the function for Transformers-based inference) to generate SQL predictions with your model. For vLLM-based inference, use llmsql.inference_vllm. The model can be given as a Hugging Face model ID, e.g. Qwen/Qwen2.5-1.5B-Instruct, or as a model instance passed directly, e.g. inference_transformers(model_or_model_name_or_path=model, ...)
- Evaluate the results against the benchmark with the llmsql.evaluate function.
Optional finetuning:
- For research or domain adaptation, we provide a finetuning version of the dataset for HF models. Use the Finetune Ready dataset from HuggingFace.

Tip

You can find additional manuals in the README files of each folder (Inference Readme, Evaluation Readme).

Tip

vLLM-based inference requires the optional vllm dependency group to be installed: pip install llmsql[vllm]

Repository Structure


llmsql/
├── evaluation/          # Scripts for downloading DB + evaluating predictions
└── inference/           # Generate SQL queries with your LLM

Quickstart

Install

Make sure you have the package installed (we used Python 3.11):

pip3 install llmsql

1. Run Inference

Transformers inference

from llmsql import inference_transformers

# Run generation directly with transformers
results = inference_transformers(
    model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="path_to_your_outputs.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    do_sample=False,
    model_kwargs={
        "torch_dtype": "bfloat16",
    }
)
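Before evaluating, it can help to check that the output file was written as expected. The sketch below is generic JSON Lines handling and is not part of llmsql; the helper names load_jsonl and summarize are hypothetical. It makes no assumption about which fields each record contains and only reports the keys it finds.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Point at the offending line so a truncated run is easy to spot.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
    return records


def summarize(records):
    """Return the record count and the sorted set of keys seen across records."""
    keys = sorted({key for record in records for key in record})
    return {"count": len(records), "keys": keys}
```

For example, summarize(load_jsonl("path_to_your_outputs.jsonl")) shows how many predictions were written and which fields they carry.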

Vllm inference (Recommended)

To speed up inference, we recommend using vLLM. It requires the optional llmsql[vllm] dependency group:

pip install llmsql[vllm]

After that run

from llmsql import inference_vllm
results = inference_vllm(
    "Qwen/Qwen2.5-1.5B-Instruct",
    "test_results.jsonl",
    do_sample=False,
    batch_size=20000
)

for fast inference.
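Because vLLM is an optional dependency, a script may want to fall back to Transformers when it is not installed. The following is a minimal sketch of that check using only the standard library; has_module and pick_backend are hypothetical helper names, not part of llmsql.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_backend(preferred="vllm"):
    """Return "vllm" only when it is requested and installed; otherwise "transformers"."""
    if preferred == "vllm" and has_module("vllm"):
        return "vllm"
    return "transformers"
```

A caller could then invoke inference_vllm(...) when pick_backend() returns "vllm" and inference_transformers(...) otherwise.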

2. Evaluate Results

from llmsql import evaluate

report = evaluate(outputs="path_to_your_outputs.jsonl")
print(report)

Or with the results returned by inference:

from llmsql import evaluate

# results = inference_transformers(...) or inference_vllm(...)

report = evaluate(outputs=results)
print(report)
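This README does not describe how evaluate scores a prediction. A common approach for Text-to-SQL benchmarks is execution matching: run the predicted and the reference query against the database and compare the returned rows. The sketch below illustrates that idea with sqlite3 on a toy table. It is an assumption made for illustration, not LLMSQL's implementation, and the names run_query and execution_match as well as the players table are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql and return its rows as a multiset; None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same rows, ignoring row order."""
    gold = run_query(conn, gold_sql)
    if gold is None:
        raise ValueError("gold query failed to execute")
    predicted = run_query(conn, predicted_sql)
    return predicted is not None and predicted == gold


# Toy in-memory table standing in for a benchmark database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bob", "Blue", 7), ("Cy", "Red", 3)],
)

gold = "SELECT name FROM players WHERE team = 'Red'"
# Same rows in a different order still count as a match.
print(execution_match(conn, "SELECT name FROM players WHERE team = 'Red' ORDER BY name DESC", gold))  # True
# Different rows: no match.
print(execution_match(conn, "SELECT name FROM players WHERE points > 5", gold))  # False
# A query that fails to execute counts as a miss rather than raising.
print(execution_match(conn, "SELECT nme FROM players", gold))  # False
```

Comparing rows as a multiset keeps duplicate rows significant while ignoring ordering, which matters when neither query has an ORDER BY clause.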

Suggested Workflow

Primary: Run inference on all questions with vLLM or Transformers → Evaluate with evaluate().
Secondary (optional): Fine-tune on train/val → Test on test_questions.jsonl. The datasets are available as the Finetune Ready dataset on HuggingFace.

Contributing

Check out our open issues, fork this repo and feel free to submit pull requests!

We also encourage you to submit new issues!

To get started with development, first fork the repository and install the base dependencies together with the dev dependencies.

For more information on contributing, see CONTRIBUTING.md and our documentation page.

License & Citation

Please cite LLMSQL if you use it in your work:

@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}
