A patched and improved version of [WikiSQL](https://github.com/salesforce/WikiSQL), the original large crowd-sourced dataset for developing natural language interfaces for relational databases.

Our datasets are available for different scenarios on our [HuggingFace page](https://huggingface.co/llmsql-bench).

---

## Overview

### Install

```bash
pip3 install llmsql
```

This repository provides the **LLMSQL Benchmark** — a modernized, cleaned, and extended version of WikiSQL, designed for evaluating and fine-tuning large language models (LLMs) on **Text-to-SQL** tasks.

### Note

The package does not ship with the dataset; it is stored on our [HuggingFace page](https://huggingface.co/llmsql-bench).
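
If you want the benchmark files on disk (the Quickstart inference example below reads `data/questions.jsonl` and `data/tables.jsonl`), one option is to pull them from the Hub. This is a minimal sketch, not the official download path: the dataset repository id below is a placeholder, so check the [HuggingFace page](https://huggingface.co/llmsql-bench) for the actual name.

```python
from huggingface_hub import snapshot_download

# "llmsql-bench/<dataset-name>" is a placeholder -- look up the actual
# dataset repository on https://huggingface.co/llmsql-bench.
snapshot_download(
    repo_id="llmsql-bench/<dataset-name>",
    repo_type="dataset",
    local_dir="data",  # download the dataset files into ./data
)
```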

### This package contains

- Support for modern LLMs.
- Tools for **evaluation**, **inference**, and **finetuning**.
- Support for Hugging Face models out-of-the-box.
- Structured for reproducibility and benchmarking.
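
A quick way to check that the install exposes the two entry points used throughout this README (both appear in the Quickstart below):

```python
# Both classes are importable from the top-level package, as used in the Quickstart.
from llmsql import LLMSQLEvaluator, LLMSQLVLLMInference

print(LLMSQLVLLMInference.__name__, LLMSQLEvaluator.__name__)
```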

---

## Usage Recommendations

Modern LLMs are already strong at **producing SQL queries without finetuning**.
We therefore recommend that most users:

1. **Run inference** directly on the full benchmark:
   - Use [`llmsql.LLMSQLVLLMInference`](./llmsql/inference/inference.py) (the main inference class) to generate SQL predictions with your Hugging Face LLM.
   - Evaluate the results against the benchmark with the [`llmsql.LLMSQLEvaluator`](./llmsql/evaluation/evaluator.py) class.

2. **Optional finetuning**:
   - For research or domain adaptation, we provide a finetuning script for Hugging Face models. Run `llmsql finetune --help` or read the [Finetune Readme](./llmsql/finetune/README.md) to learn more.

> [!TIP]
> You can find additional manuals in the README files of each folder ([Inference Readme](./llmsql/inference/README.md), [Evaluation Readme](./llmsql/evaluation/README.md), [Finetune Readme](./llmsql/finetune/README.md)).

---

## Repository Structure

```
llmsql/
├── evaluation/   # Scripts for downloading DB + evaluating predictions
├── inference/    # Generate SQL queries with your LLM
└── finetune/     # Fine-tuning with TRL's SFTTrainer
```

## Quickstart

### Install

Make sure you have the package installed (we used Python 3.11):

```bash
pip3 install llmsql
```

### 1. Run Inference

```python
from llmsql import LLMSQLVLLMInference

# Initialize inference engine
inference = LLMSQLVLLMInference(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",  # or any Hugging Face causal LM
    tensor_parallel_size=1,
)

# Run generation
results = inference.generate(
    output_file="path_to_your_outputs.jsonl",
    questions_path="data/questions.jsonl",
    tables_path="data/tables.jsonl",
    shots=5,
    batch_size=8,
    max_new_tokens=256,
    temperature=0.7,
)
```
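
Before moving on to evaluation, you can sanity-check what the inference step wrote. The snippet below simply inspects the first record of the output file instead of assuming a particular schema:

```python
import json

# Peek at the first prediction record to see which fields were written.
with open("path_to_your_outputs.jsonl") as f:
    first_record = json.loads(f.readline())

print(sorted(first_record.keys()))
```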

### 2. Evaluate Results

```python
from llmsql import LLMSQLEvaluator

# Working directory where the evaluator keeps benchmark artifacts (e.g., the SQLite database)
evaluator = LLMSQLEvaluator(workdir_path="llmsql_workdir")

# Score the predictions produced by the inference step
report = evaluator.evaluate(outputs_path="path_to_your_outputs.jsonl")
print(report)
```

## Finetuning (Optional)

If you want to adapt a base model on LLMSQL:

```bash
llmsql finetune --config_file examples/example_finetune_args.yaml
```

This will train a model on the train/val splits with the parameters provided in the config file. You can find an example config file [here](./examples/example_finetune_args.yaml).
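
Since finetuning is built on TRL's `SFTTrainer`, the result should be a standard Hugging Face checkpoint that can be fed back into the inference step. A minimal sketch, assuming your config's output directory was `outputs/finetuned-model` (a hypothetical path) and that `LLMSQLVLLMInference` accepts a local model directory the same way it accepts a Hub id:

```python
from llmsql import LLMSQLVLLMInference

# "outputs/finetuned-model" is a placeholder -- use the output directory
# you set in your finetune config.
inference = LLMSQLVLLMInference(
    model_name="outputs/finetuned-model",
    tensor_parallel_size=1,
)
```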

## Suggested Workflow

* **Primary**: Run inference on the benchmark questions (`questions.jsonl`) with `LLMSQLVLLMInference` → evaluate the outputs with `LLMSQLEvaluator`.
* **Secondary (optional)**: Fine-tune on the train/val splits → test on the test split (`test_questions.jsonl`).

## License & Citation

Please cite LLMSQL if you use it in your work:

```text
@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}
```