Commit d406516 (2 parents: 3fc791f + 2a1b2f3)

Merge pull request #3 from BugMaker-Boyan/refactor: Refactor

222 files changed: +3360 lines, -143553 lines


.gitignore

Lines changed: 6 additions & 3 deletions
@@ -1,4 +1,7 @@
-**/data/dataset/**
-**/exp
+**/data
 **/.vscode
-**/__pycache__
+**/__pycache__
+**/dist
+**/build
+**/*.egg-info
+**/tests

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+include assets/*
+include examples/*

README.md

Lines changed: 74 additions & 77 deletions
@@ -2,142 +2,139 @@
 
 <div align="center"><img width="25%" src="./assets/nl2sql360.png"><img width="75%" src="./assets/leaderboard.png"></div>
 
-
-
 ## :dizzy:Overview
 
 **NL2SQL360** is a testbed for fine-grained evaluation of NL2SQL solutions. Our testbed integrates existing NL2SQL benchmarks, a repository of NL2SQL models, and various evaluation metrics, which aims to provide an intuitive and user-friendly platform to enable both standard and customized performance evaluations. Users can utilize **NL2SQL360** to assess different NL2SQL methods against established benchmarks or tailor their evaluations based on specific criteria. This flexibility allows for testing solutions in specific data domains or analyzing performance on different characteristics of SQL queries.
 
 In addition, we propose **SuperSQL**, which achieves competitive performance with execution accuracy of **87%** and **62.66%** on the Spider and BIRD test sets, respectively.
 
 ## :tada:News
-[24/6/30] Our paper [The Dawn of Natural Language to SQL: Are We Fully Ready?](https://arxiv.org/abs/2406.01265) has been accepted by VLDB'24.
-
-## :rocket:Quick start
 
-We publish our online Web Demo based on Streamlit. **The more powerful online Web-System will be published soon.**
+[24/8/2] We have released CLI usage / Code usage tutorials. **Please [check them out](#:rocket:quick-start)!**
 
-Web demo: [Streamlit (hypersql.streamlit.app)](https://hypersql.streamlit.app/)
+[24/7/30] We have refactored the code and released the official Python package ([nl2sql360 · PyPI](https://pypi.org/project/nl2sql360)). **Stay tuned for the complete documentation!**
 
-## :zap:Environment Setup
+[24/6/30] Our paper [The Dawn of Natural Language to SQL: Are We Fully Ready?](https://arxiv.org/abs/2406.01265) has been accepted by VLDB'24.
 
-Create a virtual anaconda environment:
+## :balloon:Features
 
-```
-conda create -n nlsql360 python=3.9
-```
+- **Easy-to-use Evaluation**: Command Line Usage / Python Code Usage.
+- **Integrated Metrics**: Execution Accuracy / Exact-Match Accuracy / Valid Efficiency Score / Question Variance Testing.
+- **Multi-angle Performance**: Fine-grained performance (JOIN, Sub-query, etc.) / Scenario-based (Business Intelligence, etc.).
 
-Active it and install the requirements:
+## :wrench:Installation
 
+```bash
+pip install nl2sql360
 ```
-pip install -r requirements.txt
-python -c "import nltk;nltk.download('punkt')"
-```
-
-## :floppy_disk:Data Preparation
 
-You need to download specific dataset and unzip to the folder `./data/dataset/{DATASET}`. For example, you can download and unzip the [Spider](https://yale-lily.github.io/spider) to the folder `./data/dataset/spider`.
+## :rocket:Quick Start
 
-## :bulb:Evaluation With Only 3 Steps
+<details><summary>Prepare Dataset</summary>
 
-##### 1. Create Dataset (e.g. Spider):
+Download the NL2SQL dataset to `DATASET_DIR_PATH`. The directory structure should look like:
+```bash
+DATASET_DIR_PATH:
+├─database
+│ ├─academic
+│ │ ├─academic.sqlite
+│ ├─college
+│ │ ├─college.sqlite
+├─dev.json
+├─tables.json
+```
 
-Note that, the evaluation results will be in the local SQLite Database (**"data/storage/nl2sql360.sqlite"**).
+- The `database` directory contains multiple subdirectories, each of which includes the corresponding `sqlite` database file.
+- `dev.json` is the samples file in JSON format, which contains at least three keys for the `NL Question`, `Gold SQL`, and `Database Id`. You can also add a key for `Sample Complexity` to categorize samples into different difficulty levels.
+- `tables.json` contains all database schemas, following the [Spider Preprocess Procedure](https://github.com/taoyds/spider/tree/master/preprocess). **You can ignore this file if you do not want to evaluate the Exact-Match Accuracy metric.**
+- Note that the names of the `database` directory, the samples file `dev.json`, and the tables file `tables.json` can be changed.
 
-```python
-from engine.engine import Engine
-from dataset.dataset_builder import SpiderDataset
-import os
+</details>
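The added dataset description only names the required fields of a `dev.json` entry (NL question, gold SQL, database id). A minimal sketch of one such entry, with hypothetical key names — the actual names are whatever your dataset uses, not something nl2sql360 prescribes:

```python
import json

# Hypothetical dev.json entry: the key names below ("question", "query",
# "db_id", "complexity") are illustrative only.
sample = {
    "question": "How many colleges are in the dataset?",  # NL Question
    "query": "SELECT COUNT(*) FROM college",              # Gold SQL
    "db_id": "college",                                   # Database Id
    "complexity": "easy",                                 # optional Sample Complexity
}

# dev.json is a JSON list of such samples.
dev_json = json.dumps([sample], indent=2)
print(dev_json)
```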
 
-db_url = "sqlite:///data/storage/nl2sql360.sqlite"
-engine = Engine(db_url)
+<details><summary>Import Dataset into NL2SQL360</summary>
 
-spider_dataset = SpiderDataset("data/dataset")
-engine.create_dataset_table(spider_dataset, "dev")
-raw_data = spider_dataset.get_raw_data("dev")
-engine.insert_dataset_table(spider_dataset, "dev", raw_data)
-```
+- CLI Usage:
 
-##### 2. Automatic Evaluation with Specific Model Predicted SQLs File (e.g. `data/predict/spider_dev/DAILSQL_SC.sql`):
+  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/dataset_spider.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/dataset_spider.yaml).
 
-```python
-with open(os.path.join("data/predict/spider_dev/DAILSQL_SC.sql"), "r") as f:
-    preds = [line.strip().split("\t")[0] for line in f.readlines()]
+  - Save the YAML file to the path `DATASET_YAML_PATH`. Then run the command line:
 
-eval_name = "DAILSQL_SC"
-engine.insert_evaluation_table(spider_dataset, "dev", eval_name, preds)
-```
+    ```bash
+    nl2sql360 dataset DATASET_YAML_PATH
+    ```
 
-##### 3. Multi-angle and Fine-grained Evaluation with Specific Scenarios:
+- Code Usage:
 
-You can use different tools (or command lines) to access the local SQLite Database (**"data/storage/nl2sql360.sqlite"**). For example, use SQLiteStudio Software to visualize and interact with the database:
+  - Create / Modify the Python file following [NL2SQL360/examples/py_examples/dataset_import.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/dataset_import.py).
+  - Run the Python file to import the dataset.
 
-![SQLiteStudio](./assets/SQLiteStudio.png)
+</details>
 
-There are two categories of tables:
+<details><summary>Evaluate NL2SQL Model</summary>
 
-1. Dataset Table, e.g. `DATASET_spider_dev`, which contains all samples and analyzed characteristics (e.g. count_join).
-2. Evaluation Table, e.g. `DATASET_spider_dev_EVALUATION_DAILSQL_SC`, which contains specific model evaluation results (e.g. exec_acc).
+- CLI Usage:
 
-**Use SQL query to get scenario-specific evaluation results, there are some examples below:**
+  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/evaluation.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/evaluation.yaml).
 
-```sql
--- Get the overall EX performance of DAILSQL(SC) method in Spider-Dev dataset:
-SELECT round(avg(exec_acc), 2) AS EX FROM DATASET_spider_dev_EVALUATION_DAILSQL_SC AS e JOIN DATASET_spider_dev AS D ON e.id = d.id;
+  - Save the YAML file to the path `DATASET_YAML_PATH`. Then run the command line:
 
--- Get the EX/EM/VES performance of DAILSQL(SC) method in Spider-Dev dataset with different hardness:
-SELECT hardness, round(avg(exec_acc), 2) AS EX, round(avg(exact_acc) * 100.0) AS EM, round(avg(ves), 2) AS VES FROM DATASET_spider_dev_EVALUATION_DAILSQL_SC AS e JOIN DATASET_spider_dev AS D ON e.id = d.id GROUP BY d.hardness;
+    ```bash
+    nl2sql360 evaluate DATASET_YAML_PATH
+    ```
 
--- Get the EX performance of DAILSQL(SC) method in Spider-Dev dataset with JOIN keywords:
-SELECT round(avg(exec_acc), 2) AS EX FROM DATASET_spider_dev_EVALUATION_DAILSQL_SC AS e JOIN DATASET_spider_dev AS D ON e.id = d.id WHERE d.count_join > 0;
+- Code Usage:
 
--- Calculate the QVT performance of DAILSQL(SC) method in Spider-Dev dataset:
-SELECT AVG(exec_acc) as exec_acc FROM (
-SELECT AVG(exec_acc) as exec_acc FROM DATASET_spider_dev d JOIN DATASET_spider_dev_EVALUATION_DAILSQL_SC e ON d.id = e.id GROUP BY gold HAVING COUNT(d.gold) >= 2 and sum(e.exec_acc) != 0 ORDER BY d.gold
-);
-```
+  - Create / Modify the Python file following [NL2SQL360/examples/py_examples/evaluation.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/evaluation.py).
+  - Run the Python file to evaluate the model.
 
-## :microscope:Experiments
+</details>
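For intuition about the Execution Accuracy (EX) metric that the evaluator reports: a prediction counts as correct when it returns the same result set as the gold SQL on the target SQLite database. A minimal sketch under stated assumptions — this is not nl2sql360's actual implementation, and the toy schema is invented:

```python
import sqlite3

def execution_match(conn: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """True when the predicted SQL returns the same multiset of rows as the gold SQL."""
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to execute counts as wrong
    # Compare as sorted multisets so row order does not matter.
    return sorted(map(repr, gold_rows)) == sorted(map(repr, pred_rows))

# Toy in-memory database standing in for one of the per-database .sqlite files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, college TEXT);
    INSERT INTO student VALUES (1, 'engineering'), (2, 'arts'), (3, 'engineering');
""")

print(execution_match(
    conn,
    "SELECT COUNT(*) FROM student WHERE college = 'engineering'",
    "SELECT COUNT(id) FROM student WHERE college = 'engineering'",
))  # → True: different surface form, same result set
```

Averaging this boolean over all samples gives the EX score; the related Valid Efficiency Score additionally weights correct predictions by relative execution time.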
 
-### Execution Accuracy vs. SQL Characteristics
+<details><summary>Query Multi-angle Performance</summary>
 
-Our **NL2SQL360** supports sql query filtering based on individual sql clauses, their combinations, or user-defined conditions. We demonstrate only four representative aspects based on Spider-dev dataset. We run all methods on these four subsets of sql queries and compute the Execution Accuracy (EX) metric.
+- CLI Usage:
 
-<div align="center"><img width="50%" src="./assets/Spider_Heatmap.png"><img width="50%" src="./assets/BIRD_Heatmap.png"></div>
+  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/report.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/report.yaml).
 
-![sql_charac](./assets/Boxplot.png)
+  - Save the YAML file to the path `DATASET_YAML_PATH`. Then run the command line:
 
-### Query Variance Testing
+    ```bash
+    nl2sql360 report DATASET_YAML_PATH
+    ```
 
-This set of experiments aims to evaluate the NL2SQL system’s adaptability to various natural language phrasings and structures, reflecting the diversity anticipated in practical applications. To this end, we evaluate different LLM-based and PLM-based methods on the Spider dataset. We use our proposed **Query Variance Testing (QVT)** metric for this evaluation. **There is no clear winner between LLM-based methods and PLM-based methods in QVT. Fine-tuning the model with task-specific datasets may help stabilize its performance against NL variations.**
+  - The generated report will be in the `save_path` specified in the YAML file.
 
-<div align="center"><img width="40%" src="./assets/QVT_New.png"></div>
+- Code Usage:
+  - Create / Modify the Python file following [NL2SQL360/examples/py_examples/report.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/report.py).
+  - Run the Python file to generate the report.
 
-### Database Domain Adaption
+</details>
 
-In practical NL2SQL applications, scenarios typically involve domain-specific databases, like movies or sports, each with unique schema designs and terminologies. Assessing the detailed performance of methods across these domains is crucial for effective model application. In this set of experiments, we classified the 140 databases in the Spider train set and the 20 databases in the development set into 33 domains, including social and geography, among others. We measured the performance of methods across different domain subsets in the Spider development set using the Execution Accuracy (EX) metric. **Different methods exhibit varying biases towards different domains, and there is no clear winner between LLM-based and PLM-based methods. However, in-domain training data during fine-tuning process is crucial for model performance in specific domains.**
+<details><summary>Delete History Cache</summary>
 
-<div align="center"><img width="50%" src="./assets/DB_Domain_Heatmap.png"></div>
+- CLI Usage:
 
-<div align="center"><img width="60%" src="./assets/DB_Domain_Boxplot.png"></div>
+  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/delete_history.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/delete_history.yaml).
 
-### More experiments
+  - Save the YAML file to the path `DATASET_YAML_PATH`. Then run the command line:
 
-Please refer to our paper [The Dawn of Natural Language to SQL: Are We Fully Ready?](https://arxiv.org/abs/2406.01265).
+    ```bash
+    nl2sql360 delete DATASET_YAML_PATH
+    ```
 
-## :memo:Released Experiments Data
+- Code Usage:
 
-All predicted SQLs file: [NL2SQL360/data/predict](https://github.com/BugMaker-Boyan/NL2SQL360/tree/master/data/predict)
+  - Create / Modify the Python file following [NL2SQL360/examples/py_examples/delete_history.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/delete_history.py).
+  - Run the Python file to delete the dataset / evaluation cache.
 
-SQLite Database with all evaluation results : [NL2SQL360/data/storage/nl2sql360.sqlite](https://github.com/BugMaker-Boyan/NL2SQL360/blob/master/data/storage/nl2sql360.sqlite)
+</details>
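The Question Variance Testing metric from the feature list scores a method over several NL paraphrases of the same gold SQL: variants are grouped by gold query, groups with at least two variants and at least one correct execution are kept, and the per-group accuracies are averaged (the pre-refactor README expressed the same grouping as a SQL query). A plain-Python sketch, where the `(gold_sql, exec_acc)` input shape is an assumption for illustration:

```python
from collections import defaultdict

def qvt(samples):
    """samples: list of (gold_sql, exec_acc) pairs, one per NL question variant."""
    groups = defaultdict(list)
    for gold_sql, exec_acc in samples:
        groups[gold_sql].append(exec_acc)
    # Keep groups with >= 2 paraphrases and at least one correct execution,
    # then average the per-group accuracies.
    per_group = [
        sum(accs) / len(accs)
        for accs in groups.values()
        if len(accs) >= 2 and sum(accs) > 0
    ]
    return sum(per_group) / len(per_group) if per_group else 0.0

# Two paraphrase groups: one fully solved, one half solved.
samples = [
    ("SELECT count(*) FROM singer", 1),
    ("SELECT count(*) FROM singer", 1),
    ("SELECT name FROM singer ORDER BY age", 1),
    ("SELECT name FROM singer ORDER BY age", 0),
]
print(qvt(samples))  # → 0.75
```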
 
 ## :dart:Road Map
 
 :white_check_mark:Release **NL2SQL360** evaluation code.
 
 :white_check_mark:Release **NL2SQL360** experiments data.
 
-:clock10:Refactor **NL2SQL360** Web-System.
+:white_check_mark:Release **NL2SQL360** Official Python Package.
 
 ## :pushpin:Citation

bird_example.py

Lines changed: 0 additions & 20 deletions
This file was deleted.
