<div align="center"><img width="25%" src="./assets/nl2sql360.png"><img width="75%" src="./assets/leaderboard.png"></div>

## :dizzy:Overview

**NL2SQL360** is a testbed for fine-grained evaluation of NL2SQL solutions. It integrates existing NL2SQL benchmarks, a repository of NL2SQL models, and various evaluation metrics, aiming to provide an intuitive and user-friendly platform for both standard and customized performance evaluations. Users can assess different NL2SQL methods against established benchmarks or tailor their evaluations to specific criteria, for example testing solutions on particular data domains or analyzing performance across different characteristics of SQL queries.

In addition, we propose **SuperSQL**, which achieves competitive performance with execution accuracy of **87%** and **62.66%** on the Spider and BIRD test sets, respectively.

## :tada:News

[24/8/2] We have released CLI and code usage tutorials. **Please [check them out](#rocketquick-start)!**

[24/7/30] We have refactored the code and released the official Python package ([nl2sql360 · PyPI](https://pypi.org/project/nl2sql360)). **Stay tuned for the complete documentation!**

[24/6/30] Our paper [The Dawn of Natural Language to SQL: Are We Fully Ready?](https://arxiv.org/abs/2406.01265) has been accepted by VLDB'24.

## :balloon:Features

- **Easy-to-use Evaluation**: command line usage / Python code usage.
- **Integrated Metrics**: Execution Accuracy / Exact-Match Accuracy / Valid Efficiency Score / Question Variance Testing.
- **Multi-angle Performance**: fine-grained performance (JOIN, sub-query, etc.) / scenario-based performance (Business Intelligence, etc.).
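To make the first metric above concrete, here is a minimal sketch of the Execution Accuracy (EX) idea: a predicted SQL counts as correct when it returns the same result set as the gold SQL on the target database. This is an illustrative simplification in plain `sqlite3`, not NL2SQL360's actual implementation, and the helper name `execution_accuracy` is ours.

```python
import sqlite3

def execution_accuracy(db_path, pairs):
    """Return the fraction of (gold_sql, pred_sql) pairs whose results match."""
    conn = sqlite3.connect(db_path)
    correct = 0
    for gold, pred in pairs:
        try:
            # Compare unordered result multisets; a prediction that
            # fails to execute simply counts as incorrect.
            gold_rows = sorted(conn.execute(gold).fetchall())
            pred_rows = sorted(conn.execute(pred).fetchall())
            correct += gold_rows == pred_rows
        except sqlite3.Error:
            pass
    conn.close()
    return correct / len(pairs)
```

Real implementations add details such as per-query timeouts and handling of duplicate rows and column order, which this sketch glosses over.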
## :wrench:Installation

```bash
pip install nl2sql360
```

## :rocket:Quick Start

<details><summary>Prepare Dataset</summary>

Download an NL2SQL dataset to `DATASET_DIR_PATH`. The directory structure should look like this:

```bash
DATASET_DIR_PATH
├─database
│ ├─academic
│ │ ├─academic.sqlite
│ ├─college
│ │ ├─college.sqlite
├─dev.json
├─tables.json
```

- The `database` directory contains multiple subdirectories, each including the corresponding `sqlite` database file.
- `dev.json` is the samples file in JSON format, which contains at least three keys, for `NL Question`, `Gold SQL`, and `Database Id`. You can also add a key for `Sample Complexity` to categorize samples into different difficulty levels.
- `tables.json` contains all database schemas, following the [Spider Preprocess Procedure](https://github.com/taoyds/spider/tree/master/preprocess). **You can ignore this file if you do not want to evaluate the Exact-Match Accuracy metric.**
- Note that the names of the `database` directory, the samples file `dev.json`, and the tables file `tables.json` can be changed.
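For illustration, a minimal samples file could be generated as below. The key names `question`, `query`, and `db_id` follow the Spider convention and are assumptions here; your dataset may use other names for the three required fields.

```python
import json

# Illustrative only: key names ("question", "query", "db_id") follow the
# Spider convention; your dataset may use different key names.
samples = [
    {
        "question": "How many students are enrolled?",  # NL Question
        "query": "SELECT count(*) FROM student",        # Gold SQL
        "db_id": "college",                             # Database Id
    }
]

with open("dev.json", "w") as f:
    json.dump(samples, f, indent=2)
```
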
</details>

<details><summary>Import Dataset into NL2SQL360</summary>

- CLI Usage:

  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/dataset_spider.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/dataset_spider.yaml).

  - Save the YAML file to the path `DATASET_YAML_PATH`, then run:

    ```bash
    nl2sql360 dataset DATASET_YAML_PATH
    ```

- Code Usage:

  - Create / Modify a Python file following [NL2SQL360/examples/py_examples/dataset_import.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/dataset_import.py).
  - Run the Python file to import the dataset.

</details>

<details><summary>Evaluate NL2SQL Model</summary>

- CLI Usage:

  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/evaluation.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/evaluation.yaml).

  - Save the YAML file to the path `EVALUATION_YAML_PATH`, then run:

    ```bash
    nl2sql360 evaluate EVALUATION_YAML_PATH
    ```

- Code Usage:

  - Create / Modify a Python file following [NL2SQL360/examples/py_examples/evaluation.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/evaluation.py).
  - Run the Python file to evaluate the model.

</details>

<details><summary>Query Multi-angle Performance</summary>

- CLI Usage:

  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/report.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/report.yaml).

  - Save the YAML file to the path `REPORT_YAML_PATH`, then run:

    ```bash
    nl2sql360 report REPORT_YAML_PATH
    ```

  - The generated report will be saved to the `save_path` specified in the YAML file.

- Code Usage:

  - Create / Modify a Python file following [NL2SQL360/examples/py_examples/report.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/report.py).
  - Run the Python file to generate the report.

</details>

<details><summary>Delete History Cache</summary>

- CLI Usage:

  - Create / Modify the YAML configuration following [NL2SQL360/examples/cli_examples/delete_history.yaml](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/cli_examples/delete_history.yaml).

  - Save the YAML file to the path `DELETE_YAML_PATH`, then run:

    ```bash
    nl2sql360 delete DELETE_YAML_PATH
    ```

- Code Usage:

  - Create / Modify a Python file following [NL2SQL360/examples/py_examples/delete_history.py](https://github.com/BugMaker-Boyan/NL2SQL360/blob/refactor/examples/py_examples/delete_history.py).
  - Run the Python file to delete the dataset / evaluation cache.

</details>

## :dart:Road Map

:white_check_mark:Release **NL2SQL360** evaluation code.

:white_check_mark:Release **NL2SQL360** experiments data.

:white_check_mark:Release **NL2SQL360** official Python package.

## :pushpin:Citation
