text_2_sql/README.md (5 additions & 83 deletions)
This portion of the repo contains code to implement a multi-shot approach to Text2SQL.
The sample provided works with Azure SQL Server, although it has been easily adapted to other SQL sources such as Snowflake.
**Three iterations of the approach are provided for SQL query generation: a prompt-based approach and two vector-database-based approaches. See Multi-Shot Approach for more details.**
> [!IMPORTANT]
>
> - Previous versions of this approach have now been moved to `previous_iterations/semantic_kernel`. These will not be updated.
## High Level Workflow
To solve these issues, a Multi-Shot approach is developed. Below is the iteration history:
_(Figure: iteration history of the Multi-Shot approach.)_
Four different iterations are presented, with code provided for each:

**Iteration 2:** A brief description of the available entities is injected into the prompt. This limits the number of tokens used and avoids filling the prompt with confusing schema information.

**Iteration 3:** Indexing the entity definitions in a vector database, such as AI Search, and querying it to retrieve the most relevant entities for the key terms from the query.

**Iteration 4:** Keeping an index of commonly asked questions and which schema / SQL query they resolve to - this index is generated by the LLM when it encounters a question that has not been previously asked. Additionally, indexing the entity definitions in a vector database, such as AI Search _(same as Iteration 3)_. The question index is queried first to see if a similar SQL query can be obtained _(if there is a high probability of an exact SQL query match, the results can be pre-fetched)_. If not, the system falls back to the schema index and queries it to retrieve the most relevant entities for the key terms from the query.

**Iteration 5:** Moves the Iteration 4 approach into a multi-agent approach for improved reasoning and query generation. With separation into agents, each agent can focus on one task only, providing a better overall flow and response quality. See more details below.

All approaches limit the number of tokens used and avoid filling the prompt with confusing schema information.
As the system has matured, our approach has evolved into a multi-agent approach that brings improved reasoning, speed, and instruction-following capabilities. With separation into agents, each agent can focus on one task only, providing a better overall flow and response quality.
Using Auto-Function calling capabilities, the LLM is able to retrieve from the plugin the full schema information for the views / tables that it considers useful for answering the question. Once retrieved, the full SQL query can then be generated. The schemas for multiple views / tables can be retrieved to allow the LLM to perform joins and other complex queries.
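To make the flow concrete, below is a minimal sketch of the kind of schema-retrieval function that could be exposed to the LLM as a callable tool. The function name, signature, and in-memory schema store are assumptions for illustration, not the repository's actual plugin code.

```python
from typing import Any

# Hypothetical in-memory schema store standing in for the real data dictionary.
SCHEMA_STORE: dict[str, dict[str, Any]] = {
    "SalesLT.SalesOrderDetail": {
        "description": "Line items for each sales order.",
        "columns": {"OrderQty": "INT", "ProductID": "INT", "LineTotal": "MONEY"},
    },
    "SalesLT.Product": {
        "description": "Products sold, including names and list prices.",
        "columns": {"ProductID": "INT", "Name": "NVARCHAR", "ListPrice": "MONEY"},
    },
}

def get_entity_schemas(entity_names: list[str]) -> dict[str, dict[str, Any]]:
    """Return full schema information for the requested views / tables.

    Exposed to the LLM as a callable tool: the model asks only for the
    entities it considers useful, keeping the prompt small. Several
    entities can be fetched at once so the model can plan joins.
    """
    return {name: SCHEMA_STORE[name] for name in entity_names if name in SCHEMA_STORE}

print(get_entity_schemas(["SalesLT.Product"]))
```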
The cache strategy implementation is a simple way to prove that the system works.
**Positive Indication System:** Only update the cache when a user positively reacts to a question, e.g. a thumbs up from the UI, or doesn't ask a follow-up question.
**Always update:** Always add all questions into the cache when they are asked. The sample code in the repository currently implements this approach, but it could lead to poor SQL queries reaching the cache. One of the other caching strategies would be a better choice for a production version.
### Comparison of Iterations
|| Common Text2SQL Approach | Prompt Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach With Query Cache | Agentic Vector Based Multi-Shot Text2SQL Approach With Query Cache |
|-|-|-|-|-|-|
|**Advantages** | Fast for a limited number of entities. | Significant reduction in token usage. | Significant reduction in token usage. | Significant reduction in token usage. | Significant reduction in token usage. |
|||| Scales well to multiple entities. | Scales well to multiple entities. | Scales well to multiple entities with small agents. |
|||| Uses a vector approach to detect the best fitting entity, which is faster than using an LLM. Matching is offloaded to AI Search. | Uses a vector approach to detect the best fitting entity, which is faster than using an LLM. Matching is offloaded to AI Search. | Uses a vector approach to detect the best fitting entity, which is faster than using an LLM. Matching is offloaded to AI Search. |
||||| Significantly faster to answer similar questions, as best fitting entity detection is skipped. Observed tests resulted in almost half the time for final output compared to the previous iteration. | Significantly faster to answer similar questions, as best fitting entity detection is skipped. Observed tests resulted in almost half the time for final output compared to the previous iteration. |
||||| Significantly faster execution time for known questions. Total execution time can be reduced by skipping the query generation step. | Significantly faster execution time for known questions. Total execution time can be reduced by skipping the query generation step. |
|||||| Instruction following and accuracy are improved by decomposing the task into smaller tasks. |
|||||| Handles query decomposition for complex questions. |
|**Disadvantages**| Slows down significantly as the number of entities increases. | Uses an LLM to detect the best fitting entity, which is slow compared to a vector approach. | AI Search adds additional cost to the solution. | Slower than other approaches the first time a question with no similar questions in the cache is asked. | Slower than other approaches the first time a question with no similar questions in the cache is asked. |
|| Consumes a significant number of tokens as the number of entities increases. | As the number of entities increases, token usage will grow, but at a lesser rate than Iteration 1. || AI Search adds additional cost to the solution. | AI Search and multiple agents add additional cost to the solution. |
|| LLM struggled to differentiate which table to choose with the large amount of information passed. |||||

### Complete Execution Time Comparison for Approaches
To compare the difference in complete execution time, the following questions were tested 25 times each for 4 different approaches.
Approaches:
- Prompt-based Multi-Shot (Iteration 2)
- Vector-Based Multi-Shot (Iteration 3)
- Vector-Based Multi-Shot with Query Cache (Iteration 4)
- Vector-Based Multi-Shot with Pre-Run Query Cache (Iteration 4)
Questions:
- What is the total revenue in June 2008?
- Give me the total number of orders in 2008?
- Which country had the highest number of orders in June 2008?
The graph below shows the response times from the experiment on a Known Question Set (i.e. the cache has already been populated with the query mapping by the LLM). gpt-4o was used as the completion LLM for this experiment. The response time is the complete execution time, including:
- Prompt Preparation
- Question Understanding
- Cache Index Requests _(if applicable)_
- SQL Query Execution
- Interpretation and generation of the answer in the correct format
_(Figure: response time comparison of the four approaches on the known question set.)_
The vector-based cache approaches consistently outperform those that use just a Prompt-Based or Vector-Based approach, by a significant margin. Given that it is highly likely the same Text2SQL questions will be repeated often, storing the question-SQL mapping leads to **significant performance increases**, despite the initial additional latency (between 1 and 2 seconds in testing) when a question is asked for the first time.
## Sample Output
### What is the top performing product by quantity of units sold?
Below is a sample entry for a view / table that we wish to expose to the LLM.
See `./data_dictionary` for more details on how the data dictionary is structured and ways to **automatically generate it**.
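Purely for illustration, an entry might look roughly like the following Python dict; the field names here are assumptions, and `./data_dictionary` documents the real structure.

```python
# Hypothetical data dictionary entry; field names are illustrative only.
sample_entity = {
    "entity": "SalesLT.SalesOrderDetail",
    "description": "Line items for each sales order, including quantities and totals.",
    "columns": [
        {"name": "OrderQty", "type": "INT", "description": "Units ordered on the line."},
        {"name": "ProductID", "type": "INT", "description": "Product sold on the line."},
        {"name": "LineTotal", "type": "MONEY", "description": "Total value of the line."},
    ],
}
```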
## Prompt Based SQL Plugin (Iteration 2)
This approach works well for a small number of entities (tested on up to 20 entities with hundreds of columns). It performed well in testing: with correct metadata, we achieved 100% accuracy on the test set.
Whilst this is a simple and high-performing approach, its downside is the increase in the number of tokens as the number of entities increases. Additionally, we found that the LLM started to get "confused" about which columns belong to which entities as the number of entities increased.
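A minimal sketch of the prompt-injection idea, assuming invented entity descriptions and prompt wording (the repository's actual templates will differ):

```python
# Hypothetical brief entity descriptions injected into the system prompt.
ENTITY_DESCRIPTIONS = {
    "SalesLT.Product": "Products sold, with names, numbers and list prices.",
    "SalesLT.SalesOrderDetail": "Line items for each sales order.",
}

def build_system_prompt() -> str:
    """Build a system prompt containing brief descriptions, not full schemas."""
    lines = ["You write T-SQL queries for the user's question.", "Available entities:"]
    for name, description in ENTITY_DESCRIPTIONS.items():
        lines.append(f"- {name}: {description}")
    lines.append("Request the full schema for an entity before using its columns.")
    return "\n".join(lines)

print(build_system_prompt())
```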
## Vector Based SQL Plugin (Iterations 3 & 4)
This approach allows the system to scale without significantly increasing the number of tokens used within the system prompt. However, indexing and running an AI Search instance incurs additional cost compared to the prompt-based approach.
If the query cache is enabled, a vector search is used to find similar previously asked questions and the queries / schemas they map to. In the case of a high probability of a match, the stored query can be pre-run against the data source and the results passed to the LLM alongside the query. If the results can answer the question, query generation can be skipped altogether, speeding up the total execution time.
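Below is a rough sketch of the cache-lookup idea using a toy embedding and an in-memory list; in the real system the embeddings, matching, and storage are handled by AI Search, and the similarity threshold shown is an assumption.

```python
import math
from typing import Optional

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model, used only to keep this
    # sketch self-contained and runnable.
    vec = [0.0] * 64
    for i, ch in enumerate(text.lower()):
        vec[i % 64] += ord(ch) / 1000.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Hypothetical in-memory cache; the real cache is an AI Search index.
QUERY_CACHE = [
    {
        "question": "What is the total revenue in June 2008?",
        "sql": "SELECT SUM(LineTotal) AS TotalRevenue FROM SalesLT.SalesOrderDetail",
    },
]

def cache_lookup(question: str, threshold: float = 0.95) -> Optional[dict]:
    """Return the most similar cached question-SQL pair, if similar enough."""
    q = embed(question)
    best, best_score = None, 0.0
    for entry in QUERY_CACHE:
        # Dot product of unit vectors == cosine similarity.
        score = sum(a * b for a, b in zip(q, embed(entry["question"])))
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None
```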
In the case of an unknown question, there is a minor increase in latency, but the query cache index could be pre-populated with common questions before release to users.
The following environment variables control the behaviour of the Vector Based Text2SQL generation; a sketch of how they might gate the flow follows the list:
**Text2Sql__UseQueryCache** - controls whether the query cache index is checked before using the standard schema index.
**Text2Sql__PreRunQueryCache** - controls whether the top result from the query cache index (if enabled) is pre-fetched against the data source to include the results in the prompt.
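A small sketch of how these flags might be read and used to pick a flow; the "true"/"false" string convention and the flow names are assumptions, not the sample's actual code:

```python
import os

def env_flag(name: str) -> bool:
    # Assumes flags are set as "true"/"false" strings; the sample's real
    # parsing may differ.
    return os.environ.get(name, "false").strip().lower() == "true"

USE_QUERY_CACHE = env_flag("Text2Sql__UseQueryCache")
PRE_RUN_QUERY_CACHE = env_flag("Text2Sql__PreRunQueryCache")

def choose_flow() -> str:
    """Pick which retrieval flow to attempt first based on the flags."""
    if not USE_QUERY_CACHE:
        return "schema-index"        # skip the cache entirely
    if PRE_RUN_QUERY_CACHE:
        return "cache-with-pre-run"  # pre-fetch results for the top cache hit
    return "cache-only"

print(choose_flow())
```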
## Agentic Vector Based Approach (Iteration 5)
This approach builds on the Vector Based SQL Plugin approach that was previously developed, but adds an agentic approach to the solution.
This agentic system contains the following agents:
The combination of these agents allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions can be answered quickly, avoiding a degraded user experience.
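As a rough sketch of the orchestration idea only (the agent names and the strictly sequential hand-off are invented for illustration; the actual system's agents and routing may differ):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """A single-task agent: consumes the current state, returns the next."""
    name: str
    run: Callable[[str], str]

# Hypothetical agents in a strictly sequential pipeline, for illustration only.
PIPELINE = [
    Agent("question_rewrite", lambda s: s.strip()),
    Agent("schema_selection", lambda s: s + " | schemas: SalesLT.SalesOrderDetail"),
    Agent("sql_generation", lambda s: s + " | sql: SELECT SUM(OrderQty) FROM SalesLT.SalesOrderDetail"),
    Agent("answer_generation", lambda s: s + " | answer: formatted result"),
]

def run_pipeline(question: str) -> str:
    state = question
    for agent in PIPELINE:
        state = agent.run(state)  # each agent focuses on exactly one task
    return state

print(run_pipeline("What is the total revenue in June 2008?"))
```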
## Code Availability
|| Common Text2SQL Approach | Prompt Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach With Query Cache | Agentic Vector Based Multi-Shot Text2SQL Approach With Query Cache |
|-|-|-|-|-|-|
0 commit comments