The LLM's knowledge of the database comes from a data dictionary containing the metadata for all the SQL views / tables. Whilst the LLM could query the database at runtime to discover the schemas, storing them in a text file reduces the overall latency of the system and allows the metadata for each table to be adjusted as a form of prompt engineering.
Below is a sample entry for a view / table that we wish to expose to the LLM. The Microsoft SQL Server [Adventure Works Database](https://learn.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver16) is used as a sample.
```json
 {
-"Entity": "SalesLT.SalesOrderDetail",
-"Definition": "The SalesLT.SalesOrderDetail entity contains detailed information about individual items within sales orders. This entity includes data on the sales order ID, the specific details of each order item such as quantity, product ID, unit price, and any discounts applied. It also includes calculated fields such as the line total for each order item. This entity can be used to answer questions related to the specifics of sales transactions, such as which products were purchased in each order, the quantity of each product ordered, and the total price of each order item.",
 "Definition": "The SalesOrderID column in the SalesLT.SalesOrderDetail entity contains unique numerical identifiers for each sales order. Each value represents a specific sales order, ensuring that each order can be individually referenced and tracked. The values are in a sequential numeric format, indicating the progression and uniqueness of each sales transaction within the database.",
-"AllowedValues": null,
+"Definition": null,
 "SampleValues": [
-71938,
-71784,
-71935,
-71923,
+71898,
+71831,
+71899,
+71796,
 71946
 ]
 },
 {
 "Name": "SalesOrderDetailID",
 "DataType": "int",
-"Definition": "The SalesOrderDetailID column in the SalesLT.SalesOrderDetail entity contains unique identifier values for each sales order detail record. The values are numeric and are used to distinguish each order detail entry within the database. These identifiers are essential for maintaining data integrity and enabling efficient querying and data manipulation within the sales order system.",
```
@@ -85,6 +174,32 @@ Below is a sample entry for a view / table that we which to expose to the LLM. T
A full data dictionary must be built for all the views / tables you wish to expose to the LLM. The metadata provided directly influences the accuracy of the Text2SQL component.
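
As a rough illustration of how a data dictionary entry might be turned into prompt text, the sketch below renders one entry into a compact schema description. It is a minimal sketch under stated assumptions, not the project's actual implementation: the `Columns` key, the `render_entity_prompt` helper, and the abbreviated definitions and values are hypothetical, while the other keys mirror the sample entry.

```python
import json

# Hypothetical entry: "Columns" is an assumed key and the definitions are
# abbreviated for illustration; the remaining keys mirror the sample entry.
ENTRY_JSON = """
{
  "Entity": "SalesLT.SalesOrderDetail",
  "Definition": "Detailed information about individual items within sales orders.",
  "Columns": [
    {"Name": "SalesOrderID", "DataType": "int", "Definition": null,
     "AllowedValues": null, "SampleValues": [71898, 71831]},
    {"Name": "SalesOrderDetailID", "DataType": "int",
     "Definition": "Unique identifier for each sales order detail record.",
     "AllowedValues": null, "SampleValues": []}
  ]
}
"""


def render_entity_prompt(entry: dict) -> str:
    """Render one data dictionary entry as plain text for an LLM prompt."""
    lines = [f"Entity: {entry['Entity']}"]
    if entry.get("Definition"):
        lines.append(f"Definition: {entry['Definition']}")
    for column in entry.get("Columns", []):
        parts = [f"- {column['Name']} ({column['DataType']})"]
        # Skip null or empty metadata so the prompt stays short.
        if column.get("Definition"):
            parts.append(column["Definition"])
        if column.get("SampleValues"):
            samples = ", ".join(str(v) for v in column["SampleValues"])
            parts.append(f"e.g. {samples}")
        lines.append(" | ".join(parts))
    return "\n".join(lines)


prompt_text = render_entity_prompt(json.loads(ENTRY_JSON))
print(prompt_text)
```

Because the dictionary lives in a text file, editing a `Definition` or the sample values changes what the LLM sees without touching the database.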
## Column Value Store JSONL
To aid the LLM's understanding, the dimension tables within a star schema are indexed if they contain string-based values. This allows the LLM to use search to understand the context of the question asked. For example, if a user asks 'What are the total sales on VE-C304-S', we can use search to determine that 'VE-C304-S' is in fact a Product Number and which entity it belongs to.

This avoids having to index the fact tables, saving storage, and still allows us to use SQL queries to slice and dice the data accordingly.
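
The idea can be sketched as follows: distinct string values from dimension columns are written out as JSONL records, and a lookup resolves a term from the user's question to its entity and column. This is only a sketch under assumptions: the record fields (`Entity`, `Column`, `Value`), the helper names, and the example values are hypothetical, and an exact-match scan stands in for a real search index.

```python
import json

# Illustrative dimension values; in practice these might come from DISTINCT
# queries over string columns of the dimension tables. The entity and column
# names here are assumptions for the example.
dimension_values = {
    ("SalesLT.Product", "ProductNumber"): ["VE-C304-S", "VE-C304-M"],
    ("SalesLT.ProductCategory", "Name"): ["Vests"],
}


def build_jsonl(values: dict) -> str:
    """Emit one JSON record per (entity, column, value) triple, one per line."""
    records = [
        {"Entity": entity, "Column": column, "Value": value}
        for (entity, column), column_values in values.items()
        for value in column_values
    ]
    return "\n".join(json.dumps(record) for record in records)


def find_value(jsonl_text: str, term: str) -> list[dict]:
    """Exact, case-insensitive match; a stand-in for a real search index."""
    records = (json.loads(line) for line in jsonl_text.splitlines() if line)
    return [r for r in records if r["Value"].lower() == term.lower()]


store = build_jsonl(dimension_values)
matches = find_value(store, "VE-C304-S")
print(matches)
```

Only the dimension values are stored, so the store stays small while the fact tables are still queried through SQL.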