Commit d15a9b6

committed
Update the prompt
1 parent 0eaaa98 commit d15a9b6

3 files changed: 94 additions and 31 deletions

text_2_sql/autogen/src/autogen_text_2_sql/custom_agents/sql_schema_selection_agent.py

Lines changed: 8 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -101,15 +101,19 @@ async def on_messages_stream(
101101
if schema not in final_schemas:
102102
final_schemas.append(schema)
103103

104-
final_colmns = []
104+
final_columns = []
105105
for column_value_result in column_value_results:
106106
for column in column_value_result:
107-
if column not in final_colmns:
108-
final_colmns.append(column)
107+
if column not in final_columns:
108+
final_columns.append(column)
109+
110+
all_column_lengths = [len(column) for column in final_columns]
109111

110112
final_results = {
113+
"MANDATORY_DISAMBIGUATION": max(all_column_lengths, default=0) > 3
114+
or len(final_columns) > 3,
111115
"schemas": final_schemas,
112-
"column_values": final_colmns,
116+
"column_values": final_columns,
113117
}
114118

115119
logging.info(f"Final results: {final_results}")
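
The flag added in this hunk can be looked at in isolation. The following is a minimal sketch, not the repository's code: the function name `needs_disambiguation`, the `threshold` parameter, and the sample inputs are hypothetical, and it assumes every entry of `final_columns` supports `len()`. Calling `max()` on an empty list raises `ValueError`, so the sketch passes `default=0`.

```python
def needs_disambiguation(final_columns, threshold=3):
    """Return True when any column entry is longer than `threshold`
    or when there are more than `threshold` entries overall."""
    # default=0 keeps max() from raising ValueError on an empty list.
    longest = max((len(column) for column in final_columns), default=0)
    return longest > threshold or len(final_columns) > threshold


print(needs_disambiguation([]))                      # False
print(needs_disambiguation([["a", "b"], ["c"]]))     # False
print(needs_disambiguation([["a", "b", "c", "d"]]))  # True
```

Keeping the check in one small function would also make the threshold of 3 easy to tune or test separately from the agent.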

text_2_sql/text_2_sql_core/src/text_2_sql_core/connectors/ai_search.py

Lines changed: 21 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -146,7 +146,7 @@ async def get_column_values(
146146
"AIService__AzureSearchOptions__Text2SqlColumnValueStore__Index"
147147
],
148148
semantic_config=None,
149-
top=15,
149+
top=50,
150150
include_scores=False,
151151
minimum_score=5,
152152
)
@@ -161,6 +161,8 @@ async def get_column_values(
161161

162162
column_values[trimmed_fqn].append(value["Value"])
163163

164+
logging.info("Column Values: %s", column_values)
165+
164166
if as_json:
165167
return json.dumps(column_values, default=str)
166168
else:
@@ -225,6 +227,24 @@ async def get_entity_schemas(
225227

226228
# del schema["FQN"]
227229

230+
if (
231+
schema["CompleteEntityRelationshipsGraph"] is not None
232+
and len(schema["CompleteEntityRelationshipsGraph"]) == 0
233+
):
234+
del schema["CompleteEntityRelationshipsGraph"]
235+
236+
if (
237+
schema["SampleValues"] is not None
238+
and len(schema["SampleValues"]) == 0
239+
):
240+
del schema["SampleValues"]
241+
242+
if (
243+
schema["EntityRelationships"] is not None
244+
and len(schema["EntityRelationships"]) == 0
245+
):
246+
del schema["EntityRelationships"]
247+
228248
if schema["Entity"].lower() not in excluded_entities:
229249
filtered_schemas.append(schema)
230250
else:
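
The three near-identical checks added in the last hunk could be folded into one loop. This is a minimal sketch, assuming `schema` is a plain `dict`; the helper name `drop_empty_fields` and the sample schema are hypothetical. It uses `dict.get`, so a missing key is skipped instead of raising `KeyError`, which differs from the direct indexing used in the hunk.

```python
def drop_empty_fields(schema, fields):
    """Remove each key in `fields` whose value is an empty container.

    Keys that are missing or set to None are left untouched.
    """
    for field in fields:
        value = schema.get(field)
        if value is not None and len(value) == 0:
            del schema[field]
    return schema


schema = {
    "Entity": "SalesOrderHeader",
    "CompleteEntityRelationshipsGraph": [],
    "EntityRelationships": None,
}
drop_empty_fields(
    schema, ["CompleteEntityRelationshipsGraph", "EntityRelationships"]
)
print(sorted(schema))  # ['Entity', 'EntityRelationships']
```

Passing the field names as an argument keeps the list of prunable keys in one place, so a misspelled key shows up once rather than three times.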

text_2_sql/text_2_sql_core/src/text_2_sql_core/prompts/sql_disambiguation_agent.yaml

Lines changed: 65 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -4,64 +4,83 @@ description:
44
"An agent that specialises in disambiguating the user's question and mapping it to database schemas. Use this agent when the user's question is ambiguous and requires more information to generate the SQL query."
55
system_message:
66
"<role_and_objective>
7-
You are a helpful AI Assistant specializing in disambiguating the user's question and mapping it to the relevant columns and schemas in the database.
7+
You are a helpful AI Assistant specializing in disambiguating the user's question and mapping it to the relevant columns and schemas in the database.
8+
Your job is to narrow down the possible mappings based on the user's question and the schema provided to generate a clear mapping.
89
</role_and_objective>
910
1011
<scope_of_user_query>
11-
The user's question will be related to {{ use_case }}.
12+
The user's question will be related to {{ use_case }}.
1213
</scope_of_user_query>
1314
1415
<instructions>
15-
- For every intent and filter condition in the question, map them to the columns in the schemas. Use the whole context of the question and information already provided to do so.
16+
- If 'MANDATORY_DISAMBIGUATION' is True, you must perform disambiguation on the terms with high cardinality. It is mandatory.
17+
18+
- For every intent and filter condition in the question, map them to the columns in the schemas and the appropriate filter value. Use the whole context of the question and information already provided to do so.
1619
1720
- Do not ask for information already included in the question, schema, or what can reasonably be inferred from the question.
1821
19-
- Only provide possible filter values for string columns. Do not provide possible filter values for Date and Numerical values as it should be clear from the question. Only ask a follow-up question for Date and Numerical values if you are unsure which column to use or what the value means, e.g., does 100 in currency refer to 100 USD or 100 EUR.
22+
- Only ask a follow-up question for Date and Numerical values if you are unsure which column to use or what the value means, e.g., does 100 in currency refer to 100 USD or 100 EUR.
2023
2124
<clear_context_handling>
22-
If the context of the question makes the mapping explicit, directly map the terms to the relevant column FQN without generating disambiguation questions.
25+
If the context of the question makes the mapping explicit, and the appropriate filter values can be found in 'column_values', directly map the terms to the relevant column FQN without generating disambiguation questions.
26+
27+
When evaluating questions:
28+
29+
Use the 'column_values' property to check for possible matching columns and compare these to the context of the question. ALWAYS CHECK IN THE 'column_values' PROPERTY THAT THE FILTER VALUE IS AVAILABLE.
2330
24-
Use the 'column_values' property to check for possible matching columns and compare these to the context of the question.
31+
If there are multiple values in 'column_values' that could match the filter, ask for clarification or to narrow down the filter value or column to use. If in doubt, use disambiguation questions to clarify.
2532
26-
When evaluating filters:
33+
Always consider the temporal and contextual phrases (e.g., \"June 2008\") in the question. If the context implies a direct match to a date column, do not request clarification unless multiple plausible columns exist.
34+
For geographical or categorical terms (e.g., \"country\"), prioritize unique matches or add context to narrow down ambiguities based on the schema.
2735
28-
Always consider the temporal and contextual phrases (e.g., \"June 2008\") in the question. If the context implies a direct match to a date column, do not request clarification unless multiple plausible columns exist.
29-
For geographical or categorical terms (e.g., \"country\"), prioritize unique matches or add context to narrow down ambiguities based on the schema.
3036
If all mappings are clear, output the JSON with mappings only.
3137
3238
<example>
3339
Question: \"What are the total number of sales within 2008 for the mountain bike product line?\"
34-
Output:
35-
json
36-
Copy code
3740
{
38-
\"mapping\": {
39-
\"Mountain Bike\": \"vProductModelCatalogDescription.Category\",
40-
\"2008\": \"SalesLT.SalesOrderHeader.OrderDate\"
41+
\"filter_mapping\": {
42+
\"bike\": [
43+
{
44+
\"column\": \"vProductModelCatalogDescription.Category\",
45+
\"filter_value\": \"Mountain Bike\"
46+
}
47+
],
48+
\"2008\": [
49+
{
50+
\"column\": \"SalesLT.SalesOrderHeader.OrderDate\",
51+
\"filter_value\": \"2008-01-01\"
52+
}
53+
]
54+
},
55+
\"intent_mapping\": {
56+
\"total number of sales\": \"SalesLT.SalesOrderHeader.SalesOrderID\"
4157
}
4258
}
4359
</example>
4460
</clear_context_handling>
4561
4662
<disambiguation_handling>
47-
If the term is ambiguous, there are multiple matching columns/filters, or the question lacks enough context to infer the correct mapping:
63+
If the term is ambiguous, there are multiple matching columns/filter values in 'column_values', or the question lacks enough context to infer the correct mapping, then ask for clarification.
4864
49-
For ambiguous terms, evaluate the question context and schema relationships to narrow down matches.
50-
Populate the 'filters' field with the identified filter and relevant FQN, matching columns, and possible filter values.
51-
Include a clarification question in the 'question' field to request more information from the user.
52-
If the clarification is not related to a column or a filter value, populate the 'user_choices' field with the possible choices they can select.
65+
For ambiguous terms, evaluate the question context and schema relationships to narrow down matches.
66+
Populate the 'questions' field with the identified filter and relevant FQN, matching columns, and possible filter values.
67+
Include a clarification question in the 'question' field to request more information from the user.
68+
If the clarification is not related to a column or a filter value, populate the 'user_choices' field with the possible choices they can select.
5369
54-
Prioritize clear disambiguation based on:
55-
- Direct matches within schemas.
56-
- Additional context provided by the question (e.g., temporal, categorical, or domain-specific keywords).
70+
Prioritize clear disambiguation based on:
71+
- Direct matches within schemas.
72+
- Additional context provided by the question (e.g., temporal, categorical, or domain-specific keywords).
73+
74+
Return all disambiguation questions in the 'questions' array. If multiple disambiguation questions are needed, include them all in the 'questions' array at once.
5775
5876
<example>
59-
User question: \"What country did we sell the most to in June 2008?\"
77+
User question: \"What country did we sell the most in June 2008?\"
6078
Schema contains multiple columns potentially related to \"country.\"
6179
6280
If disambiguation is needed:
81+
6382
{
64-
\"filters\": [
83+
\"questions\": [
6584
{
6685
\"question\": \"What do you mean by 'country'?\",
6786
\"matching_columns\": [
@@ -74,7 +93,27 @@ system_message:
7493
]
7594
}
7695
</example>
77-
Always include either the 'matching_columns', 'matching_filter_values' or `user_choices` field in the 'filters' array.
96+
97+
<example>
98+
User question: \"What are the total sales for the mountain bike product line?\"
99+
'column_values' contains multiple columns potentially related to \"mountain bike.\"
100+
101+
If disambiguation is needed:
102+
{
103+
\"questions\": [
104+
{
105+
\"question\": \"What do you mean by 'mountain bike'?\",
106+
\"matching_columns\": [
107+
\"vProductModelCatalogDescription.Category\",
108+
\"vProductModelCatalogDescription.ProductLine\"
109+
],
110+
\"matching_filter_values\": [],
111+
\"user_choices\": []
112+
}
113+
]
114+
}
115+
</example>
116+
Always include at least one of the 'matching_columns', 'matching_filter_values' or 'user_choices' fields in each entry of the 'questions' array.
78117
</disambiguation_handling>
79118
</instructions>
80119
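
A caller could check the agent's reply against the closing rule of the prompt before acting on it. This is a minimal sketch and not part of the commit: it assumes the reply is a JSON object shaped like the examples in the prompt, the function name `validate_disambiguation` and the sample reply are hypothetical, and it reads the rule strictly, requiring at least one of the three fields to be non-empty.

```python
import json

CHOICE_FIELDS = ("matching_columns", "matching_filter_values", "user_choices")


def validate_disambiguation(raw_reply):
    """Parse a reply and return a list of problems found in 'questions'."""
    reply = json.loads(raw_reply)
    problems = []
    for index, entry in enumerate(reply.get("questions", [])):
        if not entry.get("question"):
            problems.append(f"questions[{index}] has no 'question' text")
        # Assumption: at least one choice field must be present and non-empty.
        if not any(entry.get(field) for field in CHOICE_FIELDS):
            problems.append(f"questions[{index}] offers nothing to choose from")
    return problems


reply = json.dumps({
    "questions": [
        {
            "question": "What do you mean by 'mountain bike'?",
            "matching_columns": [
                "vProductModelCatalogDescription.Category",
                "vProductModelCatalogDescription.ProductLine",
            ],
            "matching_filter_values": [],
            "user_choices": [],
        }
    ]
})
print(validate_disambiguation(reply))  # []
```

`json.loads` also rejects trailing commas, so a reply that copies one from a prompt example would fail at the parsing step rather than at validation.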
