lib/idp_common_pkg/idp_common/agents/analytics/agent.py
Lines changed: 80 additions & 21 deletions
@@ -15,7 +15,12 @@
 
 from ..common.config import load_result_format_description
 from .config import load_python_plot_generation_examples
-from .tools import CodeInterpreterTools, get_database_info, run_athena_query
+from .tools import (
+    CodeInterpreterTools,
+    get_database_overview,
+    get_table_info,
+    run_athena_query,
+)
 from .utils import register_code_interpreter_tools
 
 logger = logging.getLogger(__name__)
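The import change replaces the single `get_database_info` tool with two tools: a short `get_database_overview()` and a targeted `get_table_info(tables)`. Their implementations in `.tools` are not part of this diff, so the following is only a minimal sketch of the overview-then-details pattern, assuming plain Python functions over an in-memory catalog. The `CATALOG` dict, the table names `metering` and `document_sections`, and the column types are hypothetical; the column names are borrowed from the regex examples in the prompt.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: table -> (one-line purpose, {column: Athena type}).
# Table names and types are invented for illustration only.
CATALOG: Dict[str, Tuple[str, Dict[str, str]]] = {
    "metering": (
        "Per-document service usage",
        {"service_api": "string"},
    ),
    "document_sections": (
        "Extracted fields per document section",
        {"document_type": "string", "inference_result.wages": "string"},
    ),
}


def get_database_overview() -> str:
    """Step 1 (cheap): table names and purposes only, no column detail."""
    return "\n".join(f"- {name}: {purpose}" for name, (purpose, _) in CATALOG.items())


def get_table_info(tables: List[str]) -> str:
    """Step 2 (on demand): full column listings for the requested tables only."""
    sections = []
    for name in tables:
        if name not in CATALOG:
            sections.append(f"## {name}\n(unknown table)")
            continue
        _, columns = CATALOG[name]
        # Column names are shown double-quoted, as the prompt requires in SQL.
        lines = [f'- "{col}": {col_type}' for col, col_type in columns.items()]
        sections.append(f"## {name}\n" + "\n".join(lines))
    return "\n\n".join(sections)
```

With this split, a question about service usage would call `get_database_overview()`, pick `metering`, and then call `get_table_info(["metering"])`, so the model never receives column listings for tables it does not query.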
@@ -50,16 +55,34 @@ def create_analytics_agent(
 # Task
 Your task is to:
 1. Understand the user's question
-2. Use get_database_info tool to get comprehensive database schema information (this now includes detailed table descriptions, column schemas, usage patterns, and sample queries)
-3. **CRITICAL**: Trust and use the comprehensive schema information provided by get_database_info. It contains complete table listings and schemas. DO NOT run discovery queries (SHOW TABLES, DESCRIBE) unless the schema info genuinely lacks specific details for your question.
-4. Apply the Question-to-Table mapping rules below to select the correct tables
-5. Generate a valid Athena query based on the comprehensive schema information
-6. Before executing the Athena query, re-read it and make sure _all_ column names mentioned _anywhere inside of the query_ are enclosed in double quotes.
-7. Execute your revised query using the run_athena_query tool. If you receive an error message, correct your Athena query and try again a maximum of 5 times, then STOP. Do not ever make up fake data. For exploratory queries you can return the athena results directly. For larger or final queries, the results should need to be returned because downstream tools will download them separately.
+2. **EFFICIENT APPROACH**: Use get_database_overview() to get a fast overview of available tables and their purposes
+3. Apply the Question-to-Table mapping rules below to select the correct tables for your query
+4. Use get_table_info(['table1', 'table2']) to get detailed schemas ONLY for the tables you need
+5. Generate a valid Athena query based on the targeted schema information
+6. **VALIDATE YOUR SQL**: Before executing, check for these common mistakes:
+   - All column names enclosed in double quotes: `"column_name"`
+   - No PostgreSQL operators: Replace `~` with `REGEXP_LIKE()`
+   - No invalid functions: Replace `CONTAINS()` with `LIKE`, `ILIKE` with `LOWER() + LIKE`
+   - Only valid Trino functions used
+   - Proper date formatting and casting
+7. Execute your validated query using the run_athena_query tool. If you receive an error message, correct your Athena query and try again a maximum of 5 times, then STOP. Do not ever make up fake data. For exploratory queries you can return the athena results directly. For larger or final queries, the results should need to be returned because downstream tools will download them separately.
 8. Use the write_query_results_to_code_sandbox to convert the athena response into a file called "query_results.csv" in the same environment future python scripts will be executed.
 9. If the query is best answered with a plot or a table, write python code to analyze the query results to create a plot or table. If the final response to the user's question is answerable with a human readable string, return it as described in the result format description section below.
 10. To execute your plot generation code, use the execute_python tool and directly return its output without doing any more analysis.
 
+# CRITICAL: Two-Step Database Information Approach
+**For optimal performance and accuracy:**
+
+## Step 1: Overview (Fast)
+- Always start with `get_database_overview()` to see available tables
+- This gives you table names, purposes, and question-to-table mapping guidance
+- **~500 tokens vs 3000+ tokens** - much faster for simple questions
+
+## Step 2: Detailed Schemas (On-Demand)
+- Use `get_table_info(['table1', 'table2'])` for specific tables you need
+- Only request detailed info for tables relevant to your query
+- Get complete column listings, sample queries, and aggregation rules
+
 # CRITICAL: Question-to-Table Mapping Rules
 **ALWAYS follow these rules to select the correct table:**
 
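Step 6 of the new task list asks the model to check its SQL by hand before calling `run_athena_query`. Some of those checks are mechanical enough to sketch as a pre-flight lint. The `find_sql_issues` helper below is hypothetical (it is not part of this PR) and covers only the operator and function rules: PostgreSQL regex operators, `CONTAINS(`, and `ILIKE`. It does not verify that every column is double-quoted or that dates are cast correctly. String literals and quoted identifiers are blanked out first, so a `~` or `ILIKE` that appears inside a quoted pattern is not reported.

```python
import re
from typing import List, Tuple

# Matches a single-quoted string literal ('' escapes a quote) or a
# double-quoted identifier such as "inference_result.wages".
_QUOTED = re.compile(r"'(?:[^']|'')*'|\"(?:[^\"]|\"\")*\"")

# (pattern, advice) pairs for the mechanical rules in the prompt's checklist.
_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"!?~\*?"),
     "PostgreSQL regex operator (~, ~*, !~, !~*): use REGEXP_LIKE()"),
    (re.compile(r"\bCONTAINS\s*\(", re.IGNORECASE),
     "CONTAINS(): use LIKE for strings (Trino's contains() is for arrays)"),
    (re.compile(r"\bILIKE\b", re.IGNORECASE),
     "ILIKE: use LOWER(...) LIKE ..."),
]


def find_sql_issues(sql: str) -> List[str]:
    """Return advice for each rule the query appears to break (hypothetical helper)."""
    # Blank out literals and quoted identifiers so their contents are ignored.
    code_only = _QUOTED.sub(" ", sql)
    return [advice for pattern, advice in _RULES if pattern.search(code_only)]
```

Flagging every `CONTAINS(` call is deliberately conservative: the lint cannot tell a string argument from an array argument, so a hit means "look again" rather than "definitely wrong".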
@@ -94,15 +117,42 @@ def create_analytics_agent(
 DO NOT attempt to execute multiple tools in parallel. The input of some tools depend on the output of others. Only ever execute one tool at a time.
 
 # CRITICAL: Athena SQL Function Reference (Trino-based)
-**Athena engine version 3 uses Trino functions. DO NOT use invalid functions like CONTAINS(varchar, varchar).**
+**Athena engine version 3 uses Trino functions. DO NOT use PostgreSQL-style operators or invalid functions.**
+
+## CRITICAL: Regular Expression Operators
+**Athena does NOT support PostgreSQL-style regex operators:**
+- ❌ NEVER use `~`, `~*`, `!~`, or `!~*` operators (these will cause query failures)
+- ✅ ALWAYS use `REGEXP_LIKE(column, 'pattern')` for regex matching
+- ✅ Use `NOT REGEXP_LIKE(column, 'pattern')` for negative matching
+
+### Common Regex Examples:
+```sql
+-- ❌ WRONG: PostgreSQL-style (will fail with operator error)
+WHERE "inference_result.wages" ~ '^[0-9.]+$'
+WHERE "service_api" ~* 'classification'
+WHERE "document_type" !~ 'invalid'
+
+-- ✅ CORRECT: Athena/Trino style
+WHERE REGEXP_LIKE("inference_result.wages", '^[0-9.]+$')
+WHERE REGEXP_LIKE(LOWER("service_api"), 'classification')