Merge branch 'feature/optimize-agent-analytics-metadata' into 'develop'
Optimize analytics agent by embedding database overview in system prompt
See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!324
CHANGELOG.md: 6 additions & 5 deletions
@@ -6,11 +6,12 @@ SPDX-License-Identifier: MIT-0
 ## [Unreleased]
 
 ### Added
-- **Analytics Agent 2-Phase Schema Optimization for Improved Performance**
-  - Implemented progressive schema disclosure system with efficient 2-phase approach
-  - Phase 1: `get_database_overview()` provides fast table listing and guidance (~500 tokens vs 3000+ tokens - 6x faster)
-  - Phase 2: `get_table_info(['specific_tables'])` loads detailed schemas only for tables actually needed by the query
-  - Enhanced SQL guidance with comprehensive Athena/Trino function reference and PostgreSQL operator warnings to prevent common query failures
+- **Analytics Agent Schema Optimization for Improved Performance**
+  - **Embedded Database Overview**: Complete table listing and guidance embedded directly in system prompt (no tool call needed)
+  - **On-Demand Detailed Schemas**: `get_table_info(['specific_tables'])` loads detailed column information only for tables actually needed by the query
+  - **Significant Performance Gains**: Eliminates redundant tool calls on every query while maintaining token efficiency
+  - **Enhanced SQL Guidance**: Comprehensive Athena/Trino function reference with explicit PostgreSQL operator warnings to prevent common query failures like `~` regex operator mistakes
+  - **Faster Time-to-Query**: Agent has immediate access to table overview and can proceed directly to detailed schema loading for relevant tables
 
 ### Fixed
 - Fix missing data in Glue tables when using a document class that contains a dash (-).
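Review note: the `~` operator warning in the CHANGELOG entry above refers to PostgreSQL's regex-match operators, which Athena/Trino does not accept; the Trino equivalent is the `regexp_like(string, pattern)` function. The following is a minimal sketch of how such a mistake could be flagged before a query is run. The helper name `find_postgres_regex_operators` and the `documents` table are hypothetical illustrations, not part of this commit, and the pattern is deliberately naive (it would also match a `~` inside a string literal).

```python
import re
from typing import List

# Naive pattern: a PostgreSQL regex operator (~, ~*, !~, !~*) surrounded by whitespace.
# Assumption: operators are written with spaces around them, as in "name ~ 'x'".
_PG_REGEX_OPERATOR = re.compile(r"\s(!?~\*?)\s")


def find_postgres_regex_operators(sql: str) -> List[str]:
    """Return every PostgreSQL-style regex operator found in the SQL text."""
    return [match.group(1) for match in _PG_REGEX_OPERATOR.finditer(sql)]


# Hypothetical queries against an illustrative "documents" table.
postgres_style = "SELECT name FROM documents WHERE name ~ '^invoice'"
athena_style = "SELECT name FROM documents WHERE regexp_like(name, '^invoice')"

if __name__ == "__main__":
    print(find_postgres_regex_operators(postgres_style))
    print(find_postgres_regex_operators(athena_style))
```

A check like this is only a guard rail; the prompt guidance added in this commit remains the primary way the agent is steered toward `regexp_like`.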
+    # Load database overview once during agent creation for embedding in system prompt
+    database_overview = _get_database_overview()
+
     # Define the system prompt for the analytics agent
     system_prompt = f"""
 You are an AI agent that converts natural language questions into Athena queries, executes those queries, and writes python code to convert the query results into json representing either a plot, a table, or a string.
 
 # Task
 Your task is to:
 1. Understand the user's question
-2. **EFFICIENT APPROACH**: Use get_database_overview() to get a fast overview of available tables and their purposes
+2. **EFFICIENT APPROACH**: Review the database overview below to see available tables and their purposes
 3. Apply the Question-to-Table mapping rules below to select the correct tables for your query
 4. Use get_table_info(['table1', 'table2']) to get detailed schemas ONLY for the tables you need
 5. Generate a valid Athena query based on the targeted schema information
@@ -70,15 +73,18 @@ def create_analytics_agent(
 9. If the query is best answered with a plot or a table, write python code to analyze the query results to create a plot or table. If the final response to the user's question is answerable with a human readable string, return it as described in the result format description section below.
 10. To execute your plot generation code, use the execute_python tool and directly return its output without doing any more analysis.
 
-# CRITICAL: Two-Step Database Information Approach
+# Database Overview - Available Tables
+{database_overview}
+
+# CRITICAL: Optimized Database Information Approach
 **For optimal performance and accuracy:**
 
-## Step 1: Overview (Fast)
-- Always start with `get_database_overview()` to see available tables
+## Step 1: Review Database Overview (Above)
+- The complete database overview is provided above in this prompt
 - This gives you table names, purposes, and question-to-table mapping guidance
-- **~500 tokens vs 3000+ tokens** - much faster for simple questions
+- No tool call needed - information is immediately available
 
-## Step 2: Detailed Schemas (On-Demand)
+## Step 2: Get Detailed Schemas (On-Demand Only)
 - Use `get_table_info(['table1', 'table2'])` for specific tables you need
 - Only request detailed info for tables relevant to your query
 - Get complete column listings, sample queries, and aggregation rules
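Review note: the pattern in the diff above (load the overview once at agent creation, embed it in the system prompt, and keep detailed schemas behind a tool call) can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the module's actual code: the `_SCHEMAS` dictionary, its table and column names, and `build_system_prompt` are hypothetical stand-ins, and the real `_get_database_overview()` and `get_table_info()` are not shown in this diff.

```python
from typing import Dict, List

# Hypothetical in-memory schema catalog; the table and column names are
# illustrative only and do not come from the repository.
_SCHEMAS: Dict[str, List[str]] = {
    "example_documents": ["document_id", "document_class", "page_count"],
    "example_usage": ["document_id", "service", "value"],
}


def get_database_overview() -> str:
    """Return a short table listing that is cheap enough to embed in a prompt."""
    return "\n".join(f"- {name}" for name in sorted(_SCHEMAS))


def get_table_info(table_names: List[str]) -> str:
    """Return column details only for the tables that were asked for."""
    lines = []
    for name in table_names:
        columns = _SCHEMAS.get(name)
        if columns is None:
            lines.append(f"{name}: unknown table")
        else:
            lines.append(f"{name}: {', '.join(columns)}")
    return "\n".join(lines)


def build_system_prompt(overview: str) -> str:
    """Embed the overview text so the agent needs no tool call to see it."""
    return (
        "# Database Overview - Available Tables\n"
        f"{overview}\n\n"
        "Use get_table_info(['table1', 'table2']) for detailed schemas."
    )


# The overview is computed once, when the prompt is built.
SYSTEM_PROMPT = build_system_prompt(get_database_overview())

if __name__ == "__main__":
    print(SYSTEM_PROMPT)
    print(get_table_info(["example_usage"]))
```

One consequence of this design worth keeping in mind: because the overview is captured when the prompt is built, it is a snapshot, so tables added later would presumably not appear until the agent is created again.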