Commit 7e4a2ba

Merge branch 'feature/optimize-agent-analytics-metadata' into 'develop'
Optimize analytics agent by embedding database overview in system prompt

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!324
2 parents b893855 + e112b7e commit 7e4a2ba

2 files changed: +19 -13 lines changed

CHANGELOG.md

Lines changed: 6 additions & 5 deletions
@@ -6,11 +6,12 @@ SPDX-License-Identifier: MIT-0
 ## [Unreleased]
 
 ### Added
-- **Analytics Agent 2-Phase Schema Optimization for Improved Performance**
-  - Implemented progressive schema disclosure system with efficient 2-phase approach
-  - Phase 1: `get_database_overview()` provides fast table listing and guidance (~500 tokens vs 3000+ tokens - 6x faster)
-  - Phase 2: `get_table_info(['specific_tables'])` loads detailed schemas only for tables actually needed by the query
-  - Enhanced SQL guidance with comprehensive Athena/Trino function reference and PostgreSQL operator warnings to prevent common query failures
+- **Analytics Agent Schema Optimization for Improved Performance**
+  - **Embedded Database Overview**: Complete table listing and guidance embedded directly in system prompt (no tool call needed)
+  - **On-Demand Detailed Schemas**: `get_table_info(['specific_tables'])` loads detailed column information only for tables actually needed by the query
+  - **Significant Performance Gains**: Eliminates redundant tool calls on every query while maintaining token efficiency
+  - **Enhanced SQL Guidance**: Comprehensive Athena/Trino function reference with explicit PostgreSQL operator warnings to prevent common query failures like `~` regex operator mistakes
+  - **Faster Time-to-Query**: Agent has immediate access to table overview and can proceed directly to detailed schema loading for relevant tables
 
 ### Fixed
 - Fix missing data in Glue tables when using a document class that contains a dash (-).
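
The changelog entry above mentions PostgreSQL `~` regex operator mistakes. As a rough illustration of that failure mode (not code from this commit), the sketch below flags PostgreSQL-style regex operators in a query string; on Athena/Trino the equivalent test is written with `regexp_like(string, pattern)`. The helper name `find_postgres_regex_operators` and the table name `document_sections` are hypothetical, chosen only for the example.

```python
import re
from typing import List

# Hypothetical helper, not part of the commit: reports PostgreSQL-style
# regex operators (~, ~*, !~, !~*) found in a SQL string.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_PG_REGEX_OP = re.compile(r"!?~\*?")


def find_postgres_regex_operators(sql: str) -> List[str]:
    # Blank out single-quoted literals so a '~' inside a string is ignored.
    without_literals = _STRING_LITERAL.sub("''", sql)
    return _PG_REGEX_OP.findall(without_literals)


# PostgreSQL-style query (table and column names are made up).
bad_query = "SELECT * FROM document_sections WHERE name ~ '^inv'"
# The same filter written with the Athena/Trino regexp_like function.
good_query = "SELECT * FROM document_sections WHERE regexp_like(name, '^inv')"

flagged = find_postgres_regex_operators(bad_query)
clean = find_postgres_regex_operators(good_query)
```

A check like this is only a heuristic: it does not parse SQL and it ignores comments, so it is better suited to producing a warning than to rejecting a query outright.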

lib/idp_common_pkg/idp_common/agents/analytics/agent.py

Lines changed: 13 additions & 8 deletions
@@ -15,9 +15,9 @@
 from ..common.config import load_result_format_description
 from ..common.strands_bedrock_model import create_strands_bedrock_model
 from .config import load_python_plot_generation_examples
+from .schema_provider import get_database_overview as _get_database_overview
 from .tools import (
     CodeInterpreterTools,
-    get_database_overview,
     get_table_info,
     run_athena_query,
 )
@@ -48,14 +48,17 @@ def create_analytics_agent(
     # Load python code examples
     python_plot_generation_examples = load_python_plot_generation_examples()
 
+    # Load database overview once during agent creation for embedding in system prompt
+    database_overview = _get_database_overview()
+
     # Define the system prompt for the analytics agent
     system_prompt = f"""
 You are an AI agent that converts natural language questions into Athena queries, executes those queries, and writes python code to convert the query results into json representing either a plot, a table, or a string.
 
 # Task
 Your task is to:
 1. Understand the user's question
-2. **EFFICIENT APPROACH**: Use get_database_overview() to get a fast overview of available tables and their purposes
+2. **EFFICIENT APPROACH**: Review the database overview below to see available tables and their purposes
 3. Apply the Question-to-Table mapping rules below to select the correct tables for your query
 4. Use get_table_info(['table1', 'table2']) to get detailed schemas ONLY for the tables you need
 5. Generate a valid Athena query based on the targeted schema information
@@ -70,15 +73,18 @@ def create_analytics_agent(
 9. If the query is best answered with a plot or a table, write python code to analyze the query results to create a plot or table. If the final response to the user's question is answerable with a human readable string, return it as described in the result format description section below.
 10. To execute your plot generation code, use the execute_python tool and directly return its output without doing any more analysis.
 
-# CRITICAL: Two-Step Database Information Approach
+# Database Overview - Available Tables
+{database_overview}
+
+# CRITICAL: Optimized Database Information Approach
 **For optimal performance and accuracy:**
 
-## Step 1: Overview (Fast)
-- Always start with `get_database_overview()` to see available tables
+## Step 1: Review Database Overview (Above)
+- The complete database overview is provided above in this prompt
 - This gives you table names, purposes, and question-to-table mapping guidance
-- **~500 tokens vs 3000+ tokens** - much faster for simple questions
+- No tool call needed - information is immediately available
 
-## Step 2: Detailed Schemas (On-Demand)
+## Step 2: Get Detailed Schemas (On-Demand Only)
 - Use `get_table_info(['table1', 'table2'])` for specific tables you need
 - Only request detailed info for tables relevant to your query
 - Get complete column listings, sample queries, and aggregation rules
@@ -287,7 +293,6 @@ def run_athena_query_with_config(
         run_athena_query_with_config,
         code_interpreter_tools.write_query_results_to_code_sandbox,
         code_interpreter_tools.execute_python,
-        get_database_overview,  # Fast, lightweight table overview
         get_table_info,  # Detailed schema for specific tables
     ]
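
Taken together, the hunks above move the overview from a tool the agent calls on each query to text that is interpolated into the system prompt once, at agent creation. A minimal sketch of that pattern follows, assuming the overview function returns plain text. The function bodies, the table descriptions, and the dictionary-shaped "agent config" are stand-ins invented for this example and do not reflect the real `schema_provider` or agent API.

```python
from typing import Dict, List, Union


def get_database_overview() -> str:
    """Stand-in for schema_provider.get_database_overview (assumed to return text)."""
    return (
        "- document_sections: one row per extracted section\n"
        "- metering: usage records per document"
    )


def build_system_prompt(database_overview: str) -> str:
    # The overview is interpolated into the prompt, so the model can read it
    # without issuing a tool call first.
    return f"""
# Task
1. Understand the user's question
2. Review the database overview below
3. Use get_table_info(['table1', 'table2']) for the tables you need

# Database Overview - Available Tables
{database_overview}
"""


def create_agent_config() -> Dict[str, Union[str, List[str]]]:
    # Called once per agent creation, not once per query.
    overview = get_database_overview()
    return {
        "system_prompt": build_system_prompt(overview),
        # The overview tool is no longer registered; only on-demand tools remain.
        "tools": ["run_athena_query", "get_table_info"],
    }


config = create_agent_config()
```

One consequence of this design is that the overview is fixed when the agent is created, so a schema change only becomes visible to an agent created afterwards. Curly braces inside the overview text itself are safe here, because an f-string does not re-parse the value it interpolates.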
