Skip to content
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions api/agents/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,14 @@
from .relevancy_agent import RelevancyAgent
from .follow_up_agent import FollowUpAgent
from .response_formatter_agent import ResponseFormatterAgent
from .healer_agent import HealerAgent
from .utils import parse_response

__all__ = [
"AnalysisAgent",
"RelevancyAgent",
"FollowUpAgent",
"ResponseFormatterAgent",
"HealerAgent",
"parse_response"
]
28 changes: 25 additions & 3 deletions api/agents/analysis_agent.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,18 +18,29 @@ def get_analysis( # pylint: disable=too-many-arguments, too-many-positional-arg
db_description: str,
instructions: str | None = None,
memory_context: str | None = None,
database_type: str | None = None,
) -> dict:
"""Get analysis of user query against database schema."""
formatted_schema = self._format_schema(combined_tables)
# Add system message with database type if not already present
if not self.messages or self.messages[0].get("role") != "system":
self.messages.insert(0, {
"role": "system",
"content": (
f"You are a SQL expert. TARGET DATABASE: "
f"{database_type.upper() if database_type else 'UNKNOWN'}"
)
})

prompt = self._build_prompt(
user_query, formatted_schema, db_description, instructions, memory_context
user_query, formatted_schema, db_description,
instructions, memory_context, database_type
)
self.messages.append({"role": "user", "content": prompt})
completion_result = completion(
model=Config.COMPLETION_MODEL,
messages=self.messages,
temperature=0,
top_p=1,
)

response = completion_result.choices[0].message.content
Expand Down Expand Up @@ -158,7 +169,8 @@ def _format_foreign_keys(self, foreign_keys: dict) -> str:

def _build_prompt( # pylint: disable=too-many-arguments, too-many-positional-arguments
self, user_input: str, formatted_schema: str,
db_description: str, instructions, memory_context: str | None = None
db_description: str, instructions, memory_context: str | None = None,
database_type: str | None = None,
) -> str:
"""
Build the prompt for Claude to analyze the query.
Expand All @@ -169,6 +181,7 @@ def _build_prompt( # pylint: disable=too-many-arguments, too-many-positional-a
db_description: Description of the database
instructions: Custom instructions for the query
memory_context: User and database memory context from previous interactions
database_type: Target database type (sqlite, postgresql, mysql, etc.)

Returns:
The formatted prompt for Claude
Expand Down Expand Up @@ -196,13 +209,18 @@ def _build_prompt( # pylint: disable=too-many-arguments, too-many-positional-a
prompt = f"""
You must strictly follow the instructions below. Deviations will result in a penalty to your confidence score.

TARGET DATABASE: {database_type.upper() if database_type else 'UNKNOWN'}

MANDATORY RULES:
- Always explain if you cannot fully follow the instructions.
- Always reduce the confidence score if instructions cannot be fully applied.
- Never skip explaining missing information, ambiguities, or instruction issues.
- Respond ONLY in strict JSON format, without extra text.
- If the query relates to a previous question, you MUST take into account the previous question and its answer, and answer based on the context and information provided so far.
- CRITICAL: When table or column names contain special characters (especially dashes/hyphens like '-'), you MUST wrap them in double quotes for PostgreSQL (e.g., "table-name") or backticks for MySQL (e.g., `table-name`). This is NON-NEGOTIABLE.
- CRITICAL NULL HANDLING: When using calculated columns (divisions, ratios, arithmetic) with ORDER BY or LIMIT, you MUST filter out NULL values. Add "WHERE calculated_expression IS NOT NULL" or include the NULL check in your WHERE clause. NULL values sort first in ascending order and can produce incorrect results.
- CRITICAL SELECT CLAUSE: Only return columns explicitly requested in the question. If the question asks for "the highest rate" or "the lowest value", return ONLY that calculated value, not additional columns like names or IDs unless specifically asked. Use aggregate functions (MAX, MIN, AVG) when appropriate for "highest", "lowest", "average" queries instead of ORDER BY + LIMIT.
- CRITICAL VALUE MATCHING: When multiple columns could answer a question (e.g., "continuation schools"), prefer the column whose allowed values list contains an EXACT or CLOSEST string match to the question term. For example, if the question mentions "continuation schools", prefer a column with value "Continuation School" over "Continuation High Schools". Check the column descriptions for "Optional values" lists and match question terminology to those exact value strings.

If the user is asking a follow-up or continuing question, use the conversation history and previous answers to resolve references, context, or ambiguities. Always base your analysis on the cumulative context, not just the current question.

Expand Down Expand Up @@ -299,6 +317,10 @@ def _build_prompt( # pylint: disable=too-many-arguments, too-many-positional-a
12. Learn from successful query patterns in memory context and avoid failed approaches.
13. For personal queries, FIRST check memory context for user identification. If user identity is found in memory context (user name, previous personal queries, etc.), the query IS translatable.
14. CRITICAL PERSONALIZATION CHECK: If missing user identification/personalization is a significant or primary component of the query (e.g., "show my orders", "my account balance", "my recent purchases", "how many employees I have", "products I own") AND no user identification is available in memory context or schema, set "is_sql_translatable" to false. However, if memory context contains user identification (like user name or previous successful personal queries), then personal queries ARE translatable even if they are the primary component of the query.
15. CRITICAL: When generating queries with calculated columns (division, multiplication, etc.) that are used in ORDER BY or compared with LIMIT, ALWAYS add NULL filtering. For example: "WHERE (column1 / column2) IS NOT NULL" before ORDER BY. This prevents NULL values (from NULL numerators or denominators) from appearing in results.
16. SELECT CLAUSE PRECISION: Only include columns explicitly requested in the question. If a question asks "What is the highest rate?" return ONLY the rate value, not additional columns. Questions asking for "the highest/lowest/average X" should prefer aggregate functions (MAX, MIN, AVG) over ORDER BY + LIMIT, as aggregates are more concise and automatically handle what to return.
17. VALUE-BASED COLUMN SELECTION: When choosing between similar columns (e.g., "School Type" vs "Educational Option Type"), examine the "Optional values" lists in column descriptions. Prefer the column where a value EXACTLY or MOST CLOSELY matches the terminology in the question. For example, "continuation schools" should map to a column with value "Continuation School" rather than "Continuation High Schools". This string matching takes priority over column name similarity.
18. NULL HANDLING IN CALCULATIONS: When a query involves calculated expressions (like col1/col2) used with ORDER BY, filtering (WHERE), or LIMIT, ensure NULL values are explicitly filtered out. Use "AND (expression) IS NOT NULL" in the WHERE clause. This is especially important for division operations where either numerator or denominator can be NULL.

Again: OUTPUT ONLY VALID JSON. No explanations outside the JSON block. """ # pylint: disable=line-too-long
return prompt
Loading
Loading