-
Notifications
You must be signed in to change notification settings - Fork 28
Add database feedback #337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Completed Working on "Code Review"✅ Review submitted: REQUEST_CHANGES. Total comments: 4 across files. Blocker issues present; requested changes with summary and next steps. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThis pull request adds a new HealerAgent for iterative LLM-assisted SQL repair, threads database type metadata across analysis and execution flows, updates prompt construction and parsing, and integrates healing and richer streaming events into the text2sql core pipeline. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Text2SQL
participant AnalysisAgent
participant Database
participant HealerAgent
participant LLM as LLM (litellm)
Client->>Text2SQL: query_database(question, db_url)
Text2SQL->>Text2SQL: get_database_type_and_loader(db_url)
Text2SQL->>AnalysisAgent: get_analysis(question, db_type)
AnalysisAgent->>LLM: generate SQL (includes TARGET DATABASE)
LLM-->>AnalysisAgent: generated_sql
AnalysisAgent-->>Text2SQL: sql_query (streamed)
Text2SQL->>Database: execute(sql_query)
alt Execution Success
Database-->>Text2SQL: results
Text2SQL->>Client: stream query_result
else Execution Failure
Database-->>Text2SQL: error_message
Text2SQL->>HealerAgent: heal_and_execute(failed_sql, error, execute_sql_func, db_description, question, database_type)
HealerAgent->>HealerAgent: validate_sql_syntax & _analyze_error
HealerAgent->>LLM: healing_prompt (JSON response requested)
LLM-->>HealerAgent: healed_sql (JSON)
HealerAgent->>Text2SQL: healed_result (or final failure)
Text2SQL->>Client: stream healing_attempt / healing_success / healing_failed
alt Healing Successful
Text2SQL->>Database: execute(healed_sql)
Database-->>Text2SQL: results
Text2SQL->>Client: stream healed query_result
else Healing Exhausted
Text2SQL->>Client: stream healing_failed (final)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
- Blocker: 1
- Major: 3
- Minor: 0
Key themes
- Healing flow currently bypasses the destructive-query/schema safeguards and lacks error handling around healed execution, leaving the DB vulnerable to unsafe mutations and crashes.
- AnalysisAgent prompt mixes conflicting requirements, so the model can’t reliably decide when to translate or how to format SQL, leading to unpredictable answers.
parse_responseenforces a fixed key set and silently drops agent outputs that don’t match, so callers can’t handle malformed JSON or tailor required fields.
Next steps
- After healing a query, re-run the destructive/schema validations and wrap execution in proper error handling so unsafe SQL is never run unchecked.
- Refine the AnalysisAgent prompt to clearly separate translatable vs. non-translatable questions and spell out the SQL generation rules without contradictions.
- Make
parse_responseconfigurable so callers can specify required keys (or defaults) and receive structured errors instead of discarded payloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (3)
api/agents/healer_agent.py:1
- The dictionary construction spans multiple lines inconsistently. Consider using consistent formatting, either all on one line or each key-value pair on its own line.
"""
api/agents/healer_agent.py:289
- Trailing whitespace on empty line. Remove trailing spaces for code cleanliness.
api/agents/healer_agent.py:1 - Modifying
answer_aninside the healing exception handler couples the healing logic to the outer scope. Consider returning the healed SQL and updatinganswer_anin a single location for clarity.
"""
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
|
@copilot can you please fix the pylint issues in the CI? |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
api/agents/utils.py (1)
36-60: Fix pipeline failures: trailing whitespace and multi-block parser key validation.The pipeline is failing due to:
- Trailing whitespace on lines 39, 50, and 60
- Missing final newline on line 74
Additionally, as noted in a previous review, the multi-block parsing logic (lines 52-59) only accepts JSON blocks containing both
is_sql_translatableandsql_querykeys. However,HealerAgentresponses use different required keys (sql_query,confidence,explanation,changes_made). This means the multi-block scan will always fall back to the first block for healer responses.🔎 Apply this diff to fix trailing whitespace issues:
def parse_response(response: str) -> Dict[str, Any]: """ Parse Claude's response to extract the analysis. Handles cases where LLM returns multiple JSON blocks by extracting the last valid one. Args: response: Claude's response string Returns: Parsed analysis results """ try: # Try to find all JSON blocks (anything between { and }) # and parse the last valid one (LLM sometimes corrects itself) # Find all potential JSON blocks json_blocks = [] depth = 0 start_idx = None - + for i, char in enumerate(response): if char == '{': if depth == 0: start_idx = i depth += 1 elif char == '}': depth -= 1 if depth == 0 and start_idx is not None: json_blocks.append(response[start_idx:i+1]) start_idx = None - + # Try to parse JSON blocks from last to first (prefer the corrected version) for json_str in reversed(json_blocks): try: analysis = json.loads(json_str) # Validate it has required fields if "is_sql_translatable" in analysis and "sql_query" in analysis: return analysis except json.JSONDecodeError: continue - + # Fallback to original method if block parsing fails json_start = response.find("{") json_end = response.rfind("}") + 1 json_str = response[json_start:json_end] analysis = json.loads(json_str) return analysis except (json.JSONDecodeError, ValueError) as e: # Fallback if JSON parsing fails return { "is_sql_translatable": False, "confidence": 0, "explanation": f"Failed to parse response: {str(e)}", "error": str(response), - } + } +Consider making the required keys configurable or accepting any syntactically valid JSON block to support both
AnalysisAgentandHealerAgentoutputs.api/core/text2sql.py (1)
444-491: Healed SQL bypasses destructive operation checks.When the healer produces a fixed query (line 482), it's executed directly without re-evaluating whether it's a destructive operation. If the healer's fix transforms a non-destructive query into a destructive one (or if the original was destructive), the safeguards (confirmation prompt, schema refresh triggers) are bypassed.
Additionally, static analysis flagged:
- Line 466: Use bare
raiseinstead ofraise exec_error- Line 490: Use
logging.exceptioninstead oflogging.errorfor better traceback- Line 491: Use bare
raiseinstead ofraise healed_error🔎 Apply this diff to address static analysis issues:
if healing_result.get("healing_failed"): - raise exec_error + raise yield json.dumps({ "type": "healing_attempt", "final_response": False, "message": f"Query was automatically fixed. Changes made: {', '.join(healing_result.get('changes_made', []))}", "original_error": str(exec_error), "healed_sql": healing_result.get("sql_query", "") }) + MESSAGE_DELIMITER # Execute healed SQL try: query_results = loader_class.execute_sql_query( healing_result["sql_query"], db_url ) answer_an["sql_query"] = healing_result["sql_query"] yield json.dumps({ "type": "healing_success", "final_response": False, "message": "✅ Healed query executed successfully" }) + MESSAGE_DELIMITER except Exception as healed_error: # pylint: disable=broad-exception-caught - logging.error("Healed query also failed: %s", str(healed_error)) - raise healed_error + logging.exception("Healed query also failed: %s", str(healed_error)) + raiseConsider re-checking
is_schema_modifying_queryand destructive operation flags after healing before executing the healed SQL.api/agents/healer_agent.py (1)
90-97: Fix type hint: useAnyinstead ofany.Line 97 uses lowercase
anywhich is not a valid type hint. The correct type from thetypingmodule isAny(capitalized).🔎 Apply this diff:
+from typing import Dict, Any -from typing import Dict def heal_query( self, failed_sql: str, error_message: str, db_description: str = "", question: str = "", database_type: str = "sqlite" - ) -> Dict[str, any]: + ) -> Dict[str, Any]:
🧹 Nitpick comments (2)
api/agents/healer_agent.py (2)
66-77: Duplicate error message for unbalanced parentheses.When
paren_countgoes negative (e.g., input")xxx("), line 74 appends the error and breaks. However, sinceparen_countis still non-zero, line 77 appends the same error message again, resulting in duplicates.🔎 Apply this diff to avoid duplicate errors:
# Check for balanced parentheses paren_count = 0 + paren_error = False for char in query: if char == '(': paren_count += 1 elif char == ')': paren_count -= 1 if paren_count < 0: errors.append("Unbalanced parentheses in query") + paren_error = True break - if paren_count != 0: + if paren_count != 0 and not paren_error: errors.append("Unbalanced parentheses in query")
217-233: Consider adding MySQL-specific rules.The prompt includes specific rules for SQLite and PostgreSQL, but MySQL is mentioned as a supported database type in the docstring. Currently, MySQL will fall through without database-specific guidance.
🔎 Example diff to add MySQL rules:
elif database_type == "postgresql": prompt += """ - PostgreSQL is case-sensitive - use double quotes for mixed-case identifiers - EXTRACT() is supported: EXTRACT(YEAR FROM date_col) - Column references must match exact case when quoted """ + elif database_type == "mysql": + prompt += """ +- MySQL uses backticks ` for quoting identifiers with special characters +- MySQL is case-insensitive on most systems but case-sensitive on Linux +- Use DATE_FORMAT() for date formatting: DATE_FORMAT(date_col, '%Y') +- EXTRACT() is supported: EXTRACT(YEAR FROM date_col) +"""
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
api/agents/__init__.py(1 hunks)api/agents/analysis_agent.py(5 hunks)api/agents/healer_agent.py(1 hunks)api/agents/utils.py(3 hunks)api/core/text2sql.py(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
All Python code should pass pylint (use make lint)
Files:
api/agents/utils.pyapi/agents/__init__.pyapi/agents/analysis_agent.pyapi/agents/healer_agent.pyapi/core/text2sql.py
🧬 Code graph analysis (3)
api/agents/__init__.py (2)
api/agents/healer_agent.py (1)
HealerAgent(19-289)api/agents/utils.py (1)
parse_response(21-74)
api/agents/healer_agent.py (2)
api/config.py (1)
Config(62-148)api/agents/utils.py (1)
parse_response(21-74)
api/core/text2sql.py (2)
api/agents/healer_agent.py (2)
HealerAgent(19-289)heal_query(90-174)api/sql_utils/sql_sanitizer.py (4)
DatabaseSpecificQuoter(154-171)get_quote_char(158-171)SQLIdentifierQuoter(7-151)auto_quote_identifiers(107-151)
🪛 GitHub Actions: Pylint
api/agents/utils.py
[warning] 39-39: Trailing whitespace (trailing-whitespace)
[warning] 50-50: Trailing whitespace (trailing-whitespace)
[warning] 60-60: Trailing whitespace (trailing-whitespace)
[warning] 74-74: Final newline missing (missing-final-newline)
api/agents/analysis_agent.py
[warning] 29-29: Line too long (124/100) (line-too-long)
[warning] 31-31: Trailing whitespace (trailing-whitespace)
[warning] 33-33: Line too long (101/100) (line-too-long)
🪛 Ruff (0.14.8)
api/agents/healer_agent.py
167-167: Do not catch blind exception: Exception
(BLE001)
171-171: Use explicit conversion flag
Replace with conversion flag
(RUF010)
api/core/text2sql.py
466-466: Use raise without specifying exception name
Remove exception name
(TRY201)
490-490: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
491-491: Use raise without specifying exception name
Remove exception name
(TRY201)
🔇 Additional comments (4)
api/agents/__init__.py (1)
7-16: LGTM!The
HealerAgentimport and export follow the existing pattern for other agents in this package.api/agents/analysis_agent.py (1)
217-220: Good additions for SQL generation quality.The new guidelines for NULL handling, SELECT clause precision, and value-based column matching should improve the quality of generated SQL queries.
api/core/text2sql.py (1)
323-335: Good enhancement to the streaming response.The SQL query streaming payload now includes valuable metadata (
conf,miss,amb,exp,is_valid) that provides clients with richer context about the generated query.api/agents/healer_agent.py (1)
19-88: Solid implementation of the HealerAgent class.The
validate_sql_syntaxstatic method provides useful pre-validation with checks for empty queries, SQL keywords, dangerous operations, parentheses balance, and SELECT structure. The overall class design is clean and focused.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (3)
api/agents/analysis_agent.py (1)
25-33: Fix the pylint issues flagged in previous reviews.The pipeline is still failing on this segment due to line length and trailing whitespace issues that were previously identified. Apply the fix from the previous review to resolve these pylint violations.
api/agents/healer_agent.py (1)
177-177: Fix type hint: useAnyinstead ofany.Python type hints require the capitalized
Anyfrom the typing module, not lowercaseany.🔎 Proposed fix
- ) -> Dict[str, any]: + ) -> Dict[str, Any]:api/agents/utils.py (1)
51-58: Multi-block parser may reject valid HealerAgent outputs.The parser only returns JSON blocks containing both
is_sql_translatableandsql_query(line 56). However, HealerAgent outputs containsql_query,confidence,explanation, andchanges_madebut lack theis_sql_translatablekey. This means HealerAgent responses will always fall back to the single-block parser, potentially missing corrected JSON blocks when the LLM self-corrects.Consider making the required keys configurable per agent or accepting any syntactically valid JSON block with at least
sql_query.🔎 Possible fix to make validation flexible
-def parse_response(response: str) -> Dict[str, Any]: +def parse_response(response: str, required_keys: list[str] = None) -> Dict[str, Any]: """ Parse Claude's response to extract the analysis. Handles cases where LLM returns multiple JSON blocks by extracting the last valid one. Args: response: Claude's response string + required_keys: Optional list of required keys to validate. Defaults to ["sql_query"]. Returns: Parsed analysis results """ + if required_keys is None: + required_keys = ["sql_query"] + try: # Try to find all JSON blocks (anything between { and }) # and parse the last valid one (LLM sometimes corrects itself) # Find all potential JSON blocks json_blocks = [] depth = 0 start_idx = None for i, char in enumerate(response): if char == '{': if depth == 0: start_idx = i depth += 1 elif char == '}': depth -= 1 if depth == 0 and start_idx is not None: json_blocks.append(response[start_idx:i+1]) start_idx = None # Try to parse JSON blocks from last to first (prefer the corrected version) for json_str in reversed(json_blocks): try: analysis = json.loads(json_str) # Validate it has required fields - if "is_sql_translatable" in analysis and "sql_query" in analysis: + if all(key in analysis for key in required_keys): return analysis except json.JSONDecodeError: continueThen update HealerAgent calls to:
parse_response(content, required_keys=["sql_query"])
🧹 Nitpick comments (1)
api/core/text2sql.py (1)
475-484: Consider using bareraisefor cleaner exception re-raising.When re-raising the same exception in an except block, Python best practice is to use bare
raiseinstead ofraise exec_error. This preserves the original traceback.🔎 Suggested fix
if not healing_result.get("success"): # Healing failed after all attempts yield json.dumps({ "type": "healing_failed", "final_response": False, "message": f"❌ Failed to heal query after {healing_result['attempts']} attempt(s)", "final_error": healing_result.get("final_error", str(exec_error)), "healing_log": healing_result.get("healing_log", []) }) + MESSAGE_DELIMITER - raise exec_error + raise
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
api/agents/analysis_agent.pyapi/agents/healer_agent.pyapi/agents/utils.pyapi/core/text2sql.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
All Python code should pass pylint (use make lint)
Files:
api/agents/healer_agent.pyapi/agents/analysis_agent.pyapi/core/text2sql.pyapi/agents/utils.py
🧬 Code graph analysis (1)
api/core/text2sql.py (2)
api/agents/healer_agent.py (2)
HealerAgent(19-319)heal_and_execute(169-280)api/sql_utils/sql_sanitizer.py (2)
SQLIdentifierQuoter(7-151)auto_quote_identifiers(107-151)
🪛 GitHub Actions: Pylint
api/agents/healer_agent.py
[error] 169-169: pylint: Too many local variables (R0914) in function.
[error] 169-169: pylint: Inconsistent-return-statements (R1710).
[error] 203-203: pylint: Attribute 'messages' defined outside init (W0201).
🪛 Ruff (0.14.10)
api/agents/healer_agent.py
245-245: Do not catch blind exception: Exception
(BLE001)
api/core/text2sql.py
484-484: Use raise without specifying exception name
Remove exception name
(TRY201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (7)
api/agents/analysis_agent.py (3)
35-38: LGTM!The function call is appropriately split across lines to comply with line length constraints.
170-174: LGTM!The signature correctly threads the database_type parameter through to enable type-aware prompt construction.
221-223: Excellent additions for query quality.These critical rules address important edge cases:
- NULL filtering prevents incorrect results in calculated columns with ORDER BY/LIMIT
- SELECT clause precision ensures only requested columns are returned
- Value-based column selection improves disambiguation in complex schemas
These guidelines will significantly improve SQL generation accuracy.
Also applies to: 320-323
api/core/text2sql.py (1)
257-257: LGTM!The database type propagation and early SQL query emission are well-integrated:
- Line 257 correctly extracts db_type from the tuple
- Lines 314-315 thread db_type to the analysis agent for type-aware SQL generation
- Lines 324-335 emit SQL query early with comprehensive metadata, enabling better streaming UX
Also applies to: 314-315, 324-335
api/agents/healer_agent.py (3)
31-90: LGTM!The SQL syntax validation provides good basic checks for empty queries, keyword presence, parentheses balance, and SELECT structure. The dangerous operation detection (line 61-66) appropriately returns warnings rather than errors, allowing users to be informed without blocking execution.
92-167: Excellent database-specific healing guidance.The healing prompt and error analysis provide targeted, actionable guidance:
- Database-specific syntax rules (SQLite EXTRACT→strftime, PostgreSQL case sensitivity)
- Clear instructions to fix only what's broken
- Structured JSON response format
- Error pattern matching with helpful hints
This focused approach should significantly improve healing success rates.
Also applies to: 283-318
203-280: Healing loop logic is well-designed.The iterative healing approach is sound:
- Validates syntax and enhances error context
- Builds focused healing prompt once
- Loops: LLM call → parse → execute → evaluate
- Provides feedback on failures for next iteration
The broad exception catching at line 245 is appropriate here since
execute_sql_funcis a user-provided callable that could raise various database-specific exceptions, and we need to capture the error message for the next healing iteration.
Co-authored-by: Copilot <[email protected]>
|
@galshubeli I've opened a new pull request, #358, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@galshubeli I've opened a new pull request, #359, to work on those changes. Once the pull request is ready, I'll request review from you. |
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.