Comprehensive technical documentation of CheckMyData.ai internals. For a quick overview see ARCHITECTURE.md.
- System Overview and Component Map
- Agent Orchestrator
- Memory System
- LLM Integration and OpenRouter
- Self-Learning and Feedback Loops
- User Interaction Flow
- Knowledge Layer (RAG)
- Data Flow Diagrams
CheckMyData.ai is an AI-powered database query agent. Users connect databases (PostgreSQL, MySQL, ClickHouse, MongoDB), link Git repositories, and ask questions in natural language. The system translates questions into SQL, executes them, validates the results, and renders rich visualizations — all while learning from every interaction.
| Layer | Technology |
|---|---|
| Frontend | Next.js 15, React 19, TypeScript, Tailwind CSS 4, Zustand |
| Backend | Python 3.12, FastAPI, SQLAlchemy 2.0 (async), Alembic |
| LLM Providers | OpenAI, Anthropic, OpenRouter (any model) |
| Vector Store | ChromaDB |
| App Database | SQLite (dev) / PostgreSQL (prod) |
| User Databases | PostgreSQL, MySQL, ClickHouse, MongoDB |
| Deployment | Docker, Heroku, DigitalOcean App Platform |
backend/app/
├── agents/ # Multi-agent orchestration system
│ ├── orchestrator.py # Main coordinator (OrchestratorAgent)
│ ├── tool_dispatcher.py # Meta-tool dispatch to sub-agents
│ ├── response_builder.py # Response assembly and synthesis
│ ├── context_loader.py # Lazy context loading (tables, KB, learnings)
│ ├── adaptive_planner.py # Quick (heuristic) + full (LLM) plan generation
│ ├── data_gate.py # Intermediate data-quality checks between stages
│ ├── pipeline_learning.py# Pipeline-level learning extraction
│ ├── sql_agent.py # SQL generation, validation, execution
│ ├── knowledge_agent.py # RAG-powered codebase search
│ ├── viz_agent.py # Visualization type selection
│ ├── mcp_source_agent.py # External MCP data source queries
│ ├── query_planner.py # Multi-stage plan decomposition (legacy)
│ ├── stage_executor.py # Pipeline stage runner with DataGate + replan
│ ├── stage_validator.py # Per-stage result validation
│ ├── stage_context.py # Pipeline state (ExecutionPlan, StageResult)
│ ├── base.py # AgentContext, AgentResult, BaseAgent
│ ├── validation.py # Inter-agent result validation
│ ├── errors.py # AgentError hierarchy
│ ├── prompts/ # System prompt builders
│ └── tools/ # Meta-tool definitions
├── llm/ # LLM provider abstraction
│ ├── router.py # LLMRouter with fallback + retry
│ ├── base.py # Message, Tool, ToolCall, LLMResponse
│ ├── errors.py # Unified LLM error hierarchy
│ ├── openai_adapter.py # OpenAI SDK adapter
│ ├── anthropic_adapter.py# Anthropic SDK adapter
│ └── openrouter_adapter.py # OpenRouter HTTP adapter
├── core/ # Cross-cutting concerns
│ ├── agent.py # ConversationalAgent wrapper
│ ├── context_budget.py # Token budget allocator
│ ├── history_trimmer.py # Chat history trimming + summarization
│ ├── insight_memory.py # Persistent insight store
│ ├── validation_loop.py # SQL pre/execute/post/repair cycle
│ ├── workflow_tracker.py # SSE event broadcasting
│ ├── context_enricher.py # Failed-query context repair
│ ├── health_monitor.py # System health checks
│ └── ...
├── connectors/ # Database connectivity
│ ├── postgres.py, mysql.py, mongodb.py, clickhouse.py
│ ├── ssh_tunnel.py # SSH tunnel for remote DBs
│ └── mcp_client.py # MCP protocol client
├── knowledge/ # RAG pipeline and analysis
│ ├── vector_store.py # ChromaDB wrapper
│ ├── repo_analyzer.py # Git repo indexing
│ ├── learning_analyzer.py# Heuristic + LLM lesson extraction
│ ├── entity_extractor.py # Code entity extraction
│ └── custom_rules.py # Project-specific rules engine
├── services/ # Business logic
│ ├── chat_service.py # Session + message CRUD
│ ├── agent_learning_service.py # Learning lifecycle
│ ├── session_notes_service.py # Working memory
│ ├── feedback_pipeline.py # Feedback processing
│ ├── rag_feedback_service.py # RAG quality tracking
│ ├── suggestion_engine.py # Follow-up suggestions
│ └── ...
├── models/ # SQLAlchemy ORM models
│ ├── chat_session.py # ChatSession, ChatMessage
│ ├── agent_learning.py # AgentLearning, AgentLearningSummary
│ ├── session_note.py # SessionNote (working memory)
│ ├── insight_record.py # InsightRecord, TrustScore
│ ├── pipeline_run.py # PipelineRun (multi-stage state)
│ ├── data_validation.py # DataValidationFeedback
│ ├── rag_feedback.py # RAGFeedback
│ └── ...
├── api/routes/ # REST API endpoints
│ ├── chat.py # Chat, streaming, feedback
│ ├── models.py # LLM model listing
│ ├── connections.py # Database connections
│ ├── projects.py # Project CRUD + LLM settings
│ └── ...
├── mcp_server/ # MCP server for external clients
├── pipelines/ # Long-running workflows
└── main.py # FastAPI app entry point
frontend/src/
├── app/ # Next.js App Router pages
├── components/
│ ├── chat/ # ChatPanel, ChatMessage, ChatInput, streaming
│ ├── viz/ # DataTable, charts, visualizations
│ ├── connections/ # Database connection management
│ ├── learnings/ # LearningsPanel
│ ├── insights/ # InsightsPanel
│ ├── analytics/ # FeedbackAnalyticsPanel
│ └── ui/ # LlmModelSelector, modals, buttons
├── stores/ # Zustand state management
│ ├── app-store.ts # Active session, project
│ ├── auth-store.ts # Authentication
│ └── ...
├── hooks/ # React hooks (useRestoreState, etc.)
└── lib/ # API client, SSE, utilities
The orchestrator is the central brain of the system. It receives a natural language question, decides what to do, delegates to specialist sub-agents, validates results, and composes the final answer.
flowchart TB
User["User Question"]
CA["ConversationalAgent\n(wrapper)"]
OA["OrchestratorAgent"]
CC{Complexity\nCheck}
TL["Tool-Calling Loop\n(simple path)"]
MP["Multi-Stage Pipeline\n(complex path)"]
SQL["SQLAgent"]
KB["KnowledgeAgent"]
VIZ["VizAgent"]
MCP["MCPSourceAgent"]
User --> CA --> OA --> CC
CC -->|simple| TL
CC -->|complex| MP
TL --> SQL
TL --> KB
TL --> MCP
TL --> VIZ
MP --> SQL
MP --> KB
The chat route (backend/app/api/routes/chat.py) holds a singleton ConversationalAgent instance. This is a thin backward-compatible wrapper defined in backend/app/core/agent.py that delegates everything to OrchestratorAgent:
```python
class ConversationalAgent:
    def __init__(self, ...):
        self._orchestrator = OrchestratorAgent(...)

    async def run(self, question, project_id, ...):
        context = AgentContext(...)
        return await self._orchestrator.run(context)
```

Every sub-agent receives the same AgentContext (defined in backend/app/agents/base.py):
| Field | Type | Purpose |
|---|---|---|
| `project_id` | `str` | Workspace identifier |
| `connection_config` | `ConnectionConfig \| None` | Database credentials + type |
| `user_question` | `str` | The natural language question |
| `chat_history` | `list[Message]` | Recent conversation messages |
| `llm_router` | `LLMRouter` | LLM provider with retry/fallback |
| `tracker` | `WorkflowTracker` | SSE event emitter |
| `workflow_id` | `str` | Unique ID for this request |
| `user_id` | `str \| None` | Authenticated user |
| `preferred_provider` | `str \| None` | e.g. `"openrouter"` |
| `model` | `str \| None` | e.g. `"openai/gpt-4o"` |
| `sql_provider` / `sql_model` | `str \| None` | Separate model for SQL generation |
| `project_name` | `str \| None` | Human-readable project name |
| `extra` | `dict` | Pipeline action, session ID, flags |
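For orientation, a minimal dataclass sketch of this context object. Field names mirror the table above; the exact types, defaults, and ordering in backend/app/agents/base.py may differ.

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative sketch only; the real AgentContext lives in backend/app/agents/base.py.
@dataclass
class AgentContext:
    project_id: str
    user_question: str
    connection_config: "ConnectionConfig | None" = None
    chat_history: list["Message"] = field(default_factory=list)
    llm_router: "LLMRouter | None" = None
    tracker: "WorkflowTracker | None" = None
    workflow_id: str = ""
    user_id: str | None = None
    preferred_provider: str | None = None
    model: str | None = None
    sql_provider: str | None = None
    sql_model: str | None = None
    project_name: str | None = None
    extra: dict[str, Any] = field(default_factory=dict)
```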
Before loading any heavy context (table maps, learnings, staleness checks), the orchestrator runs a lightweight LLM-based intent classification step (~500 tokens, ~0.5s). This determines the minimal execution chain needed for the request.
Classification intents:
| Intent | Description | Context Loaded | Tools Exposed |
|---|---|---|---|
| `direct_response` | Greetings, meta-questions, casual conversation | None | None |
| `data_query` | Questions requiring database queries | Table map, learnings, overview | `query_database`, `process_data`, `manage_rules`, `ask_user` |
| `knowledge_query` | Questions about code, architecture, docs | KB staleness | `search_codebase`, `ask_user` |
| `mcp_query` | Questions requiring external MCP sources | MCP sources | `query_mcp_source`, `ask_user` |
| `mixed` | Ambiguous or multi-capability queries | Everything | All available |
Key behaviours:
- The classification prompt is dynamic: only intents whose capabilities are available (e.g., DB connection exists) are presented to the LLM.
- On any classification failure, the orchestrator falls back to `mixed` (full pipeline) — no request is ever dropped.
- The `direct_response` path uses a single LLM call with a minimal prompt (~100 tokens), zero DB queries, and no tools. For a greeting like "hello", this reduces the pipeline from ~174K tokens / 15s to ~2K tokens / 1-2s.
- The `data_query` path loads only DB-relevant context and runs complexity detection for the multi-stage pipeline.
- The `knowledge_query` path loads only KB staleness and exposes only `search_codebase`.
Implementation: backend/app/agents/intent_classifier.py (classify_intent()), prompt builders in backend/app/agents/prompts/orchestrator_prompt.py (build_classification_prompt(), build_direct_response_prompt()).
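A minimal sketch of the fail-open behaviour described above. The helper below is hypothetical (the real logic lives in `classify_intent()`); it only illustrates constraining the LLM to the available intents and falling back to `mixed` on any failure.

```python
from typing import Awaitable, Callable

# The five intents documented above.
VALID_INTENTS = {"direct_response", "data_query", "knowledge_query", "mcp_query", "mixed"}

async def classify_or_fallback(
    complete: Callable[[str], Awaitable[str]],  # any async "prompt in, text out" callable
    question: str,
    available_intents: set[str] | None = None,
) -> str:
    """Return an intent name, falling back to 'mixed' on any failure (sketch)."""
    intents = available_intents or VALID_INTENTS
    prompt = (
        "Classify the user request into one of: " + ", ".join(sorted(intents))
        + f"\nRequest: {question}\nAnswer with the intent name only."
    )
    try:
        answer = (await complete(prompt)).strip().lower()
        return answer if answer in intents else "mixed"
    except Exception:
        # Any classification failure falls back to the full pipeline.
        return "mixed"
```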
For straightforward questions, the orchestrator uses an iterative tool-calling loop:
Step 1 — Context Loading (parallel)
The orchestrator loads three context sources simultaneously using asyncio.gather:
- Staleness check: how far behind the knowledge base is from Git HEAD
- MCP source check: whether the project has MCP-type connections
- Knowledge base check: whether ChromaDB has indexed documents
Step 2 — History Trimming
Chat history is trimmed to fit within max_history_tokens using HistoryTrimmer:
- Large tool results are condensed to 500 chars
- Older messages are summarized via an LLM call (or dropped if no LLM available)
- A summary message replaces the collapsed messages
Step 3 — Context Budget Allocation
ContextBudgetManager allocates token budget by priority:
| Priority | Component | Budget Share |
|---|---|---|
| 1 (highest) | System prompt | Fixed (never truncated) |
| 2 | Chat history | 30% of total |
| 3 | Schema / table map | 25% of total |
| 4 | Custom rules | 10% of total |
| 5 | Agent learnings | 8% of total |
| 6 (lowest) | Project overview | Remainder |
Each component is truncated to fit its budget if necessary.
Step 4 — System Prompt Construction
build_orchestrator_system_prompt() (backend/app/agents/prompts/orchestrator_prompt.py) assembles the system prompt dynamically based on available capabilities:
- Available tools section (conditional on `has_connection`, `has_knowledge_base`, `has_mcp_sources`)
- Database table map for routing context
- Project knowledge overview
- Recent agent learnings (verified patterns from past interactions)
- Data verification protocol guidelines
- Complex multi-step query instructions
Step 5 — Tool-Calling Loop (Adaptive Step Budget)
The loop runs up to max_orchestrator_iterations (default 100, configurable in settings and overrideable per-project or per-request via max_steps):
for each iteration:
1. Trim loop messages if approaching context limit
2. Inject "wrap up" message if near 70% context capacity
3. Inject step-budget wrap-up if ≤ orchestrator_wrap_up_steps remain
4. Call LLM with messages + tools
5. If no tool calls → final text answer, break
6. If tool calls:
a. Dispatch to sub-agents (parallel if multiple, sequential for process_data)
b. Append assistant message + tool results to message list
c. Continue loop
else (loop exhausted):
7. Final LLM synthesis call (no tools, summarize collected data)
8. Return step_limit_reached response with continuation_context
Parallel tool execution: When the LLM requests multiple tools (except process_data), they are dispatched concurrently via asyncio.gather. Failures in parallel calls are caught individually and reported as error strings.
Context pressure management: As the loop progresses and context fills up:
- At 80% capacity: older assistant+tool pairs are collapsed into summaries
- At 70% capacity: a system message instructs the LLM to stop making tool calls and compose a final answer
- On token limit error: context is aggressively compressed to 60% and retried
Step-budget awareness: When orchestrator_wrap_up_steps (default 3) iterations remain, the LLM receives a system message urging it to compose a final answer. This works alongside context-pressure wrap-up.
Final synthesis on exhaustion: When orchestrator_final_synthesis is enabled (default), the orchestrator makes one final LLM call without tools to synthesize all collected data into a coherent answer — instead of returning a static "maximum steps reached" message.
Continuation protocol: When the step limit is reached, the response includes response_type: "step_limit_reached", steps_used, steps_total, and a continuation_context payload. The frontend renders a "Continue analysis" button that resends the request with pipeline_action: "continue_analysis", allowing the user to resume the analysis.
Step 6 — Visualization Selection
If the response includes SQL results, the VizAgent selects the best chart type. The AgentResultValidator then checks the visualization (e.g., pie chart with too many slices falls back to bar chart).
Step 7 — Follow-up Suggestions
SuggestionEngine.generate_followups() produces 2-3 suggested follow-up questions based on the query, columns, and row count.
For questions requiring multiple data retrieval steps, the orchestrator switches to a multi-stage pipeline.
Complexity detection uses two levels:
- Heuristic check (`detect_complexity()`): scans for keywords like "summary table", "pivot", "cross-reference", "compare", "for each", "step 1", etc.
- Adaptive LLM check (`detect_complexity_adaptive()`): a lightweight LLM call that returns "simple" or "complex" — used only if the heuristic is inconclusive.
Pipeline execution:
flowchart LR
Q["Complex Question"]
QP["QueryPlanner\n(LLM call)"]
EP["ExecutionPlan\n(N stages)"]
SE["StageExecutor"]
SV["StageValidator"]
CP{Checkpoint?}
USR["User Review"]
FR["Final Response"]
Q --> QP --> EP --> SE
SE --> SV
SV -->|passed| CP
SV -->|failed| SE
CP -->|yes| USR --> SE
CP -->|no| SE
SE -->|all done| FR
Components:
| Component | File | Role |
|---|---|---|
| `QueryPlanner` | `backend/app/agents/query_planner.py` | Decomposes question into ExecutionPlan via a single LLM call |
| `ExecutionPlan` | `backend/app/agents/stage_context.py` | Ordered list of PlanStage objects with descriptions, SQL hints, validation criteria |
| `StageExecutor` | `backend/app/agents/stage_executor.py` | Runs stages sequentially: SQL agent call → validation → checkpoint → next |
| `StageValidator` | `backend/app/agents/stage_validator.py` | Checks data shape, row-count bounds, cross-stage consistency |
| `StageContext` | `backend/app/agents/stage_context.py` | In-memory pipeline state: plan, results per stage, user feedback |
| `PipelineRun` | `backend/app/models/pipeline_run.py` | DB-persisted pipeline state for resume/retry |
Checkpoint mechanism: Stages can be flagged as checkpoints. When a checkpoint stage completes, the orchestrator pauses and presents intermediate results to the user with options to continue, modify, or retry. The user's response is processed via _resume_pipeline() which reconstitutes StageContext from the DB and continues execution.
Fallback: If the planner fails to produce a valid plan, the orchestrator falls back to the simple tool-calling loop with a _skip_complexity flag to prevent infinite recursion.
The orchestrator's LLM sees these meta-tools (defined in backend/app/agents/tools/orchestrator_tools.py):
| Tool | When Available | Sub-Agent | Description |
|---|---|---|---|
| `query_database` | DB connected | `SQLAgent` | Generates, validates, and executes SQL |
| `search_codebase` | KB indexed | `KnowledgeAgent` | RAG search over indexed code/docs |
| `manage_rules` | DB connected | Direct (RuleService) | CRUD for project rules |
| `query_mcp_source` | MCP connections | `MCPSourceAgent` | Query external MCP data sources |
| `process_data` | DB connected | `DataProcessor` | In-memory data enrichment (IP→country, phone→country, aggregate, filter) |
| `ask_user` | Always available | None (raises exception) | Structured clarification question |
Tools are assembled dynamically by get_orchestrator_tools() based on what the project has configured.
Each sub-agent call is wrapped in retry logic with MAX_SUB_AGENT_RETRIES = 2:
for attempt in range(MAX_SUB_AGENT_RETRIES + 1):
try:
result = await sub_agent.run(context, ...)
validation = validator.validate(result)
if validation.passed or attempt == MAX_SUB_AGENT_RETRIES:
return result
except AgentRetryableError:
continue # retry
except AgentFatalError:
return error # no retry
The AgentResultValidator (backend/app/agents/validation.py) checks:
- SQL results: query present, no execution error, warns on zero rows or slow queries (>30s)
- Viz results: valid viz_type, appropriate chart for data shape
- Knowledge results: non-empty answer, source citations present
The error hierarchy (backend/app/agents/errors.py):
AgentError (base)
├── AgentTimeoutError → retry with adjusted context
├── AgentRetryableError → orchestrator may retry
├── AgentFatalError → unrecoverable (missing connection, auth failure)
└── AgentValidationError → sub-agent result failed checks
The orchestrator catches different error types and produces appropriate user-facing messages:
- Connection errors → "Please check your connection settings"
- Permission errors → "Please check your database credentials"
- LLM errors → provider-specific user messages from the error hierarchy
- Unknown errors → generic "An unexpected error occurred"
The orchestrator proactively assesses request ambiguity via the REQUEST ANALYSIS PROTOCOL in its system prompt, using ask_user before executing tools when the intent is unclear, data coverage is uncertain, or assumptions would be required.
When the orchestrator's LLM calls ask_user, it raises _ClarificationRequestError which interrupts the tool loop and returns an AgentResponse with response_type="clarification_request" and a dedicated clarification_data dict containing:
- `question`: what to ask the user
- `question_type`: `yes_no`, `multiple_choice`, `numeric_range`, or `free_text`
- `options`: for multiple choice questions
- `context`: explanation of why the question is being asked
The clarification_data is passed through all three response paths (REST /ask, SSE /ask/stream, WebSocket) and persisted in message metadata so it survives session reload. The frontend renders a ClarificationCard component with type-appropriate UI controls (buttons for yes/no, selectable options for multiple choice, text input for free text). The user's answer is sent as the next chat message, which the orchestrator uses to proceed with the original task.
The ask_user tool is always available regardless of whether a database or knowledge base is connected.
The system implements a four-layer memory architecture, each with different scope, persistence, and decay characteristics.
flowchart TB
subgraph shortTerm ["Layer 1: Chat History (per-session, minutes)"]
CM["ChatMessage rows"]
HT["HistoryTrimmer"]
SS["SessionSummarizer"]
end
subgraph workingMem ["Layer 2: Session Notes (per-connection, days)"]
SN["SessionNote"]
SNS["SessionNotesService"]
end
subgraph longTerm ["Layer 3: Agent Learnings (per-connection, months)"]
AL["AgentLearning"]
ALS["AgentLearningService"]
LA["LearningAnalyzer"]
end
subgraph persistent ["Layer 4: Insight Memory (per-project, permanent)"]
IR["InsightRecord"]
TS["TrustScore"]
IMS["InsightMemoryService"]
end
CM --> HT
HT -->|"summary"| CM
CM -->|"feeds"| SN
SN -->|"pattern"| AL
AL -->|"aggregate"| IR
Scope: Per chat session. Lifetime: Duration of conversation.
Models (backend/app/models/chat_session.py):
- `ChatSession`: project_id, title, user_id, connection_id, created_at
- `ChatMessage`: session_id, role (user/assistant/tool), content, metadata_json, tool_calls_json, user_rating
ChatService (backend/app/services/chat_service.py):
- `create_session()`, `add_message()`, `get_history_as_messages()`
- `get_history_as_messages()` enriches assistant messages with SQL context: it appends query text, viz type, row count, column names, and sample data to the message content so the LLM has full context for follow-up questions.
HistoryTrimmer (backend/app/core/history_trimmer.py):
Manages two levels of trimming:

- Session-level trimming (`trim_history()`), run before the orchestrator loop starts:
  - Condenses tool results longer than 500 chars to 8-line previews
  - Walks backwards from the newest message, accumulating tokens
  - When the budget is exceeded, summarizes older messages via an LLM call into a single system message: "[Conversation summary of N earlier messages] ..."
  - If no LLM router is available, uses a fallback: "Previous topics discussed: ..."
- In-loop trimming (`trim_loop_messages()`), run during the tool-calling loop:
  - Triggers at 80% of the context budget
  - Preserves the system prompt and the last user message
  - Collapses intermediate assistant+tool pairs into one-liner summaries
  - Injects a "wrap up" instruction at 70% capacity to tell the LLM to stop making tool calls
Token estimation: Uses tiktoken when available (accurate for OpenAI models), falls back to character-based heuristic at ~4 chars per token.
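A sketch of that estimation strategy, assuming tiktoken's `encoding_for_model()` lookup and the ~4 chars/token fallback:

```python
def estimate_tokens(text: str, model: str = "gpt-4o") -> int:
    """Token estimate: tiktoken when available, character heuristic otherwise (sketch)."""
    try:
        import tiktoken
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback used when tiktoken is not installed or the model is unknown.
        return max(1, len(text) // 4)
```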
Scope: Per database connection, cross-session. Lifetime: Days to months (confidence decay).
Session notes are the agent's "working memory" — observations, conventions, and user preferences that persist across chat sessions for the same database connection.
Model (backend/app/models/session_note.py):
| Field | Description |
|---|---|
| `connection_id` | Links to a specific database connection |
| `project_id` | Project scope |
| `category` | One of: data_observation, column_mapping, business_logic, calculation_note, user_preference, verified_benchmark |
| `subject` | Table or metric name |
| `note` | Free-text observation |
| `note_hash` | SHA-256 for exact dedup |
| `confidence` | 0.0 to 1.0 |
| `is_verified` | User-confirmed flag |
| `is_active` | Soft-delete flag |
SessionNotesService (backend/app/services/session_notes_service.py):
- Fuzzy deduplication: Before creating a note, checks existing notes with the same connection/category/subject. Uses `SequenceMatcher` with threshold 0.75 — if a similar note exists, its confidence is bumped (+0.1) and the longer text is kept (see the sketch after this list).
- Prompt compilation (`compile_notes_prompt()`): Formats notes into a structured prompt block organized by category:

  ```
  AGENT NOTES (observations from previous sessions with this database):

  ### Business Logic
  - [orders] Cancelled orders are excluded from revenue. (85% [VERIFIED])

  ### Data Observations
  - [payments] Amounts stored in cents, divide by 100. (70%)
  ```

- Confidence decay: Unverified notes inactive for 60+ days lose 0.1 confidence per decay cycle. Notes below 0.1 are not included in prompts.
- Notes are injected into SQL agent prompts for context about the specific database.
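A minimal sketch of the `SequenceMatcher`-based fuzzy check referenced above (the real service also scopes the comparison to the same connection, category, and subject):

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.75  # threshold described above

def is_duplicate_note(new_note: str, existing_notes: list[str]) -> bool:
    """Return True if a sufficiently similar note already exists (illustrative)."""
    for existing in existing_notes:
        ratio = SequenceMatcher(None, new_note.lower(), existing.lower()).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            return True
    return False
```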
Scope: Per database connection, cross-session. Lifetime: Months (with decay).
Agent learnings are structured lessons extracted from query validation outcomes and user feedback. They represent patterns the system has discovered about a specific database.
Models (backend/app/models/agent_learning.py):
- `AgentLearning`: connection_id, category, subject, lesson, lesson_hash, confidence, source_query, source_error, times_confirmed, times_applied, is_active
- `AgentLearningSummary`: connection_id, total_lessons, lessons_by_category_json, compiled_prompt, last_compiled_at
Categories (9 total):
| Category | Example |
|---|---|
| `table_preference` | "Use transactions instead of legacy_payments for queries of this type" |
| `column_usage` | "Column user_name does not exist on users. Use full_name instead" |
| `data_format` | "Column amount on orders stores values in cents. Divide by 100 for dollars" |
| `query_pattern` | "User flagged incorrect results for query on orders. Detail: missing date filter" |
| `schema_gotcha` | "Table users uses soft-delete. Always filter with WHERE deleted_at IS NULL" |
| `performance_hint` | "Table events can timeout on unfiltered queries. Always add LIMIT and date filter" |
| `pipeline_pattern` | "This query type triggered 2 replans. Consider simplifying the plan" |
| `data_quality_hint` | "DataGate error on stage 'fetch_users': cartesian join detected" |
| `replan_recovery` | "Stage 'agg' (query_database) failed. Replan succeeded by splitting into sub-queries" |
AgentLearningService (backend/app/services/agent_learning_service.py):

- Quality gates (applied before storage):
  - Subject blocklist: SQL keywords (`columns`, `tables`, `information_schema`, `unknown`, `schema`, etc.) are rejected
  - Minimum lesson length: 15 characters
  - Non-ASCII ratio check: lessons with >50% non-ASCII characters are rejected (prevents raw non-English user questions from being stored)
  - Lesson normalization: whitespace collapse, capitalization, max 500 chars with ellipsis truncation
- Three-level deduplication:
  - Exact hash match (same lesson_hash + connection + category + subject)
  - Fuzzy match (SequenceMatcher threshold 0.75)
  - If a duplicate is found: bump confidence +0.1, keep the longer lesson text
- Conflict resolution: When creating a new learning, checks existing ones for contradictions. Detects negation flips ("use" vs "not use", "always" vs "never", "avoid" vs "prefer"). If the new lesson has higher confidence, the old conflicting one is deactivated.
- Schema cross-validation: `validate_learnings_against_schema()` deactivates learnings whose subject no longer matches any known table in the current schema. It runs automatically during `POST /connections/{id}/refresh-schema` and is also available as a manual endpoint at `POST /connections/{id}/learnings/validate-schema`. Read-time blocklist filtering in `get_learnings()` additionally excludes legacy bad subjects (`columns`, `tables`, `information_schema`, etc.) from all prompts and API responses.
- Confidence lifecycle:
  - Created at 0.6 (default)
  - Confirmed by user or repeated: +0.1 (capped at 1.0)
  - Applied in a query: tracked via the times_applied counter
  - Contradicted: -0.3
  - Stale (30+ days, never applied): -0.05/month
  - Stale (30+ days, previously applied): -0.02/month
  - Below 0.2: deactivated
- User voting: REST endpoints for confirm (upvote) and contradict (downvote) allow users to directly influence learning confidence via the LearningsPanel UI. Both operations invalidate the compiled summary cache immediately so the next agent prompt reflects the updated confidence.
- Prompt compilation (`compile_prompt()`): Priority-ranked by composite score (confidence * 0.4 + log(confirmations) * 0.4 + log(applications) * 0.2). The top 30 learnings are organized by category, with confidence percentages and critical flags (5+ confirmations).
- Cross-connection transfer: Learnings from sibling connections in the same project are included if they are in transferable categories (`schema_gotcha`, `performance_hint`) and have confidence >= 0.6.
- Global pattern promotion: Learnings that appear independently on 2+ different connections are identified as "global patterns" and promoted to connections that don't have them yet. This allows knowledge to spread across databases.
- How learnings are used: The orchestrator loads the top 15 high-confidence learnings via `_load_recent_learnings()` and injects them into the system prompt under "RECENT AGENT LEARNINGS (verified insights)".
Scope: Per project, cross-connection. Lifetime: Permanent until resolved/dismissed.
Insights are higher-level findings discovered during data analysis — anomalies, trends, opportunities, data quality issues.
Models (backend/app/models/insight_record.py):
- `InsightRecord`: project_id, connection_id, insight_type, severity, title, description, evidence_json, source_metrics_json, source_query, recommended_action, expected_impact, confidence, status, user_verdict, user_feedback, times_surfaced, times_confirmed, times_dismissed
- `TrustScore`: insight_id, confidence, data_freshness_hours, sources_json, validation_method, sample_size
Insight types: anomaly, opportunity, loss, trend, pattern, reconciliation_mismatch, data_quality, observation
Severity levels: critical, warning, info, positive
Status lifecycle:
active → confirmed (user verified)
→ dismissed (user rejected)
→ resolved (user marked as fixed)
→ expired (confidence decayed below 0.15)
→ superseded (newer insight replaced it)
InsightMemoryService (backend/app/core/insight_memory.py):
- Deduplication: Before storing, checks existing active insights with similarity threshold 0.80 (SequenceMatcher on titles). If a duplicate is found: bumps `times_surfaced` and slightly increases confidence (+0.05).
- Trust scoring: Each insight has an associated `TrustScore` tracking data freshness, source list, validation method, and sample size.
- Confidence decay: Unconfirmed insights older than 30 days lose 0.05 confidence per decay cycle. Below 0.15 they are expired.
- User feedback: Users can confirm, dismiss, or resolve insights through the UI.
ContextBudgetManager (backend/app/core/context_budget.py) ensures all context fits within the model's context window.
Allocation algorithm:
total_budget = min(settings.max_context_tokens, model_context_window)
1. System prompt: exact fit (never truncated)
2. Chat history: min(remaining, 30% of total)
3. Schema / table map: min(remaining, 25% of total) → truncated if needed
4. Custom rules: min(remaining, 10% of total) → truncated if needed
5. Agent learnings: min(remaining, 8% of total) → truncated if needed
6. Project overview: whatever remains → truncated if needed
Truncation appends "... (truncated to fit context budget)" so the LLM knows information was cut.
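As an illustration of the allocation rule, a small helper that applies the shares listed above; the function name and rounding are assumptions, not the actual ContextBudgetManager API:

```python
def allocate_budget(total: int, system_prompt_tokens: int) -> dict[str, int]:
    """Split a token budget by the priorities described above (illustrative only)."""
    remaining = total - system_prompt_tokens          # priority 1: never truncated
    shares = {"history": 0.30, "schema": 0.25, "rules": 0.10, "learnings": 0.08}
    allocation: dict[str, int] = {"system_prompt": system_prompt_tokens}
    for name, share in shares.items():
        portion = min(remaining, int(total * share))  # each component capped by its share
        allocation[name] = portion
        remaining -= portion
    allocation["overview"] = max(0, remaining)        # lowest priority gets the remainder
    return allocation

# e.g. allocate_budget(100_000, 8_000) leaves 19_000 tokens for the project overview.
```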
All LLM providers implement the BaseLLMProvider interface (backend/app/llm/base.py):
```python
class BaseLLMProvider(ABC):
    async def complete(messages, tools, model, temperature, max_tokens) -> LLMResponse
    def stream(messages, tools, model, temperature, max_tokens) -> AsyncIterator[str]
    def provider_name -> str
```

Core data types:

- `Message`: role (system/user/assistant/tool), content, optional tool_call_id, name, tool_calls
- `Tool`: name, description, list of `ToolParameter` (name, type, description, required, enum)
- `ToolCall`: id, name, arguments dict
- `LLMResponse`: content, tool_calls, usage dict, model, provider, finish_reason
| Adapter | File | SDK / Transport | Default Model | Timeout |
|---|---|---|---|---|
| `OpenAIAdapter` | `backend/app/llm/openai_adapter.py` | OpenAI Python SDK | `gpt-4o` | 90s |
| `AnthropicAdapter` | `backend/app/llm/anthropic_adapter.py` | Anthropic Python SDK | `claude-sonnet-4-20250514` | 90s |
| `OpenRouterAdapter` | `backend/app/llm/openrouter_adapter.py` | httpx (raw HTTP) | `openai/gpt-4o` | 120s |
The OpenRouter adapter communicates directly via HTTP to https://openrouter.ai/api/v1:
- Sets `HTTP-Referer: https://checkmydata.ai` and `X-Title: CheckMyData.ai` headers (required by OpenRouter)
- Supports both `complete()` (single response) and `stream()` (SSE streaming)
- Maps httpx errors to the unified error hierarchy: 429 → `LLMRateLimitError`, 401/403 → `LLMAuthError`, 400 with "context length" → `LLMTokenLimitError`, 5xx → `LLMServerError`, timeout → `LLMTimeoutError` (sketched below)
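A sketch of that status-code mapping, assuming httpx exception types and the unified error classes from backend/app/llm/errors.py; the real adapter's structure may differ:

```python
import httpx

# Import path assumed from the repo layout (backend/app/llm/errors.py).
from app.llm.errors import (
    LLMAuthError, LLMError, LLMRateLimitError,
    LLMServerError, LLMTimeoutError, LLMTokenLimitError,
)

def map_openrouter_error(exc: Exception) -> LLMError:
    """Translate a raw httpx failure into the unified hierarchy (illustrative)."""
    if isinstance(exc, httpx.TimeoutException):
        return LLMTimeoutError(str(exc))
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        body = exc.response.text
        if status == 429:
            return LLMRateLimitError(body)
        if status in (401, 403):
            return LLMAuthError(body)
        if status == 400 and "context length" in body.lower():
            return LLMTokenLimitError(body)
        if status >= 500:
            return LLMServerError(body)
    return LLMError(str(exc))
```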
All provider-specific exceptions are caught in adapters and re-raised as unified types (backend/app/llm/errors.py):
LLMError (base)
├── LLMRateLimitError retryable=True retry_after=5s (429)
├── LLMServerError retryable=True retry_after=2s (5xx)
├── LLMTimeoutError retryable=True retry_after=2s
├── LLMConnectionError retryable=True retry_after=2s
├── LLMAuthError retryable=False (401/403)
├── LLMTokenLimitError retryable=False (context overflow)
├── LLMContentFilterError retryable=False (policy violation)
└── LLMAllProvidersFailedError retryable=True retry_after=3s
Each error type has a user_message property with a human-friendly explanation.
LLMRouter (backend/app/llm/router.py) is the single entry point for all LLM calls. It provides:
Provider registry:
```python
PROVIDER_REGISTRY = {
    "openai": OpenAIAdapter,
    "anthropic": AnthropicAdapter,
    "openrouter": OpenRouterAdapter,
}
```

Providers are lazily instantiated on first use.
Fallback chain construction (_get_fallback_chain()):
- Start with `preferred_provider` (request-level) or `settings.default_llm_provider` (app-level)
- Append remaining providers in order: openai → anthropic → openrouter
- Filter out providers with no API key configured
- Filter out providers marked unhealthy (quarantined for 120 seconds)
- If all filtered out (all unhealthy), fall back to all providers with keys
Per-provider retry (_call_with_retry()):
- Up to 3 attempts per provider
- Exponential backoff: 2s base, 2x multiplier
- Non-retryable errors (`LLMAuthError`, `LLMTokenLimitError`) fail immediately
- Retryable errors use `retry_after` from the error or the backoff delay (see the sketch below)
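A condensed sketch of per-provider retry with exponential backoff. The provider and request shapes are assumptions, and the error classes are the unified ones described in this section:

```python
import asyncio

from app.llm.errors import LLMAuthError, LLMError, LLMTokenLimitError  # path assumed

async def call_with_retry(provider, request: dict, max_attempts: int = 3, base_delay: float = 2.0):
    """Retry one provider with exponential backoff (illustrative, not the real router)."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await provider.complete(**request)
        except (LLMAuthError, LLMTokenLimitError):
            raise  # non-retryable: fail immediately
        except LLMError as error:
            if attempt == max_attempts:
                raise
            # Prefer the error's own retry_after hint, otherwise back off 2s, 4s, ...
            await asyncio.sleep(getattr(error, "retry_after", None) or delay)
            delay *= 2
```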
Cross-provider fallback (complete()):
for each provider in fallback_chain:
try:
return _call_with_retry(provider, messages, tools, model, ...)
except LLMAuthError:
break # don't try other providers for auth errors
except LLMTokenLimitError:
continue # try next provider (might have bigger context)
except retryable error:
mark_unhealthy(provider)
continue # try next provider
raise LLMAllProvidersFailedError
Streaming fallback (stream()):
- Same fallback chain logic
- Key constraint: if tokens have already been yielded to the client, the stream cannot be safely retried on another provider (partial response already sent)
Health check loop:
- Background task pings unhealthy providers every 60 seconds with a minimal "hi" message
- If a provider responds successfully, it's marked healthy and re-enters the fallback chain
- Started during app lifespan, stopped on shutdown
Context window awareness:
```python
MODEL_CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "gpt-4-turbo": 128_000,
    "gpt-4": 8_192,
    "gpt-3.5-turbo": 16_385,
    "claude-sonnet-4-20250514": 200_000,
    "claude-3-5-sonnet-20241022": 200_000,
    "claude-3-haiku-20240307": 200_000,
    "claude-3-opus-20240229": 200_000,
}
```

`get_context_window(model)` returns the known size or 16,000 as a safe default. It also tries partial matching (e.g., "gpt-4o" matches "gpt-4o-mini").
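A sketch of that lookup including the partial-match behaviour; the exact matching rules in backend/app/llm/router.py may differ:

```python
def get_context_window(model: str, default: int = 16_000) -> int:
    """Look up a model's context window, falling back to partial matches (sketch)."""
    if model in MODEL_CONTEXT_WINDOWS:  # uses the dict shown above
        return MODEL_CONTEXT_WINDOWS[model]
    for known, size in MODEL_CONTEXT_WINDOWS.items():
        # e.g. "openai/gpt-4o-mini" contains "gpt-4o-mini"
        if known in model or model in known:
            return size
    return default
```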
Model selection happens at three levels:
1. Project-level defaults (backend/app/models/project.py):
Each project stores per-purpose LLM preferences:
- `agent_llm_provider` / `agent_llm_model` — for the orchestrator
- `sql_llm_provider` / `sql_llm_model` — for SQL generation
- `indexing_llm_provider` / `indexing_llm_model` — for repo indexing
2. Request-level override:
The chat request body can include preferred_provider and model fields that override project defaults. The resolution logic in the chat route:
```python
agent_provider = body.preferred_provider or project.agent_llm_provider
agent_model = body.model or project.agent_llm_model
sql_provider = project.sql_llm_provider or agent_provider
sql_model = project.sql_llm_model or agent_model
```

3. Models API (backend/app/api/routes/models.py):
GET /api/models?provider=openrouter returns available models. For OpenRouter, it fetches the live model list from https://openrouter.ai/api/v1/models with a double-checked locking cache (TTL: model_cache_ttl_seconds, default 3600s). For OpenAI and Anthropic, returns static model lists.
Cost estimation: The chat route's _estimate_cost() function uses cached OpenRouter pricing data to calculate USD cost per request: prompt_tokens * prompt_price + completion_tokens * completion_price.
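The arithmetic is straightforward; the per-token prices below are illustrative values, not OpenRouter's actual pricing:

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      prompt_price: float, completion_price: float) -> float:
    """Per-request cost as described above; prices are USD per token."""
    return prompt_tokens * prompt_price + completion_tokens * completion_price

# e.g. 12,000 prompt tokens and 800 completion tokens at $2.50 / $10.00 per million tokens:
# estimate_cost_usd(12_000, 800, 2.50 / 1_000_000, 10.00 / 1_000_000) ≈ 0.038 USD
```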
Frontend model picker: LlmModelSelector component (frontend/src/components/ui/LlmModelSelector.tsx) shows a provider dropdown (openai / anthropic / openrouter) and a model dropdown that loads models via api.models.list(provider).
The system implements three interconnected feedback cycles that allow it to improve over time without explicit retraining.
flowchart TB
subgraph cycle1 ["Cycle 1: Automatic (Validation Loop)"]
VL["ValidationLoop\npre→execute→post→repair"]
LA["LearningAnalyzer\nheuristic extractors"]
LLMA["LLMAnalyzer\ncross-query patterns"]
end
subgraph cycle2 ["Cycle 2: User Feedback"]
TU["Thumbs Up/Down"]
DV["Data Validation\n(confirm/reject values)"]
FP["FeedbackPipeline"]
RAF["RAGFeedbackService"]
end
subgraph cycle3 ["Cycle 3: Knowledge Lifecycle"]
CD["Confidence Decay"]
CR["Conflict Resolution"]
GP["Global Pattern\nPromotion"]
end
subgraph store ["Knowledge Store"]
ALR["AgentLearning"]
SNR["SessionNote"]
BM["DataBenchmark"]
end
VL --> LA --> ALR
VL --> LLMA --> ALR
TU --> LA --> ALR
DV --> FP --> ALR
DV --> FP --> SNR
DV --> FP --> BM
RAF --> store
CD --> ALR
CD --> SNR
CR --> ALR
GP --> ALR
This cycle requires zero user effort. Every time the SQL agent generates and executes a query, the system analyzes the attempt history and extracts lessons.
ValidationLoop (backend/app/core/validation_loop.py):
The SQL execution cycle:
- Pre-validation (`PreValidator`): Schema checks — tables exist, columns exist, types compatible
- Safety check (`SafetyGuard`): Read-only enforcement, DML blocking when configured
- Explain validation (`ExplainValidator`): Warns on full table scans above row threshold
- Execution: Runs the query against the user's database
- Post-validation (`PostValidator`): Result sanity checks
- Error classification (`ErrorClassifier`): Categorizes failures (table_not_found, column_not_found, syntax_error, timeout, permission_denied)
- Context enrichment (`ContextEnricher`): Gathers additional schema info after failures
- Query repair (`QueryRepairer`): LLM-based query fix using error context
- Retry: Up to `max_retries` attempts with the repaired query (a condensed sketch of the cycle follows)
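A heavily condensed sketch of this cycle. The `executor`, `repairer`, and `validators` objects are hypothetical stand-ins for the components listed above, which the real ValidationLoop wires together explicitly:

```python
async def run_with_repair(sql: str, executor, repairer, validators, max_retries: int = 2):
    """Pre-validate → execute → post-validate → repair → retry (illustrative sketch)."""
    attempt_sql = sql
    for attempt in range(max_retries + 1):
        issues = validators.pre_validate(attempt_sql)          # schema, safety, EXPLAIN checks
        if issues and attempt < max_retries:
            attempt_sql = await repairer.repair(attempt_sql, issues)
            continue
        try:
            rows = await executor.execute(attempt_sql)          # run against the user's database
        except Exception as error:                              # classified + context-enriched in the real loop
            if attempt == max_retries:
                raise
            attempt_sql = await repairer.repair(attempt_sql, str(error))
            continue
        if validators.post_validate(rows) or attempt == max_retries:
            return attempt_sql, rows                            # sanity checks passed, or budget exhausted
        attempt_sql = await repairer.repair(attempt_sql, "post-validation failed")
```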
LearningAnalyzer (backend/app/knowledge/learning_analyzer.py):
After the validation loop completes, heuristic extractors analyze the sequence of query attempts:
| Extractor | Detects | Example |
|---|---|---|
| `_detect_table_preference` | Wrong table A in attempt N, fixed to table B in attempt N+1 | "Use transactions instead of legacy_payments" |
| `_detect_column_correction` | column_not_found error resolved by using a different column | "Column user_name doesn't exist on users. Use full_name" |
| `_detect_format_discovery` | Division by 100/1000 added in fix, or ::text cast added | "Column amount on orders stores values in cents" |
| `_detect_schema_gotcha` | Soft-delete filter added (deleted_at IS NULL), or schema prefix added | "Table users uses soft-delete pattern" |
| `_detect_performance_hint` | Timeout resolved by adding LIMIT or date filter | "Table events can timeout on unfiltered queries" |
Each extracted lesson is checked against code-DB sync warnings to avoid duplicate learnings that the sync system already provides.
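To illustrate the style of these heuristics, a sketch in the spirit of `_detect_format_discovery`; the actual extractor signatures and logic are more involved:

```python
import re

DIVISION_PATTERN = re.compile(r"/\s*100(0)?\b")  # "amount / 100" or "/ 1000" added in the fix

def detect_format_discovery(failed_sql: str, fixed_sql: str, table: str) -> str | None:
    """If the successful attempt added a division the failed one lacked, suspect minor units."""
    added_division = DIVISION_PATTERN.search(fixed_sql) and not DIVISION_PATTERN.search(failed_sql)
    if added_division:
        return f"Values on {table} appear to be stored in minor units; divide before reporting."
    return None
```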
LLMAnalyzer (backend/app/knowledge/learning_analyzer.py):
For deeper pattern extraction (e.g., multi-query anti-patterns), the system can trigger an LLM-based analysis:
- Runs only when there are 3+ query attempts in a session
- Rate-limited to at most once per hour per connection (`COOLDOWN_SECONDS = 3600`)
- The LLM receives the full attempt history and outputs structured JSON lessons
- Lessons are validated (category must be valid, confidence clamped to 0.5-0.9) before storage
Users provide feedback through two mechanisms:
Thumbs up/down (POST /api/chat/feedback):
- Thumbs up: positive reinforcement (stored as user_rating on the message)
- Thumbs down: triggers `LearningAnalyzer.analyze_negative_feedback()`, which creates a `query_pattern` learning with the user's error description
Data validation (via FeedbackPipeline in backend/app/services/feedback_pipeline.py):
When the user provides expected values for a query result, a DataValidationFeedback record is created with a verdict:
| Verdict | Pipeline Action |
|---|---|
| confirmed | Store as benchmark → BenchmarkService.create_or_confirm() |
| approximate | Store benchmark + create session note with deviation details |
| rejected | Create learning (categorized by rejection reason) + session note + flag existing benchmark as stale |
Rejection categorization: The _derive_learning() function classifies the rejection reason:
- Currency/format keywords → `data_format` category
- Filter/where/status keywords → `schema_gotcha` category
- Table/legacy keywords → `table_preference` category
- Join/relationship keywords → `schema_gotcha` category
- Anything else → generic `schema_gotcha` (see the sketch below)
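A keyword-mapping sketch of this categorization; the keyword lists are illustrative, not the actual ones in `_derive_learning()`:

```python
def categorize_rejection(reason: str) -> str:
    """Map a free-text rejection reason to a learning category (illustrative)."""
    text = reason.lower()
    if any(keyword in text for keyword in ("currency", "cents", "format", "decimal")):
        return "data_format"
    if any(keyword in text for keyword in ("filter", "where", "status")):
        return "schema_gotcha"
    if any(keyword in text for keyword in ("table", "legacy")):
        return "table_preference"
    if any(keyword in text for keyword in ("join", "relationship")):
        return "schema_gotcha"
    return "schema_gotcha"  # generic default
```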
RAG feedback (backend/app/services/rag_feedback_service.py):
After every chat response that used RAG, the system records which chunks were retrieved and whether the query succeeded. This creates a quality signal for each chunk:
- `RAGFeedback`: project_id, chunk_id, source_path, doc_type, distance, query_succeeded, question_snippet, commit_sha
- Used to improve retrieval quality over time
The system implements a biologically inspired knowledge decay model:
Agent Learning decay (AgentLearningService.decay_stale_learnings()):
- Triggered periodically (recommended: daily)
- Learnings not updated in 30+ days: confidence -= 0.02
- Below 0.2 confidence: deactivated (soft-deleted)
- Result: unused knowledge gradually fades, frequently-confirmed knowledge persists
Session Note decay (SessionNotesService.decay_stale_notes()):
- Unverified notes inactive for 60+ days: confidence -= 0.1
- Floor at 0.1 confidence
- User-verified notes are exempt from decay
Insight decay (InsightMemoryService.decay_stale_insights()):
- Unconfirmed insights older than 30 days: confidence -= 0.05
- Below 0.15: status changed to "expired"
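All three decay jobs share the same shape: compare the last-update timestamp against a staleness threshold and subtract a fixed step. A generic helper might look like this (illustrative only; each service implements its own version with its own thresholds):

```python
from datetime import datetime, timedelta, timezone

def decay_confidence(confidence: float, last_updated: datetime,
                     stale_after_days: int, step: float, floor: float = 0.0) -> float:
    """One decay cycle: subtract `step` once the record is older than the threshold.
    Expects a timezone-aware `last_updated`; e.g. learnings use 30 days / 0.02,
    session notes 60 days / 0.1, insights 30 days / 0.05."""
    if datetime.now(timezone.utc) - last_updated < timedelta(days=stale_after_days):
        return confidence
    return max(floor, confidence - step)
```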
Conflict resolution in AgentLearningService._resolve_conflicts():
- When a new learning is created, checks existing ones with the same subject
- Detects "negation flips" (e.g., old: "use table A", new: "never use table A")
- If the new learning has higher confidence, the old conflicting one is deactivated
- This ensures the system doesn't accumulate contradictory advice
Global pattern promotion in AgentLearningService.promote_global_patterns():
- Scans all learnings across all connections
- Learnings that appear on 2+ independent connections are identified as universal patterns
- These patterns are promoted to connections that don't have them yet (up to 5 per prompt)
- Format: `"[global pattern, seen on N DBs] lesson text [confidence%]"`
sequenceDiagram
participant FE as Frontend (ChatPanel)
participant API as POST /api/chat/ask/stream
participant CA as ConversationalAgent
participant OA as OrchestratorAgent
participant WT as WorkflowTracker
participant LLM as LLM Provider
participant SA as Sub-Agents
FE->>API: question + project_id + session_id
API->>API: Load project, resolve provider/model
API->>API: Get/create session, load history
API->>CA: run(question, context)
CA->>OA: run(AgentContext)
OA->>WT: emit(pipeline_start)
WT-->>FE: SSE: pipeline_start
OA->>LLM: complete(messages + tools)
WT-->>FE: SSE: thinking "Analyzing request..."
LLM-->>OA: tool_calls: [query_database]
WT-->>FE: SSE: thinking "Calling SQL Agent..."
OA->>SA: SQLAgent.run()
SA-->>OA: SQLAgentResult (query + results)
WT-->>FE: SSE: thinking "SQL Agent: 150 rows, 5 columns"
OA->>LLM: complete(messages + tool_result)
LLM-->>OA: final_text (no tool calls)
WT-->>FE: SSE: token "Here are..." (streaming)
OA->>SA: VizAgent.run()
SA-->>OA: VizResult (bar_chart)
OA-->>CA: AgentResponse
CA-->>API: AgentResponse
API->>API: Save messages, estimate cost
API->>API: Record RAG feedback
WT-->>FE: SSE: pipeline_end
API-->>FE: Final response JSON
WorkflowTracker (backend/app/core/workflow_tracker.py):
The in-memory event bus that powers real-time progress reporting:
- Maintains a list of `asyncio.Queue` subscribers
- Each SSE connection subscribes a queue
- Events are `WorkflowEvent` dataclasses with: workflow_id, step, status, detail, elapsed_ms, timestamp, pipeline, extra
- Events are broadcast to all subscribers matching the workflow_id (minimal sketch below)
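A minimal sketch of this queue-based fan-out; it is not the real WorkflowTracker, which also filters by workflow_id and handles unsubscription:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class MiniTracker:
    """Illustrative broadcaster: one queue per SSE subscriber."""
    subscribers: list[asyncio.Queue] = field(default_factory=list)

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    def emit(self, event: dict) -> None:
        # Broadcast the event to every subscribed connection without blocking.
        for queue in self.subscribers:
            queue.put_nowait(event)
```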
Key event types:
| Step | Status | When |
|---|---|---|
| `pipeline_start` | `started` | Request begins |
| `thinking` | `in_progress` | Orchestrator reasoning steps |
| `orchestrator:llm_call` | `in_progress` | LLM call started |
| `orchestrator:sql_agent` | `in_progress` | SQL sub-agent invoked |
| `orchestrator:knowledge_agent` | `in_progress` | Knowledge search invoked |
| `token` | `streaming` | Final answer text chunks |
| `pipeline_end` | `completed` | Request finished |
| `orchestrator:llm_retry` | `retrying` | Provider error, retrying |
| `orchestrator:warning` | `degraded` | Graceful degradation notice |
Session management:
- Sessions are auto-created when a user sends the first message to a project
- Title is generated from the first user question
- Session rotation: when message count exceeds a threshold, a new session is started with an LLM-generated summary of the previous conversation
For complex queries that trigger the multi-stage pipeline:
Response types and their meaning:
| `response_type` | Meaning | User Action |
|---|---|---|
| `text` | Plain conversational answer | None |
| `sql_result` | Query + results + visualization (may contain compound sql_results array for multi-query answers) | View data, provide feedback |
| `knowledge` | RAG-based answer with sources | View sources |
| `clarification_request` | Agent needs more info | Answer the question |
| `stage_checkpoint` | Intermediate pipeline result | Continue / Modify / Retry |
| `stage_failed` | Pipeline stage failed validation | Retry / Modify |
| `pipeline_complete` | All stages finished successfully | View combined results |
| `error` | Unrecoverable error | Retry or rephrase |
Pipeline resume flow:
When the user responds to a checkpoint or failure:
- Frontend sends a message with `pipeline_action` ("continue" / "modify" / "retry") and `pipeline_run_id` in the extras
- Orchestrator detects the pipeline action in `_check_pipeline_resume()`
- `_resume_pipeline()` loads the `PipelineRun` from the DB and reconstructs `StageContext`
- For "continue": starts from the next stage
- For "retry": re-executes the current stage
- For "modify": re-executes with the user's modification as feedback
Zustand stores (frontend/src/stores/):
| Store | Purpose |
|---|---|
| `app-store` | Active project, session, connection, sidebar state |
| `auth-store` | JWT tokens, user profile, login/logout |
| `log-store` | Workflow events for the activity log |
| `notes-store` | Session notes cache |
| `task-store` | Background task tracking (indexing, etc.) |
| `toast-store` | Notification toasts |
State restoration (frontend/src/hooks/useRestoreState.ts):
- On page load, restores `active_session_id` from localStorage
- Fetches messages for the restored session via `api.chat.getMessages()`
- Populates the chat panel with the previous conversation
SSE client (frontend/src/lib/sse.ts):
- Establishes an EventSource connection to the streaming endpoint
- Parses `WorkflowEvent` JSON from each SSE event
- Updates the log store and triggers UI re-renders for thinking indicators, token streaming, and pipeline status
Key chat-related endpoints in backend/app/api/routes/chat.py:
| Method | Path | Purpose |
|---|---|---|
| POST | `/api/chat/sessions` | Create a new chat session |
| GET | `/api/chat/sessions` | List sessions for a project |
| GET | `/api/chat/sessions/{id}/messages` | Get messages for a session |
| POST | `/api/chat/ask` | Ask a question (synchronous) |
| POST | `/api/chat/ask/stream` | Ask a question (SSE streaming) |
| POST | `/api/chat/feedback` | Submit thumbs up/down |
| POST | `/api/chat/estimate` | Estimate cost before asking |
| GET | `/api/chat/search` | Search across chat history |
| WS | `/api/chat/ws/{project}/{connection}` | WebSocket for real-time chat |
ChromaDB is used as the vector store (backend/app/knowledge/vector_store.py):
- One collection per project
- Documents are chunked code/documentation with metadata (source_path, doc_type, entity_type)
- Similarity search returns ranked chunks for context injection
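For reference, the underlying ChromaDB calls look roughly like this. The collection name, storage path, and metadata values are illustrative; the real wrapper in backend/app/knowledge/vector_store.py manages these details:

```python
import chromadb

# Illustrative usage of the ChromaDB client API.
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection(name="project-123")

collection.add(
    ids=["src/billing/invoice.py::Invoice.total"],
    documents=["def total(self): return sum(line.amount for line in self.lines)"],
    metadatas=[{"source_path": "src/billing/invoice.py", "doc_type": "code", "entity_type": "method"}],
)

# Similarity search returns the top-ranked chunks for context injection.
results = collection.query(query_texts=["how is the invoice total calculated?"], n_results=5)
```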
flowchart LR
GIT["Git Repository"]
RA["RepoAnalyzer\n(clone/pull)"]
PP["ProjectProfiler\n(high-level scan)"]
EE["EntityExtractor\n(functions, classes)"]
DG["LLM DocGenerator\n(enriched docs)"]
CH["Chunker"]
VS["ChromaDB"]
GIT --> RA --> PP --> EE --> DG --> CH --> VS
- Clone/Pull: `RepoAnalyzer` manages the local copy of the Git repository
- Project profiling: High-level scan to identify framework, language, structure
- Entity extraction: Parses source files to extract functions, classes, methods
- LLM documentation generation: Generates enriched documentation for each entity
- Chunking: Splits documents into appropriately sized chunks
- Storage: Stored in ChromaDB with embeddings for similarity search
DbIndexService builds the "table map" that gives the orchestrator and SQL agent knowledge of the database structure:
- Connects to the user's database and introspects schema
- Builds a compact representation: table names, columns, types, relationships
- Used as routing context in the orchestrator prompt and detailed context for the SQL agent
Links ORM models found in the codebase to actual database tables:
- Identifies Prisma schemas, SQLAlchemy models, Django models, etc.
- Extracts `conversion_warnings` (e.g., "amounts stored in cents")
- These warnings are used to dedup against learning analyzer output
CustomRulesEngine (backend/app/knowledge/custom_rules.py):
- Users can create project-specific rules via the UI or the `manage_rules` meta-tool
- Rules are markdown documents with guidelines about how to query the database
- Proactively injected into both the orchestrator and SQL agent system prompts (with budget-aware truncation) so every query respects business logic without relying on optional tool calls
- Also available via `get_custom_rules` and `get_query_context` tools for additional retrieval
- Rule Freshness Check: the orchestrator prompt includes a `RULE FRESHNESS CHECK` instruction that compares query results against loaded rules, detects discrepancies (e.g., unknown enum values), and proposes updates via `manage_rules`
- Schema-aware rule validation: `RuleService.validate_rules_against_schema()` checks rules for references to tables that were dropped during a schema refresh; wired into the `POST /connections/{id}/refresh-schema` endpoint alongside learning validation
table_resolver.py (backend/app/agents/table_resolver.py):
- Lightweight heuristic (no LLM) that parses the compact table map and matches user question terms against known tables
- Uses exact, plural/singular, substring, and keyword-to-description matching
- Returns `TableResolution` (matched, fuzzy, unresolved) with confidence scores for fuzzy matches
- `build_resolution_hints()` generates prompt-injectable warnings for unresolved or fuzzy-matched terms
- The orchestrator injects these hints into the system prompt; the `QUERY PLANNING` rule mandates using `ask_user` when TABLE RESOLUTION WARNINGS are present
- The orchestrator emits a `plan_summary` event via `WorkflowTracker` before the tool-calling loop starts
- Contains: tables, strategy (single_query/pipeline), rules_applied, learnings_applied, has_warnings
- Forwarded via SSE as a `plan_summary` event type
- Frontend renders a collapsible `PlanSummaryCard` above the thinking log
- `ReasoningPanel` (frontend/src/components/chat/ReasoningPanel.tsx): right-side sliding panel showing full orchestrator internals
- `reasoning-store.ts`: Zustand store collecting per-message reasoning traces (steps, plan, thinking, timing)
- SSE events (`step`, `agent_start`, `agent_end`, `thinking`, `tool_call`, `plan_summary`) are piped into the store during streaming
- Brain icon button on each assistant message (ChatMessage.tsx) toggles the panel
- Displays: plan summary, thinking log, step timeline with icons and durations
flowchart TB
U["User: 'How many orders last month?'"]
CP["ChatPanel (frontend)"]
API["POST /api/chat/ask/stream"]
HIST["Load chat history\n+ trim to budget"]
CTX["Build context:\nschema + rules + learnings + overview"]
OA["OrchestratorAgent"]
LLM1["LLM Call 1:\n'I need to query the database'"]
SQL["SQLAgent:\ngenerate SQL → validate → execute"]
LLM2["LLM Call 2:\n'Based on the results...'"]
VIZ["VizAgent:\nselect bar_chart"]
RESP["AgentResponse:\nanswer + query + results + viz\n+ sql_results (compound)"]
SAVE["Save messages + RAG feedback"]
FE["Render DataTable + BarChart"]
U --> CP --> API --> HIST --> CTX --> OA
OA --> LLM1 -->|"tool: query_database"| SQL
SQL -->|"150 rows"| OA
OA --> LLM2 -->|"final text"| OA
OA --> VIZ --> RESP
RESP --> SAVE --> FE
flowchart TB
U["User: 'Build a summary table comparing\nrevenue by region for each quarter'"]
CD["Complexity Detection:\nheuristic + adaptive LLM"]
QP["QueryPlanner:\n3-stage plan"]
S1["Stage 1: Get revenue by region"]
V1["Validate: row count, columns"]
CK1{Checkpoint?}
S2["Stage 2: Get quarterly breakdown"]
V2["Validate: cross-stage consistency"]
S3["Stage 3: Combine into summary"]
V3["Validate: final shape"]
SYNTH["Synthesize final answer"]
U --> CD -->|complex| QP
QP --> S1 --> V1
V1 -->|passed| CK1
V1 -->|failed| S1
CK1 -->|"user: continue"| S2
CK1 -->|"user: modify"| S1
S2 --> V2
V2 -->|passed| S3
S3 --> V3
V3 -->|passed| SYNTH
flowchart LR
subgraph input ["Inputs"]
UQ["User Question"]
QR["Query Results"]
VL["Validation Outcomes"]
UF["User Feedback"]
end
subgraph memory ["Memory Layers"]
L1["L1: Chat History\n(this session)"]
L2["L2: Session Notes\n(this connection)"]
L3["L3: Agent Learnings\n(this connection)"]
L4["L4: Insights\n(this project)"]
end
subgraph output ["Outputs (injected into prompts)"]
SYS["System Prompt"]
SQLP["SQL Agent Prompt"]
end
UQ --> L1
QR --> L1
VL --> L3
UF -->|"thumbs down"| L3
UF -->|"data validation"| L2
UF -->|"data validation"| L3
QR -->|"anomaly"| L4
L1 -->|"trimmed history"| SYS
L2 -->|"compiled notes"| SQLP
L3 -->|"compiled learnings"| SYS
L3 -->|"compiled learnings"| SQLP
L4 -->|"active insights"| SYS
flowchart TB
subgraph auto ["Automatic (every query)"]
VLoop["Validation Loop\nattempt 1 → error → repair → attempt 2 → success"]
HE["Heuristic Extractors:\ntable_preference, column_usage,\ndata_format, schema_gotcha,\nperformance_hint"]
LLME["LLM Analyzer:\ncross-query patterns\n(3+ attempts, 1hr cooldown)"]
end
subgraph manual ["User-Triggered"]
TD["Thumbs Down"]
DV["Data Validation:\nconfirm / approximate / reject"]
end
subgraph store ["Storage"]
AL["AgentLearning\n(confidence: 0.6)"]
SN["SessionNote"]
BM["DataBenchmark"]
end
subgraph lifecycle ["Lifecycle"]
CONF["+0.1 per confirmation\ncap at 1.0"]
DECAY["-0.02/month stale\ndeactivate below 0.2"]
CONFLICT["Conflict resolution:\nnew supersedes old"]
GLOBAL["Global promotion:\nseen on 2+ DBs"]
end
subgraph usage ["Usage"]
PROMPT["Injected into\nOrchestrator + SQL prompts"]
end
VLoop --> HE --> AL
VLoop --> LLME --> AL
TD --> AL
DV -->|confirmed| BM
DV -->|approximate| BM
DV -->|approximate| SN
DV -->|rejected| AL
DV -->|rejected| SN
AL --> CONF
AL --> DECAY
AL --> CONFLICT
AL --> GLOBAL
AL --> PROMPT
SN --> PROMPT
flowchart TB
REQ["LLM Request"]
FC["Build Fallback Chain:\npreferred → openai → anthropic → openrouter\n(filter: has API key + is healthy)"]
P1["Provider 1 (preferred)"]
R1{"Retry\n(up to 3x)"}
P2["Provider 2 (fallback)"]
R2{"Retry\n(up to 3x)"}
P3["Provider 3 (fallback)"]
R3{"Retry\n(up to 3x)"}
OK["LLMResponse"]
FAIL["LLMAllProvidersFailedError"]
UH["Mark Unhealthy\n(quarantine 120s)"]
HC["Health Check Loop\n(ping every 60s)"]
REQ --> FC --> P1
P1 --> R1
R1 -->|success| OK
R1 -->|"auth error"| FAIL
R1 -->|"token limit"| P2
R1 -->|"retryable fail"| UH --> P2
P2 --> R2
R2 -->|success| OK
R2 -->|"retryable fail"| P3
P3 --> R3
R3 -->|success| OK
R3 -->|"all failed"| FAIL
UH -.-> HC -.->|"recovered"| FC
Key settings from backend/app/config.py that affect system behavior:
| Setting | Default | Description |
|---|---|---|
| `default_llm_provider` | `"openai"` | Primary LLM provider |
| `openai_api_key` | — | OpenAI API key |
| `anthropic_api_key` | — | Anthropic API key |
| `openrouter_api_key` | — | OpenRouter API key |
| `max_context_tokens` | varies | Maximum tokens for context window |
| `max_history_tokens` | varies | Maximum tokens for chat history |
| `max_orchestrator_iterations` | `100` | Tool-calling loop safety ceiling |
| `orchestrator_wrap_up_steps` | `3` | Steps remaining to trigger wrap-up prompt |
| `orchestrator_final_synthesis` | `true` | Enable LLM synthesis on step exhaustion |
| `max_investigation_iterations` | `12` | Investigation agent loop limit |
| `history_summary_model` | — | Model for history summarization |
| `model_cache_ttl_seconds` | `3600` | How long to cache OpenRouter model list |
| `max_pie_categories` | varies | Threshold for pie chart → bar chart fallback |
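Assuming config.py follows the common FastAPI pattern of a pydantic-settings class (an assumption, not confirmed by this document), a few of these settings might be declared like this:

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Sketch only; names mirror the table above, backend/app/config.py is authoritative.
    default_llm_provider: str = "openai"
    max_orchestrator_iterations: int = 100
    orchestrator_wrap_up_steps: int = 3
    orchestrator_final_synthesis: bool = True
    model_cache_ttl_seconds: int = 3600

settings = Settings()  # values can be overridden via environment variables
```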