A sophisticated AI-powered database analysis tool that allows users to interact with databases using natural language queries. Built with Python, Dash, and multiple AI query engines.
- Python 3.8+
- Docker and Docker Compose (for PostgreSQL setup)
- OpenAI API key (or use free local embeddings)
```bash
git clone <repository-url>
cd db-agent-app
./setup.sh
```

The setup script will:
- Create your `.env` file
- Let you choose between `uv` or `pip`
- Set up the virtual environment
- Install all dependencies
```bash
git clone <repository-url>
cd db-agent-app
cp .env.copy .env
```

Important: Edit `.env` and replace `your_actual_openai_api_key_here` with your real OpenAI API key, or configure free local embeddings (see Embedding Configuration below).
Choose one of the following methods:
With uv:

```bash
# Install uv if you haven't already
# Visit: https://docs.astral.sh/uv/getting-started/installation/

# Create and activate virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -r requirements.txt
```

With pip:

```bash
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Development install:

```bash
# With uv
uv pip install -e .

# With pip
pip install -e .
```

Choose one of the following database options:
The app includes a sample library database with books, users, and borrowing records.
```bash
# Start PostgreSQL database
docker-compose up -d

# Wait a few seconds for the database to initialize
```

The database will be available at:
- Host: localhost
- Port: 5432 (or check your .env file)
- Database: library
- Username: postgres
- Password: postgres
```bash
# Create database directory
mkdir -p database/sqllite3

# Initialize SQLite database with sample data
sqlite3 database/sqllite3/library.db < database/sqllite3/sqllite3_seed.sql
```

For SQLite3, update your `.env` file:

```bash
DB_TYPE=sqlite3
```
```bash
python app.py
```

The app will be available at http://localhost:8050
1. Open the app in your browser at http://localhost:8050
2. Click "Connect to Database"
3. Use these connection details:
For PostgreSQL:
- Database Type: PostgreSQL
- Host: localhost
- Port: 5432 (or check your .env file)
- Database Name: library
- Username: postgres
- Password: postgres
For SQLite3:
- Database Type: SQLite3
- Leave other fields empty (uses local file)
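These connection fields map directly onto a standard SQLAlchemy-style database URL. The sketch below illustrates that mapping; the helper name is hypothetical and not part of the app's actual API:

```python
def build_db_url(db_type, host=None, port=None, name=None,
                 user=None, password=None):
    # Hypothetical helper: assemble a SQLAlchemy-style URL from the
    # connection form fields above; not the app's actual API.
    if db_type == "sqlite3":
        # SQLite ignores the other fields and uses a local file
        return "sqlite:///database/sqllite3/library.db"
    return f"{db_type}://{user}:{password}@{host}:{port}/{name}"

url = build_db_url("postgresql", "localhost", 5432,
                   "library", "postgres", "postgres")
print(url)  # postgresql://postgres:postgres@localhost:5432/library
```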
Select from multiple AI-powered query approaches:
- Schema-Based: Direct SQL generation from database schema
- RAG: Retrieval-Augmented Generation (basic)
- RAG (Self-Correction and Validation): Advanced RAG with 4-layer validation system
- Multi-Table Join: Complex relationship queries
- Visualize: Generate interactive charts and graphs
Once connected, you can ask questions like:
Basic Queries:
- "Show me all users"
- "What books are available?"
- "Who has borrowed books?"
- "Show me overdue books"
- "What are the book ratings?"
Visualization Queries:
- "Visualize book ratings by genre"
- "Show me a chart of books by publication year"
- "Plot the distribution of user ages"
- "Create a pie chart of book genres"
Advanced Queries:
- "Which users have the most overdue books?"
- "Show me books with ratings above 4 stars"
- "Find users who haven't returned books yet"
The app comes with a pre-populated library database:
- `users`: Library members with names, emails, and phone numbers
- `books`: Book catalog with titles, authors, genres, and copy counts
- `book_loans`: Tracks who borrowed what and when it's due
- `book_reviews`: User ratings and reviews for books
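For a feel of the schema, here is an illustrative in-memory recreation of the four tables. The column names are inferred from the descriptions above; the authoritative DDL lives in `database/sqllite3/sqllite3_seed.sql` and may differ:

```python
import sqlite3

# Illustrative DDL for the four tables described above; the real column
# names live in database/sqllite3/sqllite3_seed.sql and may differ.
DDL = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT,
                    genre TEXT, copies INTEGER);
CREATE TABLE book_loans (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         book_id INTEGER REFERENCES books(id),
                         due_date TEXT);
CREATE TABLE book_reviews (id INTEGER PRIMARY KEY, user_id INTEGER,
                           book_id INTEGER, rating INTEGER, review TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['book_loans', 'book_reviews', 'books', 'users']
```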
The app supports multiple LLM providers and embedding models:
```bash
# OpenAI (Default)
RAG_PROVIDER=openai
RAG_MODEL=gpt-4o-mini
RAG_API_KEY=your_openai_api_key_here

# Groq (Fast & Free)
RAG_PROVIDER=groq
RAG_MODEL=llama-3.1-8b-instant
GROQ_API_KEY=your_groq_api_key_here

# Local Ollama
RAG_PROVIDER=ollama
RAG_MODEL=llama3.1
# No API key needed

# Anthropic Claude
RAG_PROVIDER=anthropic
RAG_MODEL=claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Google Gemini
RAG_PROVIDER=gemini
RAG_MODEL=gemini-1.5-pro
GEMINI_API_KEY=your_gemini_api_key_here
```

OpenAI Embeddings (High Quality):

```bash
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
RAG_API_KEY=your_openai_api_key_here
```

Local Ollama Embeddings (Recommended - Fast & Free):

```bash
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=nomic-embed-text
OLLAMA_BASE_URL=http://localhost:11434
# Requires: ollama pull nomic-embed-text
```

Local HuggingFace Embeddings (Free):

```bash
EMBEDDING_PROVIDER=huggingface
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
# Requires: pip install sentence-transformers
```

Schema indexing settings:

```bash
AUTO_INDEX_SCHEMA=true      # Automatically index database schema to LanceDB
LANCEDB_PATH=./lancedb_rag  # Where to store vector embeddings
```

If you see "LanceDB not initialized" but embeddings work fine:
- This is normal! The recent fixes ensure the RAG agent works properly
- LanceDB connection and embeddings are now fully compatible
- Schema indexing will work when you connect to a database
If embeddings fail to initialize:
- Check that your embedding provider is properly configured
- For Ollama: ensure `ollama pull nomic-embed-text` was run
- For HuggingFace: install with `pip install sentence-transformers`
- For OpenAI: verify your API key is valid
- Schema indexing will be disabled, but the RAG agent will still work
This project supports multiple Python package managers:

uv:
- Faster installation: 10-100x faster than pip
- Better dependency resolution: more reliable than pip
- Installation: visit the uv installation guide

pip:
- Widely supported: works everywhere Python works
- Familiar: standard Python package manager
- Reliable: battle-tested and stable
Both `requirements.txt` and `pyproject.toml` are kept in sync. You can use either:

```bash
# Using uv
uv pip install -r requirements.txt

# Using pip
pip install -r requirements.txt

# Development install (either)
uv pip install -e .
pip install -e .
```

Common issues:
- Port conflicts: If port 5432 is in use, update `DB_PORT` in `.env`
- API key issues: Ensure your API keys are valid and have sufficient credits
- Database connection: Wait a few seconds after `docker-compose up` before connecting
- Package conflicts: Use a fresh virtual environment if you encounter dependency issues
- Embedding errors: Run `python test_embedding_system.py` to verify embedding setup
- LanceDB issues: Check that the `LANCEDB_PATH` directory is writable
```bash
# Test all dependencies
python test_dependencies.py

# Test embedding system
python test_embedding_system.py

# Test RAG agent initialization
python test_rag_init.py

# Test model switching
python test_model_switching.py
```

Make sure your `.env` file is properly configured:
```bash
# Database settings
DB_TYPE=postgresql  # or sqlite3
DB_HOST=localhost
DB_PORT=5432
DB_NAME=library
DB_USER=postgres
DB_PASSWORD=postgres

# LLM Configuration
RAG_PROVIDER=openai  # openai, groq, ollama, anthropic, gemini
RAG_MODEL=gpt-4o-mini
RAG_API_KEY=sk-your-actual-key-here

# Embedding Configuration
EMBEDDING_PROVIDER=openai  # openai or huggingface
EMBEDDING_MODEL=text-embedding-3-small
AUTO_INDEX_SCHEMA=true

# Alternative API Keys
GROQ_API_KEY=your_groq_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
```

```
db-agent-app/
├── app.py                 # Main Dash application
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── docker-compose.yml     # Database setup
├── .env                   # Environment variables
├── database/
│   ├── __init__.py
│   ├── connection.py      # Database connection management
│   ├── query_engine.py    # AI-powered query generation
│   └── seed.sql           # Sample database data
└── README.md
```
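Settings like the `.env` example above are read from the environment at startup. The following is only a sketch of that pattern, using the variable names shown in this README; it is not the actual code in `config.py`:

```python
import os

# Sketch of .env-style config loading; the app's actual config.py may
# differ. Variable names match the examples in this README.
os.environ["DB_TYPE"] = "postgresql"  # simulate a value from .env

DB_TYPE = os.getenv("DB_TYPE", "sqlite3")
DB_PORT = int(os.getenv("DB_PORT", "5432"))         # default PostgreSQL port
RAG_PROVIDER = os.getenv("RAG_PROVIDER", "openai")  # openai, groq, ollama, ...
AUTO_INDEX_SCHEMA = os.getenv("AUTO_INDEX_SCHEMA", "true").lower() == "true"

print(DB_TYPE, DB_PORT, AUTO_INDEX_SCHEMA)
```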
- `SchemaBasedQueryEngine`: Generates SQL using OpenAI and database schema
- `RAGAgent`: Advanced agent with self-correction and 4-layer validation
- `DatabaseConnection`: Manages database connections and queries with auto-indexing
- `SecurityGuardrail`: Validates SQL queries for security
- `SchemaIndexer`: Automatically indexes database schema to LanceDB on connection
- LanceDB Integration: Embedded vector database for RAG context retrieval
- `ModelSwitcher`: Easy switching between different LLM providers and models
- Conversation Context: Maintains chat history for better AI responses
The RAG Agent with Self-Correction implements a sophisticated 4-layer validation system:
- Layer 1: Syntactic Validation - SQLFluff and SQLGlot for syntax checking
- Layer 2: Semantic Validation - Schema verification to prevent AI hallucinations
- Layer 3: AI-Powered Self-Correction - Execution feedback loop with iterative debugging
- Layer 4: Performance Guardrails - Automatic LIMIT clauses and safety measures
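To illustrate Layer 2, a semantic check boils down to comparing the identifiers a generated query references against the known schema. The sketch below uses a regex for brevity; it is not the app's implementation, which parses the SQL properly (via SQLGlot):

```python
import re

# Hedged example schema matching the sample library database
SCHEMA = {
    "users": {"id", "name", "email", "phone"},
    "books": {"id", "title", "author", "genre", "copies"},
    "book_loans": {"id", "user_id", "book_id", "due_date"},
    "book_reviews": {"id", "user_id", "book_id", "rating"},
}

def find_hallucinated_tables(sql):
    """Return table names referenced in `sql` that are not in SCHEMA.
    A real implementation parses the SQL (the app uses SQLGlot) rather
    than pattern-matching, but the idea is the same."""
    referenced = re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE)
    return [t for t in referenced if t not in SCHEMA]

print(find_hallucinated_tables(
    "SELECT u.name FROM users u JOIN loans l ON u.id = l.user_id"))
# ['loans']
```

A query that passes this check only mentions tables that actually exist, which blocks the most common class of AI hallucination before execution.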
- Frontend: Dash + Bootstrap (dark theme)
- Backend: Python, SQLAlchemy, Pandas
- AI: Multi-provider LLM support via LiteLLM
- LLM Providers: OpenAI, Groq, Ollama, Anthropic, Gemini
- Embeddings: OpenAI, HuggingFace (local/free)
- Databases: PostgreSQL, MySQL, SQLite3
- Visualization: Plotly
- Vector DB: LanceDB (embedded, no Docker required)
- SQL Validation: SQLFluff, SQLGlot
- Workflow: LangGraph for agent orchestration
The advanced RAG agent implements a sophisticated workflow:
```
User Query → Schema Retrieval → Context Retrieval → Query Generation
        ↓
Syntactic Validation → Semantic Validation → Performance Guards
        ↓
Query Execution → Self-Correction Loop → Formatted Output
```
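At its core, the self-correction loop is retry-with-feedback: when execution fails, the error message is fed back into the next generation attempt. This is a simplified sketch with hypothetical function names; the real agent orchestrates these steps with LangGraph:

```python
def run_with_self_correction(generate, execute, max_attempts=3):
    """Illustrative retry-with-feedback loop: on failure, the database
    error message is fed back into the next generation attempt."""
    feedback = None
    for attempt in range(max_attempts):
        sql = generate(feedback)
        try:
            return execute(sql)
        except Exception as err:
            feedback = f"Attempt {attempt + 1} failed: {err}"
    raise RuntimeError("Query could not be corrected")

# Toy usage: the first attempt names a bad table, the retry succeeds
def generate(feedback):
    return "SELECT * FROM user" if feedback is None else "SELECT * FROM users"

def execute(sql):
    if sql.endswith("user"):
        raise ValueError("no such table: user")
    return [("Alice",), ("Bob",)]

print(run_with_self_correction(generate, execute))  # [('Alice',), ('Bob',)]
```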
Automatic Schema Indexing: When you connect to a database, the system automatically:
- Extracts database schema (tables, columns, relationships)
- Creates intelligent text chunks with sample data
- Generates embeddings using your chosen model
- Stores embeddings in LanceDB for fast similarity search
- Enables intelligent query generation with proper table/column names
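The "intelligent text chunks" step above can be pictured as serializing each table into a short description before it is embedded and stored. This is a hedged sketch of the idea; the actual chunk format is internal to `SchemaIndexer` and richer (it includes sample data):

```python
def schema_to_chunks(schema):
    """Turn {table: [(column, type), ...]} metadata into one text
    chunk per table, ready to be embedded into a vector store."""
    chunks = []
    for table, columns in schema.items():
        cols = ", ".join(f"{name} ({ctype})" for name, ctype in columns)
        chunks.append(f"Table {table}: columns {cols}")
    return chunks

chunks = schema_to_chunks({
    "books": [("title", "TEXT"), ("genre", "TEXT")],
})
print(chunks[0])  # Table books: columns title (TEXT), genre (TEXT)
```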
LanceDB Benefits:
- No Docker Required: Embedded database that runs in-process
- High Performance: Optimized for similarity search and retrieval
- Persistent Storage: Data persists between application restarts
- Easy Setup: No external services or configuration needed
- Automatic Indexing: Schema changes are automatically re-indexed
Embedding Options:
- OpenAI: High quality, ~$0.02 for typical database
- HuggingFace: Free, runs locally, good quality
This makes the application easier to deploy and maintain while providing powerful RAG capabilities with intelligent schema understanding.
```bash
# Use local models only
RAG_PROVIDER=ollama
RAG_MODEL=llama3.1
EMBEDDING_PROVIDER=huggingface
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

```bash
# Use OpenAI for both LLM and embeddings
RAG_PROVIDER=openai
RAG_MODEL=gpt-4o
RAG_API_KEY=your_openai_api_key_here
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-large
```

```bash
# Use Groq for LLM, local embeddings
RAG_PROVIDER=groq
RAG_MODEL=llama-3.1-8b-instant
GROQ_API_KEY=your_groq_api_key_here
EMBEDDING_PROVIDER=huggingface
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```

For detailed embedding configuration, see EMBEDDING_SYSTEM.md.