DataChat is an AI-powered data analytics platform that enables users to interact with their data using natural language. Upload a CSV file, ask questions in plain English, and receive instant insights through automatically generated SQL queries, visualizations, and intelligent caching.
- 🗣️ Natural Language Querying — Ask questions about your data in plain English
- 📤 CSV Upload — Seamlessly upload and analyze CSV datasets
- 🔄 Automatic SQL Generation — Powered by Groq's LLM for fast, accurate SQL conversion
- 📊 Smart Visualizations — Interactive charts and tables using Recharts
- 🧠 RAG-based Semantic Cache — Speed up similar queries using pgvector embeddings
- 💾 Conversation Memory — Maintains context across multiple queries
- 🔒 SQL Validation — Built-in safety checks for secure query execution
- ⚡ Real-time Execution — Query results displayed instantly
User Query → Embedding → RAG Cache Check → SQL Generation (Groq) → Validation → Execution (Supabase) → Visualization
↓ ↓
Cache Hit (Fast) Store in Cache
- Frontend: Next.js 14 (App Router), React, TailwindCSS, shadcn/ui
- Backend: Next.js API Routes, Supabase (PostgreSQL + pgvector)
- AI/ML: Groq LLM (SQL Generation), Ollama (Embeddings - nomic-embed-text)
- Visualization: Recharts
- Database: Supabase PostgreSQL with pgvector extension
| Name | Roll No. | Contribution |
|---|---|---|
| PIYUSH PRASHANT | 24BDS055 | RAG Pipeline, Embeddings, Semantic Cache Implementation |
| MS HARSHITHA | 24BDS038 | Testing, Documentation, System Integration |
| PRIYANSHU MITTAL | 24BDS058 | Backend Development, Supabase Integration, Data Visualization |
| JAKKUVA SAMEER | 24BDS026 | Frontend UI/UX, Debugging, Component Development |
DATACHAT_NL--SQL-MAIN/
├── app/ # Next.js App Router
│ ├── api/
│ │ ├── chat/ # Main NL → SQL chat endpoint
│ │ ├── upload-csv/ # CSV upload & table creation
│ │ └── ...
│ ├── layout.tsx # Root layout
│ ├── page.tsx # Main application page
│ └── globals.css # Global styles
│
├── components/ # React Components
│ ├── chat-interface.tsx # Main chat interface
│ ├── chat-message.tsx # Individual message component
│ ├── results-table.tsx # Query results table
│ ├── data-chart.tsx # Chart visualizations
│ ├── csv-upload.tsx # CSV upload component
│ └── ui/ # shadcn/ui components
│
├── lib/ # Core Backend Logic
│ ├── setup.ts # Database initialization
│ ├── embeddings.ts # Ollama embedding generation
│ ├── query-cache.ts # Semantic cache with pgvector
│ ├── query-executor.ts # SQL execution engine
│ ├── db.ts # Dynamic schema loading
│ ├── memory.ts # Conversation memory
│ ├── session.ts # Session management
│ ├── prompts.ts # LLM system prompts
│ ├── sql-validator.ts # SQL safety validation
│ └── types.ts # TypeScript definitions
│
├── scripts/ # Database Scripts
│ ├── 01-setup-database-functions.sql
│ ├── 02-reload-schema-cache.sql
│ ├── 03-conversation-memory.sql
│ └── conversation-query-cache.sql
│
├── .env.local # Environment configuration
├── env.example # Environment template
├── DEPLOYMENT.md # Deployment instructions
├── COMPLETE-PROJECT-SUMMARY.md # Detailed project documentation
├── package.json
└── README.md
Before you begin, ensure you have the following installed:
- Node.js 18.x or higher (Download)
- npm or pnpm (comes with Node.js)
- Ollama (Installation Guide)
- Supabase Account (Sign Up)
- Groq API Key (Get API Key)
git clone https://github.com/yourusername/datachat-nl-sql.git
cd datachat-nl-sqlUsing npm:
npm installOr using pnpm:
pnpm installCreate a .env.local file in the root directory:
# Supabase Configuration
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
# Groq API Configuration
GROQ_API_KEY=gsk_your_groq_api_key
# Embedding Configuration
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768Getting Your Credentials:
- Supabase: Go to your Supabase Dashboard → Project Settings → API
- Groq: Visit Groq Console → API Keys
Install Ollama:
-
Windows:
winget install Ollama.Ollama
-
macOS:
brew install ollama
-
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Pull the Embedding Model:
ollama pull nomic-embed-textStart Ollama Server:
ollama serveNote: Keep Ollama running in a separate terminal window while using DataChat.
The application will automatically create the required tables and extensions on first run. Ensure your Supabase project has the pgvector extension enabled:
- Go to your Supabase Dashboard
- Navigate to Database → Extensions
- Search for
pgvectorand enable it
npm run devOpen your browser and navigate to:
http://localhost:3000
- Click on the "Upload CSV" button
- Select a CSV file from your computer
- Wait for the file to be processed and loaded into Supabase
- You'll see a confirmation message with your table name
Type natural language questions about your data:
Examples:
- "Show me all records"
- "What is the average salary by department?"
- "Find the top 5 customers by revenue"
- "How many orders were placed last month?"
Results are displayed in three formats:
- 📋 Table View: Paginated data table
- 📊 Chart View: Interactive visualizations (bar, line, stacked)
- 💬 Summary: Natural language explanation
- Ask follow-up questions
- The system maintains conversation context
- Similar queries are cached for faster responses
graph LR
A[User Question] --> B[Generate Embedding]
B --> C{Check Cache}
C -->|Cache Hit| D[Return Cached Result]
C -->|Cache Miss| E[Generate SQL with Groq]
E --> F[Validate SQL]
F --> G[Execute on Supabase]
G --> H[Store in Cache]
H --> I[Render Results]
D --> I
The system uses pgvector to store question embeddings and find similar queries:
| Column | Type | Description |
|---|---|---|
question |
text | Original natural language question |
normalized_sql |
text | Generated SQL query |
result_sample |
jsonb | Sample results (1-100 rows) |
row_count |
integer | Total number of rows |
question_embedding |
vector(768) | Question embedding vector |
session_id |
text | Session identifier |
table_name |
text | Target table name |
Similarity Threshold: 0.85 (configurable)
Powered by Groq's high-performance LLM models:
- Context-aware query generation
- Handles complex aggregations and joins
- Generates explanations alongside SQL
- Finds similar questions using cosine similarity
- Reduces LLM API calls
- Instant responses for repeated queries
- Configurable similarity threshold
Built-in safety checks prevent:
- DROP/DELETE/TRUNCATE operations
- System table access
- SQL injection attempts
- Unauthorized schema modifications
- Maintains full conversation history
- Enables contextual follow-up questions
- Stored per session in Supabase
Watch DataChat in action:
Demo Highlights:
- CSV upload and table creation
- Natural language query examples
- SQL generation process
- Interactive data visualizations
- Semantic cache demonstration
- Follow-up question handling
| Command | Description |
|---|---|
npm run dev |
Start development server |
npm run build |
Build for production |
npm start |
Start production server |
npm run lint |
Run ESLint |
npm run type-check |
Run TypeScript type checking |
Refer to env.example for all required environment variables.
Issue: Ollama connection error
- Solution: Ensure Ollama is running (
ollama serve) - Check if the model is pulled (
ollama pull nomic-embed-text)
Issue: Supabase connection failed
- Solution: Verify your
.env.localcredentials - Ensure pgvector extension is enabled
Issue: SQL execution error
- Solution: Check if the table exists
- Verify the SQL query syntax
- Review validation rules
Issue: Charts not rendering
- Solution: Ensure data has numeric columns
- Check browser console for errors
For more detailed information:
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Groq for lightning-fast LLM inference
- Supabase for PostgreSQL and pgvector support
- Ollama for local embedding generation
- Next.js team for the excellent framework
- shadcn/ui for beautiful UI components
For questions or support, please reach out to the team:
- Project Lead: Priyanshu Mittal
Making data analysis accessible through natural language