An autonomous AI-powered data analytics system that transforms raw datasets into professional, meaningful visualizations and interactive dashboards.
🚀 Live Demo (V1) • 📺 Video Demo
Capstone Project for Google's 5-Day AI Agents Intensive Course
- 🎯 Problem & Solution
- 🏗️ System Architecture
- ✨ Core Capabilities
- 🚀 Quick Start
- 🎨 Features
- 🔧 Technical Stack
- 📊 Output Deliverables
- 📈 Use Cases
- 🛡️ Security
- 📚 Project Structure
Modern data analysis faces critical barriers:
- Complexity: Multiple tools required for cleaning, analysis, and visualization
- Technical Skills: Demands expertise in Python, pandas, and visualization libraries
- Time Investment: Manual processes consume hours of productive time
- Accessibility: Non-technical users locked out of advanced analytics
- Inconsistency: Variable quality based on individual expertise
DataLens AI democratizes data analysis through AI automation:
Raw Data → AI Processing → Professional Insights
↓ ↓ ↓
Upload → Gemini Analysis → Interactive Dashboard
Key Benefits:
- 🤖 AI-Driven: Leverages Google's Gemini API for intelligent processing
- ⚡ Fast: Hours of work reduced to minutes
- 🎯 Complete: End-to-end pipeline in a single notebook
- 🚀 No-Code: Upload and process without manual coding
- 📊 Professional: Publication-quality visualizations
graph LR
A[🔧 Setup Environment] --> B[🤖 Initialize Gemini AI]
B --> C[📁 Load Data]
C --> D[🧠 AI Analysis & Cleaning]
D --> E[📊 Generate Visualizations]
E --> F[📈 Interactive Dashboard]
style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000000
style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000000
style C fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000000
style D fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000000
style E fill:#fce4ec,stroke:#c2185b,stroke-width:3px,color:#000000
style F fill:#e0f2f1,stroke:#00796b,stroke-width:3px,color:#000000
┌─────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Data Input │ │ AI Processing │ │ Output Generation│
│ │ │ │ │ │
│ • CSV/Excel │───▶│ • Gemini AI │───▶│ • Visualizations │
│ • Raw Data │ │ • Analysis │ │ • Dashboard │
│ • File Upload │ │ • Code Generation│ │ • Reports │
└─────────────────┘ └──────────────────┘ └──────────────────┘
│ │ │
└────────────────────────┼────────────────────────┘
│
┌──────────────────┐
│ Data Processing │
│ │
│ • Cleaning │
│ • Transformation │
│ • Encoding │
└──────────────────┘
- Gemini API for automated quality assessment and insights generation
- Intelligent cleaning code generation based on data profiling
- 10+ chart types with professional styling and interactivity
- Real-time filters, KPI cards, and auto-updating charts
- ML-ready datasets with encoding and standardization
- Interactive widget supporting CSV and Excel formats
| Requirement | Version | Status |
|---|---|---|
| Python | 3.8+ | ✅ Required |
| Google Colab | - | 🌟 Recommended |
| Gemini API Key | - | 🔑 Required |
pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
jsonschema google-generativeai google-auth google-auth-oauthlib \
openpyxl xlrd jupyterlab
1. Access the Notebook
# Open in Google Colab
File → Upload → Select the .ipynb notebook
2. Configure API Key
# Add Gemini API key to Colab Secrets
# 1. Click 🔑 in the left sidebar
# 2. Add new secret: GEMINI_API_KEY = "your_api_key"
3. Run the Pipeline
# Execute cells sequentially:
# Cells 1-2: Environment setup
# Cells 3-4: AI initialization
# Cell 5: Data upload
# Cells 6-9: AI cleaning
# Cells 10-14: Visualizations
# Cells 15-17: Dashboard
# Cells 18-19: Reports
🔧 Initialize Environment
# Cell 1: Install dependencies
!pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
jsonschema google-generativeai --quiet
⏱️ ~2 minutes
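🤖 Initialize Gemini AI
# Cells 3-4: Configure API
from google.colab import userdata
from google import genai  # this module ships in the google-genai package

# Read the key from Colab Secrets instead of hardcoding it
api_key = userdata.get("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)
⏱️ ~30 seconds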
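Outside Colab, `google.colab.userdata` cannot be imported, so the key has to come from somewhere else. A minimal sketch, assuming a fallback to an environment variable of the same name; `get_gemini_api_key` is a hypothetical helper, not part of the notebook:

```python
import os


def get_gemini_api_key(name: str = "GEMINI_API_KEY") -> str:
    """Return the key from Colab Secrets, else from the environment (assumed fallback)."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata

        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab Secrets or the environment")
    return key
```

The returned string can then be passed to `genai.Client(api_key=...)` as in the cell above.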
📁 Load Dataset
# Cell 5: Upload and analyze
df = upload_dataset()
dataset_summary = generate_dataset_summary(df)
⏱️ Variable (depends on file size)
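`upload_dataset()` is defined in the notebook and is not reproduced here. As a rough sketch of the CSV/Excel handling it implies, assuming the upload widget yields a filename and the raw bytes (`load_table` is a hypothetical name):

```python
import io
from pathlib import Path

import pandas as pd


def load_table(filename: str, content: bytes) -> pd.DataFrame:
    """Parse uploaded bytes into a DataFrame based on the file extension."""
    suffix = Path(filename).suffix.lower()
    buffer = io.BytesIO(content)
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix in {".xlsx", ".xls"}:
        # pandas reads .xlsx through openpyxl and .xls through xlrd
        return pd.read_excel(buffer)
    raise ValueError(f"Unsupported file type: {suffix or filename}")
```

Rejecting unknown extensions up front keeps a mistyped upload from reaching the AI steps with a half-parsed table.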
🧹 AI-Powered Cleaning
# Cells 6-7: Automated cleaning
cleaning_prompt = build_cleaning_prompt(dataset_summary)
cleaning_output = ask_gemini_cleaning(cleaning_prompt)
⏱️ ~1 minute
📊 Generate Visualizations
# Cells 10-14: Create charts
viz_code = prompt_gemini(viz_prompt)
# exec() runs the generated code in the notebook's namespace; review it first
exec(viz_code)
# Cells 15-17: Build dashboard
dashboard_code = prompt_gemini(dash_prompt)
exec(dashboard_code)
⏱️ ~2 minutes
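Because the pipeline passes model output straight to `exec`, it can help to pull the code out of the reply and check that it parses before running it. The following is a sketch under assumptions, not the notebook's actual logic: it assumes the reply may wrap code in a fenced block, and `extract_code` and `run_generated` are hypothetical names. A syntax check is not a sandbox; it only catches malformed code early.

```python
import ast
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit in a fenced block

# Matches an optional "python" tag after the opening fence, then captures the body.
_CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the first fenced code block in a reply, or the whole reply if none."""
    match = _CODE_BLOCK.search(reply)
    code = match.group(1) if match else reply
    return code.strip()


def run_generated(code: str, namespace: dict) -> dict:
    """Parse first, then execute in an explicit namespace instead of globals."""
    ast.parse(code)  # raises SyntaxError before anything is executed
    exec(compile(code, "<generated>", "exec"), namespace)
    return namespace
```

Running in an explicit namespace such as `{"df": df}` also makes it obvious which objects the generated code can touch.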
- Comprehensive Summary: Statistical metrics, missing values, data type profiling
- AI Quality Assessment: Gemini-powered evaluation
- Column-wise Analysis: Detailed numeric and categorical insights
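The notebook's `generate_dataset_summary` is not shown in this README. A minimal sketch of what such a profile could contain, assuming a plain dictionary of shape, dtypes, missing-value counts, and per-column statistics (`summarize_dataset` and its keys are illustrative choices):

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Profile a DataFrame: shape, dtypes, missing values, per-column stats."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    def top_value(series: pd.Series):
        # Most frequent non-null value, or None for an all-null column.
        modes = series.mode(dropna=True)
        return None if modes.empty else str(modes.iloc[0])

    return {
        "rows": int(df.shape[0]),
        "columns": int(df.shape[1]),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": {col: int(count) for col, count in df.isna().sum().items()},
        "numeric": {
            col: {
                "mean": float(numeric[col].mean()),
                "min": float(numeric[col].min()),
                "max": float(numeric[col].max()),
            }
            for col in numeric.columns
        },
        "categorical": {
            col: {
                "unique": int(categorical[col].nunique()),
                "top": top_value(categorical[col]),
            }
            for col in categorical.columns
        },
    }
```

A compact summary like this, rather than the full dataset, is the kind of input a cleaning prompt can be built from.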
| Feature | Description | Status |
|---|---|---|
| Missing Value Detection | Automatic identification and handling | ✅ |
| Outlier Management | 99th percentile statistical capping | ✅ |
| Data Normalization | Column standardization and value scaling | ✅ |
| Categorical Encoding | One-hot encoding for ML readiness | ✅ |
| Negative Value Handling | Automatic conversion to absolute values | ✅ |
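In the notebook the cleaning code is generated by Gemini, so the exact steps vary by dataset. To make the table concrete, here is a hand-written sketch of four of the listed operations in pandas. The median fill for missing values is an assumption (the README does not name a strategy), and `clean_frame` is a hypothetical name:

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply absolute values, missing-value fill, 99th percentile capping, one-hot encoding."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        out[col] = out[col].abs()                      # negative values -> absolute values
        out[col] = out[col].fillna(out[col].median())  # assumption: fill gaps with the median
        cap = out[col].quantile(0.99)
        out[col] = out[col].clip(upper=cap)            # cap outliers at the 99th percentile
    categorical_cols = list(out.select_dtypes(exclude="number").columns)
    # One-hot encode every non-numeric column for ML readiness.
    return pd.get_dummies(out, columns=categorical_cols)
```

Taking absolute values is only appropriate when negatives are data-entry errors; for columns where negative values are meaningful (for example refunds or temperatures) that step should be skipped.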
- Histograms
- Real-time filtering
- Custom styling
- Advanced data analysis and cleaning recommendations: 🔴 Thorough & Comprehensive
- Fast visualization code generation: 🟢 Quick & Efficient
pandas # Data manipulation
numpy # Numerical operations
matplotlib # Static plots
seaborn # Statistical graphics
plotly # Interactive charts
scikit-learn # Preprocessing & encoding
google-generativeai # Gemini API
ipywidgets # Dashboard widgets
jsonschema # Data validation
📓 Cells 1-2: Environment Setup (Dependencies)
🤖 Cells 3-4: AI Initialization (Gemini Config)
📁 Cell 5: Data Loading (Upload & Profile)
🧹 Cells 6-9: AI Cleaning (Quality Improvement)
📊 Cells 10-14: Visualization (Chart Generation)
📈 Cells 15-17: Dashboard (Interactive Interface)
📋 Cells 18-19: Reporting (Insights & Recommendations)
- ML-ready with encoding and standardization
- 10+ professional charts
- Real-time analytics with KPIs
- Automated insights & recommendations
╔═══════════════════════════════════════════════════════════╗
║ SECURITY & PRIVACY MEASURES ║
╠═══════════════════════════════════════════════════════════╣
║ ✅ Secure API Handling ║
║ → API keys stored in Colab secrets ║
║ ║
║ ✅ No Hardcoded Credentials ║
║ → Secure authentication practices ║
║ ║
║ ✅ Data Privacy ║
║ → Data processed in the notebook session; prompts go to the Gemini API ║
╚═══════════════════════════════════════════════════════════╝
DataLens-AI-Intelligent-Data-Analytics-Agent/
│
└── 📊 DataLens AI - Intelligent Data Analytics Agent.ipynb
(Version 2 - Optimized for Google Colab)
- 🌐 Version 1 (Live Demo): Deployed on Hugging Face Spaces
- 📦 Version 2 (Current): Available in this Repository
⚠️ REQUIREMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Google Colab environment recommended
✓ Gemini API key configured in Colab secrets
✓ Supports CSV and Excel file formats
✓ Automatic dependency installation







