📊 DataLens AI

Intelligent Data Analytics Agent

Python Jupyter Gemini

An autonomous AI-powered data analytics system that transforms raw datasets into professional, meaningful visualizations and interactive dashboards

🚀 Live Demo (V1) · 📺 Video Demo

Capstone Project for Google's 5-Day AI Agents Intensive Course

🎯 Problem & Solution

The Challenge

Modern data analysis faces critical barriers:

  • Complexity: Multiple tools required for cleaning, analysis, and visualization
  • Technical Skills: Demands expertise in Python, pandas, and visualization libraries
  • Time Investment: Manual processes consume hours of productive time
  • Accessibility: Non-technical users are locked out of advanced analytics
  • Inconsistency: Output quality varies with individual expertise

Our Solution

DataLens AI democratizes data analysis through AI automation:

Raw Data → AI Processing → Professional Insights
   ↓            ↓                    ↓
Upload → Gemini Analysis → Interactive Dashboard

Key Benefits:

  • 🤖 AI-Driven: Leverages Google's Gemini API for intelligent processing
  • Fast: Hours of work reduced to minutes
  • 🎯 Complete: End-to-end pipeline in a single notebook
  • 🚀 No-Code: Upload and process without manual coding
  • 📊 Professional: Publication-quality visualizations

🏗️ System Architecture

Pipeline Workflow

graph LR
    A[🔧 Setup Environment] --> B[🤖 Initialize Gemini AI]
    B --> C[📁 Load Data]
    C --> D[🧠 AI Analysis & Cleaning]
    D --> E[📊 Generate Visualizations]
    E --> F[📈 Interactive Dashboard]
    
    style A fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000000
    style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#000000
    style C fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000000
    style D fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000000
    style E fill:#fce4ec,stroke:#c2185b,stroke-width:3px,color:#000000
    style F fill:#e0f2f1,stroke:#00796b,stroke-width:3px,color:#000000

Component Architecture

┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   Data Input    │    │  AI Processing   │    │ Output Generation│
│                 │    │                  │    │                  │
│ • CSV/Excel     │───▶│ • Gemini AI      │───▶│ • Visualizations │
│ • Raw Data      │    │ • Analysis       │    │ • Dashboard      │
│ • File Upload   │    │ • Code Generation│    │ • Reports        │
└─────────────────┘    └──────────────────┘    └──────────────────┘
         │                        │                        │
         └────────────────────────┼────────────────────────┘
                                  │
                         ┌──────────────────┐
                         │ Data Processing  │
                         │                  │
                         │ • Cleaning       │
                         │ • Transformation │
                         │ • Encoding       │
                         └──────────────────┘

✨ Core Capabilities

🤖 AI-Driven Intelligence

Gemini API for automated quality assessment and insight generation

🧹 Smart Data Cleaning

Intelligent cleaning code generation based on data profiling

📊 Advanced Visualizations

10+ chart types with professional styling and interactivity

📈 Interactive Dashboard

Real-time filters, KPI cards, and auto-updating charts

🏭 Production-Ready

ML-ready datasets with encoding and standardization

📁 Seamless Upload

Interactive widget supporting CSV and Excel formats
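The dashboard's filters and KPI cards come down to two operations: subset the data by the current selection, then recompute the summary numbers. The sketch below is a minimal, hypothetical version of that core logic in pandas; `filter_frame`, `compute_kpis` and the sample `sales` data are illustrative and not taken from the notebook, which generates its dashboard code with Gemini.

```python
import pandas as pd


def filter_frame(df: pd.DataFrame, column: str, selected: list) -> pd.DataFrame:
    """Keep rows whose `column` value is in `selected`; an empty selection keeps every row."""
    if not selected:
        return df
    return df[df[column].isin(selected)]


def compute_kpis(df: pd.DataFrame, value_column: str) -> dict:
    """Summary numbers a KPI card might display for one numeric column."""
    values = df[value_column]
    has_rows = len(values) > 0
    return {
        "rows": int(len(df)),
        "total": float(values.sum()),
        "mean": float(values.mean()) if has_rows else 0.0,  # avoid NaN on empty selections
        "max": float(values.max()) if has_rows else 0.0,
    }


# Hypothetical sample data for illustration only
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "revenue": [120.0, 80.0, 200.0, 50.0],
})

north = filter_frame(sales, "region", ["North"])
print(compute_kpis(north, "revenue"))  # {'rows': 2, 'total': 320.0, 'mean': 160.0, 'max': 200.0}
```

In a notebook, these functions would typically be called from an ipywidgets callback (for example, one observing a `SelectMultiple` widget) so that the KPI cards refresh whenever the selection changes.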

🚀 Quick Start

Prerequisites

Requirement      Version   Status
Python           3.8+      ✅ Required
Google Colab     -         🌟 Recommended
Gemini API Key   -         🔑 Required

Installation

pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
            jsonschema google-genai google-auth google-auth-oauthlib \
            openpyxl xlrd jupyterlab

Setup Steps

1. Access the Notebook

# In Google Colab: File → Upload notebook → select the .ipynb file

2. Configure API Key

# Add the Gemini API key to Colab Secrets:
# 1. Click 🔑 in the left sidebar
# 2. Add a new secret: GEMINI_API_KEY = "your_api_key"
# 3. Enable "Notebook access" for that secret

3. Run the Pipeline

# Execute cells sequentially:
# Cells 1-2:  Environment setup
# Cells 3-4:  AI initialization  
# Cell 5:     Data upload
# Cells 6-9:  AI cleaning
# Cells 10-14: Visualizations
# Cells 15-17: Dashboard
# Cells 18-19: Reports

Usage Example

🔧 Initialize Environment
# Cell 1: Install dependencies
!pip install pandas numpy matplotlib seaborn plotly scikit-learn ipywidgets \
            jsonschema google-genai --quiet

⏱️ ~2 minutes

🤖 Initialize Gemini AI
# Cells 3-4: Configure API
from google.colab import userdata
import google.genai as genai

api_key = userdata.get("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)

⏱️ ~30 seconds
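If you run the notebook outside Colab (for example in local JupyterLab, which the installation list includes), `google.colab.userdata` is not available. A small fallback helper can read the key from an environment variable instead. `get_gemini_api_key` is a hypothetical name, not a function from the notebook.

```python
import os


def get_gemini_api_key(secret_name: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini key from Colab Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        userdata = None  # running outside Colab, e.g. in local JupyterLab

    if userdata is not None:
        try:
            key = userdata.get(secret_name)
            if key:
                return key
        except Exception:
            pass  # secret missing or notebook access not granted; try the environment

    key = os.environ.get(secret_name)
    if not key:
        raise RuntimeError(f"{secret_name} not found in Colab Secrets or the environment")
    return key
```

With this helper, `client = genai.Client(api_key=get_gemini_api_key())` works in both environments.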

📁 Load Dataset
# Cell 5: Upload and analyze
df = upload_dataset()
dataset_summary = generate_dataset_summary(df)

⏱️ Variable (depends on file size)
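`upload_dataset()` is defined in the notebook and its body is not shown here. As a rough sketch of the parsing step, assuming the upload widget hands over a filename plus raw bytes (as Colab's `files.upload()` does, returning a dict of filename to bytes), a loader can dispatch on the file extension. `load_table` is a hypothetical helper.

```python
import io
from pathlib import Path

import pandas as pd


def load_table(filename: str, content: bytes) -> pd.DataFrame:
    """Parse uploaded file bytes into a DataFrame based on the file extension."""
    suffix = Path(filename).suffix.lower()
    buffer = io.BytesIO(content)
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix in (".xlsx", ".xls"):
        # openpyxl reads .xlsx and xlrd reads legacy .xls; both are in the install list
        return pd.read_excel(buffer)
    raise ValueError(f"Unsupported file type: {suffix!r} (expected CSV or Excel)")


df = load_table("sales.csv", b"region,revenue\nNorth,120\nSouth,80\n")
print(df.shape)  # (2, 2)
```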

🧹 AI-Powered Cleaning
# Cells 6-7: Automated cleaning
cleaning_prompt = build_cleaning_prompt(dataset_summary)
cleaning_output = ask_gemini_cleaning(cleaning_prompt)

⏱️ ~1 minute
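The cleaning step works by turning the dataset profile into a prompt. The notebook's `build_cleaning_prompt` is not reproduced here; the hypothetical `make_cleaning_prompt` below only illustrates the idea, and both its wording and the keys in the sample summary are assumptions.

```python
import json


def make_cleaning_prompt(dataset_summary: dict) -> str:
    """Embed a dataset profile in an instruction prompt for the cleaning model (hypothetical helper)."""
    profile_json = json.dumps(dataset_summary, indent=2, default=str)  # default=str handles numpy values
    return (
        "You are a data-cleaning assistant.\n"
        "Below is a JSON profile of a pandas DataFrame named `df`.\n\n"
        f"{profile_json}\n\n"
        "Write Python code that cleans `df` in place: handle missing values, "
        "cap outliers at the 99th percentile, and one-hot encode categorical columns. "
        "Reply with Python code only."
    )


summary = {"shape": [100, 3], "missing": {"age": 4}, "dtypes": {"age": "float64"}}
prompt = make_cleaning_prompt(summary)
print(prompt.splitlines()[0])  # You are a data-cleaning assistant.
```

Sending a compact profile instead of raw rows keeps the prompt short and means individual records are not pasted into the request.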

📊 Generate Visualizations
# Cells 10-14: Create charts
viz_code = prompt_gemini(viz_prompt)
exec(viz_code)

# Cells 15-17: Build dashboard
dashboard_code = prompt_gemini(dash_prompt)
exec(dashboard_code)

⏱️ ~2 minutes
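Calling `exec()` on a raw model reply fails whenever Gemini wraps its answer in a Markdown code fence, which chat models commonly do. A minimal sketch, assuming each reply contains at most one relevant fenced block: extract the code, compile it first so syntax errors surface before anything runs, and execute it in an explicit namespace. `extract_python_code` and `run_generated_code` are hypothetical helpers; compiling first only catches syntax errors early and does not make generated code safe to run.

```python
import re

TICKS = "`" * 3  # the Markdown fence marker, built here so this snippet renders cleanly
FENCE_RE = re.compile(TICKS + r"[\w+-]*[ \t]*\n(.*?)" + TICKS, re.DOTALL)


def extract_python_code(model_output: str) -> str:
    """Return the body of the first fenced block in a model reply, or the whole reply if unfenced."""
    match = FENCE_RE.search(model_output)
    return (match.group(1) if match else model_output).strip()


def run_generated_code(model_output: str, namespace: dict) -> dict:
    """Compile, then execute, generated code inside an explicit namespace and return it."""
    code = extract_python_code(model_output)
    compiled = compile(code, "<gemini>", "exec")  # SyntaxError surfaces here, before any code runs
    exec(compiled, namespace)
    return namespace


reply = "Here is the chart code:\n" + TICKS + "python\nx = 1 + 1\n" + TICKS
ns = run_generated_code(reply, {})
print(ns["x"])  # 2
```

For the chart and dashboard steps you would pass the cleaned data in, for example `run_generated_code(viz_reply, {"df": df})` with a hypothetical `viz_reply`, so the generated code can refer to `df`.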

🎨 Features

🔍 Automated Data Analysis

  • Comprehensive Summary: Statistical metrics, missing values, data type profiling
  • AI Quality Assessment: Gemini-powered evaluation
  • Column-wise Analysis: Detailed numeric and categorical insights
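The summary step profiles the DataFrame before anything is sent to Gemini. The notebook's `generate_dataset_summary` is not shown; the hypothetical `profile_dataframe` below sketches the kind of information these bullets describe: shape, data types, missing-value counts, numeric statistics and the most frequent categories.

```python
import pandas as pd


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Collect a compact, JSON-friendly profile of a DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return {
        "shape": list(df.shape),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # Only report columns that actually have gaps
        "missing": {col: int(n) for col, n in df.isna().sum().items() if n > 0},
        # describe() raises on a frame with no columns, so guard the empty case
        "numeric_stats": numeric.describe().round(2).to_dict() if not numeric.empty else {},
        "top_categories": {
            col: categorical[col].value_counts().head(5).to_dict()
            for col in categorical.columns
        },
    }


people = pd.DataFrame({"age": [25, None, 40], "city": ["Pune", "Mumbai", "Pune"]})
profile = profile_dataframe(people)
print(profile["missing"])  # {'age': 1}
```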

🧹 Smart Data Cleaning

Feature                   Description
Missing Value Detection   Automatic identification and handling
Outlier Management        99th percentile statistical capping
Data Normalization        Column standardization and value scaling
Categorical Encoding      One-hot encoding for ML readiness
Negative Value Handling   Automatic conversion to absolute values
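In the notebook, the cleaning code itself is generated by Gemini for each dataset, so the exact rules vary. As a fixed-rule approximation of the five steps listed, the sketch below uses pandas only. The fill strategy (median for numeric columns, mode otherwise) and the step order (absolute values before capping, so large negative values are capped too) are assumptions, and `clean_frame` is a hypothetical name.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply missing-value, negative-value, outlier, scaling and encoding steps to a copy of `df`."""
    out = df.copy()
    numeric_cols = list(out.select_dtypes(include="number").columns)
    categorical_cols = list(out.select_dtypes(exclude="number").columns)

    # Missing values: median for numeric columns, most frequent value for the rest
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        modes = out[col].mode()
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])

    for col in numeric_cols:
        # Negative values -> absolute values, then cap at the 99th percentile
        out[col] = out[col].abs()
        out[col] = out[col].clip(upper=out[col].quantile(0.99))
        # Standardize to zero mean and unit (population) standard deviation
        std = out[col].std(ddof=0)
        out[col] = (out[col] - out[col].mean()) / (std if std > 0 else 1.0)

    # One-hot encode the categorical columns for ML readiness
    return pd.get_dummies(out, columns=categorical_cols, dtype=int)


raw = pd.DataFrame({"amount": [10.0, -20.0, None, 30.0], "segment": ["a", None, "b", "a"]})
cleaned = clean_frame(raw)
print(sorted(cleaned.columns))  # ['amount', 'segment_a', 'segment_b']
```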

📊 Visualization Suite

📊 Chart Types

• Histograms
• Bar charts
• Line charts
• Scatter plots
• Box plots
• Heatmaps
• Pie charts
• Correlation matrices
• Violin plots
• Area charts

🎛️ Interactive Features

• Real-time filtering
• KPI cards
• Multi-select widgets
• Auto-updating charts
• Dynamic interactions
• Responsive design

✨ Professional Quality

• Custom styling
• Proper titles
• Axis labels
• Legends
• Color schemes
• Export-ready
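Most of the professional-quality items above come down to a handful of calls per chart. A minimal sketch for one chart type, assuming matplotlib as the plotting library (plotly and seaborn are also in the stack); `styled_histogram`, the colour and the figure size are illustrative choices, not the notebook's code.

```python
from typing import Optional

import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in Colab to display plots inline
import matplotlib.pyplot as plt
import pandas as pd


def styled_histogram(series: pd.Series, title: str, xlabel: str, path: Optional[str] = None):
    """Draw a histogram with a title, axis labels, legend and grid; optionally export it as PNG."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(series.dropna(), bins=20, color="#1976d2", edgecolor="white",
            label=str(series.name or "values"))
    ax.set_title(title, fontsize=14, fontweight="bold")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Count")
    ax.legend()
    ax.grid(axis="y", alpha=0.3)
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150, bbox_inches="tight")  # export-ready PNG
    return fig, ax


revenue = pd.Series([120, 80, 200, 50, 95, 130], name="revenue")
fig, ax = styled_histogram(revenue, "Revenue distribution", "Revenue")
plt.close(fig)  # pass path="revenue_hist.png" above to also export the figure
```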

🤖 AI Integration

🧠 Gemini 2.5 Pro

Advanced data analysis and cleaning recommendations

🔴 Thorough & Comprehensive

⚡ Gemini 2.5 Flash

Fast visualization code generation

🟢 Quick & Efficient
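The split above suggests routing each task to a model: Pro for analysis and cleaning, Flash for visualization code. A sketch using the `google-genai` client's `client.models.generate_content(model=..., contents=...)` call; the task names, the `MODEL_FOR_TASK` table and sending dashboard prompts to Flash are assumptions, not the notebook's code.

```python
from typing import Any

# Assumed routing: Pro for analysis and cleaning, Flash for code generation (dashboard is a guess)
MODEL_FOR_TASK = {
    "analysis": "gemini-2.5-pro",
    "cleaning": "gemini-2.5-pro",
    "visualization": "gemini-2.5-flash",
    "dashboard": "gemini-2.5-flash",
}


def pick_model(task: str) -> str:
    """Return the model ID for a pipeline task, defaulting to the faster Flash model."""
    return MODEL_FOR_TASK.get(task, "gemini-2.5-flash")


def ask_gemini(client: Any, task: str, prompt: str) -> str:
    """Send `prompt` to the model chosen for `task` via a google-genai Client and return the text."""
    response = client.models.generate_content(model=pick_model(task), contents=prompt)
    return response.text
```

With `client` created as in the Initialize Gemini AI step, `ask_gemini(client, "visualization", viz_prompt)` would send the prompt to Flash. Keeping the mapping in one dict makes it easy to swap model versions later.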

🔧 Technical Stack

Core Dependencies

📊 Data Processing

pandas      # Data manipulation
numpy       # Numerical operations

📈 Visualization

matplotlib  # Static plots
seaborn     # Statistical graphics
plotly      # Interactive charts

🤖 Machine Learning

scikit-learn  # Preprocessing & encoding

🧠 AI Integration

google-genai  # Gemini API (google.genai client)

🎛️ Interactive Components

ipywidgets  # Dashboard widgets

✅ Validation

jsonschema  # Data validation
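The README lists `jsonschema` for data validation without saying what is validated. One plausible use, assumed here, is checking a structured JSON reply from Gemini (such as a quality assessment) before the pipeline relies on it. The schema fields and `parse_assessment` are hypothetical.

```python
import json

from jsonschema import ValidationError, validate

# Hypothetical schema for a JSON quality assessment returned by the model
ASSESSMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "quality_score": {"type": "number", "minimum": 0, "maximum": 10},
        "issues": {"type": "array", "items": {"type": "string"}},
        "recommended_steps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["quality_score", "issues"],
}


def parse_assessment(reply_text: str) -> dict:
    """Parse a model reply as JSON and check it against ASSESSMENT_SCHEMA."""
    try:
        data = json.loads(reply_text)
        validate(instance=data, schema=ASSESSMENT_SCHEMA)
    except (json.JSONDecodeError, ValidationError) as err:
        raise ValueError(f"Model reply failed validation: {err}") from err
    return data


ok = parse_assessment('{"quality_score": 7.5, "issues": ["4 missing ages"]}')
print(ok["quality_score"])  # 7.5
```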

Notebook Structure

📓 Cells 1-2:   Environment Setup (Dependencies)
🤖 Cells 3-4:   AI Initialization (Gemini Config)
📁 Cell 5:      Data Loading (Upload & Profile)
🧹 Cells 6-9:   AI Cleaning (Quality Improvement)
📊 Cells 10-14: Visualization (Chart Generation)
📈 Cells 15-17: Dashboard (Interactive Interface)
📋 Cells 18-19: Reporting (Insights & Recommendations)

📊 Output Deliverables

1️⃣

Cleaned Dataset

ML-ready with encoding and standardization

2️⃣

Visualizations

10+ professional charts

3️⃣

Interactive Dashboard

Real-time analytics with KPIs

4️⃣

Analysis Report

Automated insights & recommendations

📈 Use Cases

💼 Business Intelligence

  • Sales analysis & forecasting
  • Performance tracking
  • KPI monitoring & dashboards
  • Revenue analysis
  • Market trend identification

🔬 Data Science

  • Automated ETL pipelines
  • Feature engineering
  • Model preparation
  • Data preprocessing
  • Exploratory data analysis

📊 Research Analytics

  • Statistical analysis
  • Correlation studies
  • Pattern recognition
  • Hypothesis testing
  • Trend analysis

📋 Reporting Automation

  • Automated report generation
  • Executive dashboards
  • Periodic reporting
  • Stakeholder presentations
  • Business intelligence insights

🛡️ Security

╔═══════════════════════════════════════════════════════════╗
║               SECURITY & PRIVACY MEASURES                 ║
╠═══════════════════════════════════════════════════════════╣
║  ✅  Secure API Handling                                  ║
║      → API keys stored in Colab secrets                   ║
║                                                           ║
║  ✅  No Hardcoded Credentials                             ║
║      → Key is read at runtime via userdata.get()          ║
║                                                           ║
║  ✅  Data Privacy                                         ║
║      → Data is processed in the Colab runtime             ║
║      → Prompts built from the dataset summary are sent    ║
║        to the Gemini API                                  ║
╚═══════════════════════════════════════════════════════════╝

📚 Project Structure

DataLens-AI-Intelligent-Data-Analytics-Agent/
│
└── 📊 DataLens AI - Intelligent Data Analytics Agent.ipynb
    (Version 2 - Optimized for Google Colab)

🚀 Deployment

🌐 Version 1 - Live Demo

Live Demo

Deployed on Hugging Face Spaces

📦 Version 2 - Current

Status

Available in this Repository

🚨 Important Notes

⚠️  REQUIREMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✓ Google Colab environment recommended
✓ Gemini API key configured in Colab secrets
✓ Supports CSV and Excel file formats
✓ Automatic dependency installation


🎓 Capstone Project

Google's 5-Day AI Agents Intensive Course


Video Demo Live Demo

Built Using

Gemini Python Stack


Transform your data into insights with AI ✨


Made by Adinath Somnath Jagtap & Prajwal Ashok Zolage

