Skip to content

vedant713/CodeClauseInternship_OmniCluster

Repository files navigation

OmniCluster: Universal Data Analysis Studio 🚀

License Python Streamlit Docker

OmniCluster is an advanced, universal Data Analysis & Clustering Studio built with Streamlit. Unlike traditional tools limited to specific domains, OmniCluster leverages powerful Machine Learning algorithms (K-Means, DBSCAN, PCA) to analyze ANY numeric dataset—from biological gene data to financial stocks and sports statistics.

Key Features

📊 Data Exploration (Enhanced)

  • Descriptive Statistics: View full statistical summary (Mean, Std, Min, Max).
  • Missing Value Analysis: Automatically checks for and visualizes null values.
  • Interactive Distribution Plots: Visualize histograms for every feature.
  • Outlier Detection: Interactive Box Plots to spot anomalies.
  • PCA Visualization: Automatically project high-dimensional data (many columns) into 3D space.

🧠 Advanced Clustering

  • Multiple Algorithms:
    • K-Means: Standard centroid-based clustering.
    • DBSCAN: Density-based clustering for finding arbitrary shapes and outliers.
    • Hierarchical: Visualize data relationships with Dendrograms.
  • Optimal K Analysis: Scientifically determine the best number of clusters using Elbow Method and Silhouette Score.
  • Dynamic Feature Selection: Choose any combination of columns for analysis.

📈 Interactive Visualizations

  • 3D & 2D Scatter Plots: Powered by Plotly. Zoom, pan, and rotate to explore customer groups.
  • Radar Charts: Visualize the "personality" of each cluster (e.g., "High Income vs High Spending").

🌐 Smart Data Connectivity (New!)

  • True Web Search: Search the real internet (via DuckDuckGo) for any CSV dataset (e.g. "Pokemon", "Bitcoin", "UFC").
  • Mega Dataset Library: Built-in access to 150+ curated datasets across categories like Finance, Healthcare, Sports, and Tech.
  • Hybrid Search Engine: Features a "Google-like" autocomplete that instantly finds datasets in the library or falls back to web scraping.
  • Robust Auto-Cleaner: Automatically detects CSV delimiters and forces "string-numbers" (e.g. "$1,000") into usable numeric formats.

🤖 AI-Powered Analysis (Gemini Integration)

  • Virtual Data Scientist: Integrated with Google Gemini 2.5 Flash.
  • Explain Dataset: One-click AI analysis of what your dataset represents and what trends to look for.
  • Interpret Clusters: AI automatically analyzes cluster centers to assign creative, human-readable personas (e.g. "The Power Users", "The Risky Borrowers").
  • Interactive AI Chat: Chat with your data! Ask questions like "What is the trend?" and get answers based on the actual dataset statistics.
  • Smart Context: Chat history automatically resets when you load a new dataset, ensuring the AI always talks about the current data.
  • Secure: API Keys are safely stored in .streamlit/secrets.toml and never exposed in code.

🔮 Prediction & Production

  • Real-time Inference: Enter details for a new data point and instantly predict its segment.
  • Universal Auto-Insights: Automatically generates statistical profiles (e.g. "High Glucose, Low Age") for any dataset.
  • Model Export: Download your trained K-Means model (.pkl) for production use.
  • Data Export: Download the fully clustered dataset as a CSV.

🛠️ Installation

Local Setup

  1. Clone the repository.
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the app:
    streamlit run app.py

Docker Setup 🐳

You can also run this app in a Docker container.

  1. Build the image:
    docker build -t customer-segmentation .
  2. Run the container:
    docker run -p 8501:8501 customer-segmentation
    Access the app at http://localhost:8501.

📂 Project Structure

  • app.py: Main Streamlit application (UI).
  • logic.py: Logic layer for Data Processing, Clustering, and PCA (Backend).
  • datasets.json: Extended library of curated CSV links.
  • Dockerfile: Configuration for container deployment.
  • requirements.txt: Python dependencies.

👤 Author

Vedant Dhoke

🤝 Show your support

Give a ⭐️ if this project helped you!

About

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors