OmniCluster: Universal Data Analysis Studio 🚀

OmniCluster is an advanced, universal Data Analysis & Clustering Studio built with Streamlit. Unlike traditional tools limited to specific domains, OmniCluster leverages powerful Machine Learning algorithms (K-Means, DBSCAN, PCA) to analyze ANY numeric dataset—from biological gene data to financial stocks and sports statistics.

Key Features

📊 Data Exploration (Enhanced)

Descriptive Statistics: View full statistical summary (Mean, Std, Min, Max).
Missing Value Analysis: Automatically checks for and visualizes null values.
Interactive Distribution Plots: Visualize histograms for every feature.
Outlier Detection: Interactive Box Plots to spot anomalies.
PCA Visualization: Automatically project high-dimensional data (many columns) into 3D space.

🧠 Advanced Clustering

Multiple Algorithms:
- K-Means: Standard centroid-based clustering.
- DBSCAN: Density-based clustering for finding arbitrary shapes and outliers.
- Hierarchical: Visualize data relationships with Dendrograms.
Optimal K Analysis: Scientifically determine the best number of clusters using Elbow Method and Silhouette Score.
Dynamic Feature Selection: Choose any combination of columns for analysis.

📈 Interactive Visualizations

3D & 2D Scatter Plots: Powered by Plotly. Zoom, pan, and rotate to explore customer groups.
Radar Charts: Visualize the "personality" of each cluster (e.g., "High Income vs High Spending").

🌐 Smart Data Connectivity (New!)

True Web Search: Search the real internet (via DuckDuckGo) for any CSV dataset (e.g. "Pokemon", "Bitcoin", "UFC").
Mega Dataset Library: Built-in access to 150+ curated datasets across categories like Finance, Healthcare, Sports, and Tech.
Hybrid Search Engine: Features a "Google-like" autocomplete that instantly finds datasets in the library or falls back to web scraping.
Robust Auto-Cleaner: Automatically detects CSV delimiters and forces "string-numbers" (e.g. "$1,000") into usable numeric formats.

🤖 AI-Powered Analysis (Gemini Integration)

Virtual Data Scientist: Integrated with Google Gemini 2.5 Flash.
Explain Dataset: One-click AI analysis of what your dataset represents and what trends to look for.
Interpret Clusters: AI automatically analyzes cluster centers to assign creative, human-readable personas (e.g. "The Power Users", "The Risky Borrowers").
Interactive AI Chat: Chat with your data! Ask questions like "What is the trend?" and get answers based on the actual dataset statistics.
Smart Context: Chat history automatically resets when you load a new dataset, ensuring the AI always talks about the current data.
Secure: API Keys are safely stored in .streamlit/secrets.toml and never exposed in code.

🔮 Prediction & Production

Real-time Inference: Enter details for a new data point and instantly predict its segment.
Universal Auto-Insights: Automatically generates statistical profiles (e.g. "High Glucose, Low Age") for any dataset.
Model Export: Download your trained K-Means model (.pkl) for production use.
Data Export: Download the fully clustered dataset as a CSV.

🛠️ Installation

Local Setup

Clone the repository.
Install dependencies:
```
pip install -r requirements.txt
```
Run the app:
```
streamlit run app.py
```

Docker Setup 🐳

You can also run this app in a Docker container.

Build the image:
```
docker build -t customer-segmentation .
```
Run the container:
```
docker run -p 8501:8501 customer-segmentation
```
Access the app at http://localhost:8501.

📂 Project Structure

app.py: Main Streamlit application (UI).
logic.py: Logic layer for Data Processing, Clustering, and PCA (Backend).
datasets.json: Extended library of curated CSV links.
Dockerfile: Configuration for container deployment.
requirements.txt: Python dependencies.

👤 Author

Vedant Dhoke

Github: @vedant713

🤝 Show your support

Give a ⭐️ if this project helped you!

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
__pycache__		__pycache__
.DS_Store		.DS_Store
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
app.py		app.py
datasets.json		datasets.json
debug_models.py		debug_models.py
debug_search.py		debug_search.py
logic.py		logic.py
requirements.txt		requirements.txt
sample_customer_data.csv		sample_customer_data.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OmniCluster: Universal Data Analysis Studio 🚀

Key Features

📊 Data Exploration (Enhanced)

🧠 Advanced Clustering

📈 Interactive Visualizations

🌐 Smart Data Connectivity (New!)

🤖 AI-Powered Analysis (Gemini Integration)

🔮 Prediction & Production

🛠️ Installation

Local Setup

Docker Setup 🐳

📂 Project Structure

👤 Author

🤝 Show your support

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OmniCluster: Universal Data Analysis Studio 🚀

Key Features

📊 Data Exploration (Enhanced)

🧠 Advanced Clustering

📈 Interactive Visualizations

🌐 Smart Data Connectivity (New!)

🤖 AI-Powered Analysis (Gemini Integration)

🔮 Prediction & Production

🛠️ Installation

Local Setup

Docker Setup 🐳

📂 Project Structure

👤 Author

🤝 Show your support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages