OmniCluster is a general-purpose data analysis and clustering studio built with Streamlit. Rather than targeting a single domain, it applies machine learning algorithms (K-Means, DBSCAN, PCA) to any numeric dataset, from biological gene data to financial stocks and sports statistics.
- Descriptive Statistics: View a full statistical summary (Mean, Std, Min, Max).
- Missing Value Analysis: Automatically checks for and visualizes null values.
- Interactive Distribution Plots: Visualize histograms for every feature.
- Outlier Detection: Interactive Box Plots to spot anomalies.
- PCA Visualization: Automatically project high-dimensional data (many columns) into 3D space.
- Multiple Algorithms:
  - K-Means: Standard centroid-based clustering.
  - DBSCAN: Density-based clustering for finding arbitrary shapes and outliers.
  - Hierarchical: Visualize data relationships with Dendrograms.
- Optimal K Analysis: Determine the best number of clusters using the Elbow Method and the Silhouette Score.
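
The Elbow Method mentioned above can be illustrated with a small, dependency-free sketch. This is not OmniCluster's own code: the function names, the deterministic farthest-point initialisation, and the toy data are all assumptions made for illustration, and in practice a library such as scikit-learn would supply K-Means and the silhouette score. The sketch computes the within-cluster sum of squares (inertia) for several values of k. The "elbow" is the k after which inertia stops dropping sharply.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic start (an assumption of this sketch): take the first
    point, then repeatedly the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Toy data: three well-separated groups of four points each.
blobs = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

inertias = {k: kmeans_inertia(blobs, k) for k in range(1, 5)}
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

On this toy data the inertia falls steeply up to k=3 and only marginally afterwards, which is the pattern the Elbow Method looks for.
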
- Dynamic Feature Selection: Choose any combination of columns for analysis.
- 3D & 2D Scatter Plots: Powered by Plotly. Zoom, pan, and rotate to explore clusters.
- Radar Charts: Visualize the "personality" of each cluster (e.g., "High Income vs High Spending").
- True Web Search: Search the real internet (via DuckDuckGo) for any CSV dataset (e.g. "Pokemon", "Bitcoin", "UFC").
- Mega Dataset Library: Built-in access to 150+ curated datasets across categories like Finance, Healthcare, Sports, and Tech.
- Hybrid Search Engine: Features a "Google-like" autocomplete that instantly finds datasets in the library or falls back to web scraping.
- Robust Auto-Cleaner: Automatically detects CSV delimiters and forces "string-numbers" (e.g. "$1,000") into usable numeric formats.
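
As a rough illustration of what the Auto-Cleaner bullet describes, the sketch below sniffs a delimiter and coerces "string-numbers" into floats using only the Python standard library. The function names (`to_number`, `load_rows`), the set of stripped symbols, and the sample data are hypothetical. The app's actual cleaning rules are not documented here and may differ.

```python
import csv
import io
import re

def to_number(text):
    """Strip currency symbols, thousands separators, percent signs and
    whitespace, then try to parse a float. Returns None if that fails."""
    if text is None:
        return None
    cleaned = re.sub(r"[$€£,%\s]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None

def load_rows(raw_text):
    """Detect the delimiter, then convert every numeric-looking cell."""
    dialect = csv.Sniffer().sniff(raw_text, delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(raw_text), dialect=dialect)
    rows = []
    for record in reader:
        parsed = {}
        for name, value in record.items():
            number = to_number(value)
            # Keep the original text when the cell is not numeric.
            parsed[name] = value if number is None else number
        rows.append(parsed)
    return rows

# Semicolon-delimited sample with currency and percent formatting.
sample = "name;price;share\nalpha;$1,000;45%\nbeta;$2,500;55%\n"
rows = load_rows(sample)
print(rows)
```

Note that this sketch only removes the percent sign and does not divide by 100, and that it assumes a comma is always a thousands separator. Locales that use a decimal comma would need a different rule.
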
- Virtual Data Scientist: Integrated with Google Gemini 2.5 Flash.
- Explain Dataset: One-click AI analysis of what your dataset represents and what trends to look for.
- Interpret Clusters: AI automatically analyzes cluster centers to assign creative, human-readable personas (e.g. "The Power Users", "The Risky Borrowers").
- Interactive AI Chat: Chat with your data! Ask questions like "What is the trend?" and get answers based on the actual dataset statistics.
- Smart Context: Chat history automatically resets when you load a new dataset, ensuring the AI always talks about the current data.
- Secure: API keys are stored in `.streamlit/secrets.toml` and never exposed in code.
- Real-time Inference: Enter details for a new data point and instantly predict its segment.
- Universal Auto-Insights: Automatically generates statistical profiles (e.g. "High Glucose, Low Age") for any dataset.
- Model Export: Download your trained K-Means model (`.pkl`) for production use.
- Data Export: Download the fully clustered dataset as a CSV.
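
The inference and model-export features can be sketched together with the standard library. The example below is an assumption-laden stand-in, not the app's code: it uses a plain dictionary of centroids in place of the trained K-Means object, and the feature names, centroid values, and `predict_segment` helper are hypothetical. It shows a pickle round trip followed by nearest-centroid assignment for a new data point.

```python
import pickle

# Hypothetical "trained model": cluster label -> centroid coordinates.
model = {
    "features": ["income", "spending"],
    "centroids": {0: (20.0, 80.0), 1: (90.0, 15.0)},
}

def predict_segment(model, point):
    """Return the label of the centroid closest to the new data point."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, point))
    return min(model["centroids"], key=lambda label: sq_dist(model["centroids"][label]))

# Export: serialise the model to bytes, as a .pkl download would contain.
payload = pickle.dumps(model)

# Later, in another process: restore the model and classify a new point.
restored = pickle.loads(payload)
segment = predict_segment(restored, (85.0, 20.0))
print(f"Predicted segment: {segment}")
```

If features were scaled before clustering, the same scaling would have to be applied to new points before prediction. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.
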
- Clone the repository.
- Install dependencies: `pip install -r requirements.txt`
- Run the app: `streamlit run app.py`
You can also run this app in a Docker container.
- Build the image: `docker build -t customer-segmentation .`
- Run the container: `docker run -p 8501:8501 customer-segmentation`

Access the app at http://localhost:8501.
- `app.py`: Main Streamlit application (UI).
- `logic.py`: Logic layer for Data Processing, Clustering, and PCA (Backend).
- `datasets.json`: Extended library of curated CSV links.
- `Dockerfile`: Configuration for container deployment.
- `requirements.txt`: Python dependencies.
Vedant Dhoke
- GitHub: @vedant713
Give a ⭐️ if this project helped you!