A smart Exploratory Data Analysis (EDA) assistant that automates data profiling, visualization, and summarization using Python and AI-powered tools.
This project is designed to assist data scientists and analysts by automating the most common EDA tasks. With just a few lines of code, the assistant can generate data summaries, detect missing (null) values, display feature distributions and correlations, and offer insights in natural language.
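The summary and missing-value checks described above come down to a handful of pandas calls. The sketch below is illustrative only and is not taken from the project's code; the function name `summarize` and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, null counts and cardinality."""
    missing = df.isna().sum()
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "missing": missing,
            # Share of rows that are null in each column, as a percentage
            "missing_pct": (missing / len(df) * 100).round(2),
            # Number of distinct non-null values per column
            "n_unique": df.nunique(dropna=True),
        }
    )


# Toy data with one missing value in each column
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, np.nan, 6.3],
        "species": ["setosa", "setosa", "virginica", None],
    }
)
print(df.shape)  # (4, 2)
print(summarize(df))
```

Returning a DataFrame rather than printing keeps the result easy to render in a UI table or to test.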
- Python
- Pandas / NumPy: for data manipulation and statistical calculations
- Matplotlib / Seaborn: for plotting and visualizations
- Streamlit: to build an interactive web interface
- YData Profiling: for automatic EDA reports
- scikit-learn: for preprocessing and basic ML utilities
- app.py: The Streamlit application that powers the assistant.
- eda_tools.py: Custom Python script for handling EDA logic and feature generation.
- requirements.txt: Python dependencies required to run the application (streamlit, pandas, numpy, matplotlib, seaborn, ydata-profiling, scikit-learn).
- sample_datasets/: Example datasets you can use to test the tool.
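Keeping the UI (app.py) separate from the analysis logic (eda_tools.py) lets the EDA functions run and be tested without Streamlit. A minimal sketch of what such helpers could look like; the names `load_csv` and `describe_numeric` are hypothetical and not the repository's actual API. Streamlit's `st.file_uploader` returns a file-like object that `pd.read_csv` can read directly, so here an in-memory `io.StringIO` buffer stands in for an uploaded file.

```python
import io

import pandas as pd


def load_csv(file_like) -> pd.DataFrame:
    """Hypothetical helper: read a CSV from a path or any file-like object."""
    return pd.read_csv(file_like)


def describe_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: descriptive statistics for numeric columns, one row per feature."""
    return df.select_dtypes(include="number").describe().T


# Stand-in for an uploaded file: a small in-memory CSV
buffer = io.StringIO(
    "sepal_length,petal_length,species\n"
    "5.1,1.4,setosa\n"
    "7.0,4.7,versicolor\n"
    "6.3,6.0,virginica\n"
)
data = load_csv(buffer)
stats = describe_numeric(data)
print(stats[["count", "mean", "min", "max"]])
```

The non-numeric `species` column is skipped by `select_dtypes`, so the statistics table only covers the two measurement columns.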
- Upload CSV files and get:
  - Dataset overview (shape, dtypes, missing values)
  - Descriptive statistics
  - Correlation matrix and heatmap
  - Class balance checks (for classification tasks)
  - Automated report using YData Profiling
- Simple, interactive controls in the Streamlit sidebar
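The correlation matrix and class balance check from the feature list can both be computed with plain pandas. A sketch under stated assumptions: the helper names `correlation_matrix` and `class_balance` are hypothetical, the toy DataFrame is made up, and `label` plays the role of a categorical target column. The repository's own implementation may differ.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Hypothetical helper: pairwise correlation between numeric columns only."""
    return df.select_dtypes(include="number").corr(method=method)


def class_balance(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: count and proportion of each class in the target column."""
    counts = df[target].value_counts()
    return pd.DataFrame(
        {"count": counts, "proportion": (counts / counts.sum()).round(3)}
    )


# Toy data: y rises with x, z falls with x, and the target is imbalanced (3 vs 1)
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
        "z": [4.0, 3.0, 2.0, 1.0],
        "label": ["a", "a", "a", "b"],
    }
)
print(correlation_matrix(df).round(2))
print(class_balance(df, "label"))
```

`DataFrame.corr` also accepts `method="spearman"` or `method="kendall"`, which are less sensitive to outliers and non-linear but monotonic relationships.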
- Clone the repository:
  git clone https://github.com/sjapanjots/AI_EDA_Assistant.git
  cd AI_EDA_Assistant
- Install dependencies:
  pip install -r requirements.txt
- Run the Streamlit app:
  streamlit run app.py
- Open http://localhost:8501 in your browser.
- Upload: iris.csv
- Output:
  - Data summary
  - Target distribution
  - Feature correlation heatmap
  - Automated profiling report (HTML)
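For the iris.csv example, the target distribution and correlation heatmap could be rendered to image files roughly as follows. This sketch uses plain Matplotlib (the app may use Seaborn instead), assumes the columns are named `sepal_length`, `sepal_width`, `petal_length`, `petal_width` and `species`, and works on six sample rows only; the function names and the `eda_outputs/` directory are hypothetical.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render straight to files
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_heatmap(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Hypothetical helper: save an annotated heatmap of numeric correlations."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    # Write each coefficient into its cell
    for i in range(len(corr.index)):
        for j in range(len(corr.columns)):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return corr


def plot_target_distribution(df: pd.DataFrame, target: str, path: Path) -> pd.Series:
    """Hypothetical helper: save a bar chart of class counts for the target column."""
    counts = df[target].value_counts()
    fig, ax = plt.subplots()
    counts.plot.bar(ax=ax)
    ax.set_xlabel(target)
    ax.set_ylabel("count")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return counts


# Six sample rows in the assumed iris layout (two per species)
iris = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    }
)
out = Path("eda_outputs")
out.mkdir(exist_ok=True)
corr = plot_correlation_heatmap(iris, out / "correlation_heatmap.png")
counts = plot_target_distribution(iris, "species", out / "target_distribution.png")
```

Inside Streamlit, the same figures could be shown with `st.pyplot(fig)` instead of being saved to disk.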
Japanjot Singh
Data Scientist & ML Enthusiast
📬 sjapanjots@gmail.com