A Flask web app for automated Exploratory Data Analysis (EDA) and quick ML model comparison, with secure server-rendered previews and exports.
- Features
- Quick Start
- Installation
- Usage
- Project Structure
- API
- Supported Algorithms
- Configuration
- Troubleshooting
- Contributing
- License
- Changelog
Features
- Automated EDA: dataset size, dtypes, missingness, duplicates, memory usage
- Smart target detection and problem-type inference (binary/multiclass vs regression)
- One-click training and comparison across multiple sklearn models
- Feature importance visualization and downloadable model package (model + preprocessors)
- CSV export of model comparison results
- Secure, server-generated HTML for data preview to minimize XSS attack surface
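To make the EDA bullet concrete, here is a minimal sketch of the kind of summary it describes (row and column counts, missingness, duplicates). It uses plain Python on a list of dict rows rather than the pandas-based code the app presumably relies on; the function name `summarize_rows` and the output keys are hypothetical, not taken from the project.

```python
from collections import Counter


def summarize_rows(rows):
    """Return basic EDA facts for a list of dict rows (hypothetical helper)."""
    columns = list(rows[0].keys()) if rows else []
    # A value counts as missing when it is None or an empty string.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # A row is a duplicate when all of its values match an earlier row.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "missing_per_column": missing,
        "duplicate_rows": duplicates,
    }
```

The real EDA also reports dtypes and memory usage, which this sketch leaves out.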
Quick Start
macOS/Linux:
# 1) Clone the repository (replace with your repo URL)
git clone https://github.com/BoddapuLokesh/ML-Algorithm-Comparison.git
cd ML-Algorithm-Comparison
# 2) Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 3) Install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt
# 4) Run the application
python app.py
# App will start at http://127.0.0.1:5002/
If you already have the folder locally, start from step 2 inside the project directory.
Windows (PowerShell):
# 1) Clone the repository (replace with your repo URL)
git clone https://github.com/BoddapuLokesh/ML-Algorithm-Comparison.git
cd ML-Algorithm-Comparison
# 2) Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1
# 3) Install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt
# 4) Run the application
python app.py
# App will start at http://127.0.0.1:5002/
Installation
Prerequisites:
- Python 3.8+
- pip
Steps:
- Clone the repository
- Create and activate a virtual environment
- Install dependencies:
pip install -r requirements.txt
- Run the app:
python app.py
Optional verification:
python -c "import flask, pandas, sklearn; print('Deps OK')"
Usage
- Upload a CSV/XLSX/XLS file (max 50MB) to see a safe, server-generated preview
- Click Analyze to compute EDA (stats, missingness, correlations)
- Choose the target and confirm problem type (auto-detected; you can override)
- Start training to compare models; the best model is selected automatically
- Review metrics and feature importance; export CSV or download the model package
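The problem-type auto-detection mentioned in the steps above could work roughly as sketched below. This is a heuristic under stated assumptions, not the project's actual rule: the function name `infer_problem_type` and the `max_classes=20` threshold are hypothetical.

```python
def infer_problem_type(values, max_classes=20):
    """Guess 'binary', 'multiclass', or 'regression' from target values."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("target column has no non-missing values")
    distinct = set(present)
    # Exactly two distinct labels is treated as binary classification.
    if len(distinct) == 2:
        return "binary"
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    # Non-numeric targets can only be class labels.
    if not all_numeric:
        return "multiclass"
    # Integer-valued targets with few distinct values look like class ids.
    all_integral = all(float(v).is_integer() for v in present)
    if all_integral and len(distinct) <= max_classes:
        return "multiclass"
    return "regression"
```

Any heuristic of this kind can misjudge a column (for example, an integer rating from 1 to 5 that should be modelled as regression), which is why the override in the UI matters.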
Project Structure
ML-Algorithm-Comparison/
├── app.py # Flask routes: upload, EDA, training, exports
├── app_helpers.py # JSON envelope, guards, upload/EDA/train handlers
├── model_utils.py # Back-compat facade to ml_utils/*
├── ml_utils/
│ ├── config.py # MLConfig, typed results, preview HTML
│ ├── eda.py # Minimal+enhanced EDA
│ ├── models.py # AutoMLComparer (fit, score, select, importance)
│ ├── preprocessing.py # ColumnTransformer pipelines + fallback
│ └── utils.py # JSON safety, validations, detection
├── templates/ # Jinja templates (layout/index)
├── static/ # style.css, app.js
└── requirements.txt # Python dependencies
API
File upload and EDA
- POST /: Upload dataset (AJAX)
- POST /process_eda: Run EDA and return stats, the auto-detected target, and the problem type
- GET /eda: EDA JSON (server-side cached)
- GET /data_preview: Preview JSON
- GET /get_data_preview_html: Secure HTML preview
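The idea behind a secure, server-generated HTML preview is that every cell is escaped on the server before it reaches the browser, so uploaded data cannot inject markup. A minimal sketch using the standard library's `html.escape` follows; the function name `render_preview_html` and the table layout are hypothetical and not the project's actual preview code.

```python
from html import escape


def render_preview_html(columns, rows, max_rows=5):
    """Build an HTML table in which every header and cell is escaped."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = []
    for row in rows[:max_rows]:
        # escape() converts <, >, &, and quotes into HTML entities.
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        body.append(f"<tr>{cells}</tr>")
    return (
        f"<table><thead><tr>{header}</tr></thead>"
        f"<tbody>{''.join(body)}</tbody></table>"
    )
```

With this approach a cell containing `<script>` is rendered as visible text instead of being executed.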
Training and results
- POST /validate_training_config: Validate target/type/split
- POST /train: Train, compare, and return metrics/results/importance
- GET /metrics: Best model metrics
- GET /best_model: Best model name + metrics
- GET /model_comparison: All trained models and metrics
- GET /feature_importance: Feature importance data
Utilities
- POST /analyze_target: Inspect a chosen target column
- POST /calculate_split: Convert split ratio to percentages
- GET /download_model: ZIP: model.joblib + preprocessors.joblib + README
- GET /export_results: CSV export of all models
- GET /debug_session: Inspect session keys (debug)
- POST /reset_session: Clear session/caches (debug)
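As an illustration of what converting a split ratio to percentages involves, here is a small sketch. The request and response shapes of /calculate_split are not documented here, so the function name `split_percentages` and the returned keys are assumptions.

```python
def split_percentages(train_ratio):
    """Convert a train ratio in (0, 1) to train/test percentages."""
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    train_pct = round(train_ratio * 100, 1)
    # Derive the test share from the rounded train share so they sum to 100.
    test_pct = round(100 - train_pct, 1)
    return {"train": train_pct, "test": test_pct}
```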
Notes
- JSON shapes vary by endpoint; on errors you’ll receive { "success": false, "error": "..." }.
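Because every endpoint reports failures with the same envelope, a client can check for it in one place. The sketch below assumes only the error shape shown above; `ApiError` and `unwrap_response` are hypothetical names, and successful payloads are returned untouched since their shapes vary.

```python
import json


class ApiError(Exception):
    """Raised when the server returns its error envelope."""


def unwrap_response(body_text):
    """Parse a JSON response body and raise ApiError on the error envelope."""
    payload = json.loads(body_text)
    if isinstance(payload, dict) and payload.get("success") is False:
        raise ApiError(payload.get("error", "unknown error"))
    return payload
```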
Supported Algorithms
Classification: Logistic Regression, Random Forest, Gradient Boosting, SVC, Decision Tree
Regression: Linear Regression, Random Forest, Gradient Boosting, SVR, Decision Tree
Metrics
- Classification: Accuracy, Precision, Recall, F1, Training Time
- Regression: R², MSE, Training Time
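For readers unfamiliar with the regression metrics, the standard definitions of MSE and R² can be written out in a few lines of plain Python. The app presumably uses scikit-learn's implementations; the names `mse` and `r_squared` below are hypothetical and exist only to show the arithmetic.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; undefined when y_true is constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower MSE is better, while R² is better the closer it is to 1; a model that always predicts the mean of the targets scores R² = 0.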
Configuration
Environment (optional)
export FLASK_ENV=development
export FLASK_DEBUG=1
Runtime settings
- Max upload size: 50MB
- Session lifetime: 1 hour
- Model timeout (default): 300s per model (see MLConfig)
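The first two limits map naturally onto standard Flask configuration keys. The sketch below shows one way they might be expressed; whether app.py sets them exactly like this is an assumption, as are the names `RUNTIME_SETTINGS` and `MODEL_TIMEOUT_SECONDS` and the choice to read 50MB as 50 * 1024 * 1024 bytes.

```python
from datetime import timedelta

# MAX_CONTENT_LENGTH and PERMANENT_SESSION_LIFETIME are standard Flask
# config keys; the dict itself is a hypothetical container for them.
RUNTIME_SETTINGS = {
    # Flask rejects request bodies larger than this many bytes.
    "MAX_CONTENT_LENGTH": 50 * 1024 * 1024,
    # Lifetime of a permanent session.
    "PERMANENT_SESSION_LIFETIME": timedelta(hours=1),
}

# Hypothetical name for the per-model time budget described in MLConfig.
MODEL_TIMEOUT_SECONDS = 300
```

In a Flask app these values would typically be applied with `app.config.update(RUNTIME_SETTINGS)`.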
Customization
- Adjust preprocessing or defaults in ml_utils/*.py
Troubleshooting
- .xls files require xlrd==1.2.0 (installed via requirements.txt)
- Very large/wide datasets: correlations are capped to reduce memory use
- If a model hits the time budget, it’s skipped; consider sampling or simpler models
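The skip-on-timeout behaviour can be pictured with a small sketch based on `concurrent.futures`. This is an illustration only, not how AutoMLComparer enforces its budget; `fit_with_budget` and the `candidates` mapping of model names to zero-argument fit functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fit_with_budget(candidates, budget_seconds):
    """Run each named fit function; skip any that exceeds the budget."""
    results, skipped = {}, []
    # One worker per candidate so a slow model cannot block the next one.
    pool = ThreadPoolExecutor(max_workers=len(candidates) or 1)
    try:
        for name, fit in candidates.items():
            future = pool.submit(fit)
            try:
                results[name] = future.result(timeout=budget_seconds)
            except FutureTimeout:
                skipped.append(name)
    finally:
        # Do not wait for timed-out workers before returning.
        pool.shutdown(wait=False)
    return results, skipped
```

Note that a thread cannot be forcibly stopped, so a timed-out fit keeps running in the background until it finishes; a process-based approach would be needed to reclaim the CPU immediately.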
Contributing
Small PRs are welcome. Please open an issue first if the change is substantial.
License
MIT
Changelog
August 2025
- Server-side preview HTML and consolidated Python validations
- AutoMLComparer pipelines and improved EDA (memory usage, quality score)
- Model export (model + preprocessors) and CSV results export