PRITAM-TU/AI-Data-Cleaner-Analyzer

AI Data Cleaner & Analyzer 🤖📊

A web application that automatically cleans and analyzes CSV files. It imputes missing values, detects outliers with Isolation Forest and Z-score methods, and reports summary statistics through a React interface.


🚀 Features

🔧 Data Cleaning

  • Smart Missing Value Imputation - Automatic handling using appropriate strategies
  • Multi-method Outlier Detection - Isolation Forest & Z-score algorithms
  • Automatic Data Type Detection - Intelligent column type classification
  • Comprehensive Data Validation - Robust error handling and data integrity checks
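
The type detection and imputation features listed above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the project's Pandas-based code; the helper names (`is_missing`, `detect_type`, `impute`), the set of missing-value markers, and the median/mode strategy choice are all assumptions made for illustration.

```python
import statistics
from collections import Counter

# Assumed markers for a missing cell; the real service may use a different set.
MISSING = {"", "na", "n/a", "nan", "null", "none"}


def is_missing(value):
    """Treat None and common placeholder strings as missing."""
    return value is None or value.strip().lower() in MISSING


def detect_type(values):
    """Return "numeric" if every non-missing value parses as a float, else "categorical"."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "categorical"
    try:
        for v in present:
            float(v)
    except ValueError:
        return "categorical"
    return "numeric"


def impute(values):
    """Fill missing cells: median for numeric columns, most frequent value otherwise."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return list(values), "none"
    if detect_type(values) == "numeric":
        fill = statistics.median(float(v) for v in present)
        return [fill if is_missing(v) else float(v) for v in values], "median"
    fill = Counter(present).most_common(1)[0][0]
    return [fill if is_missing(v) else v for v in values], "mode"


filled, strategy = impute(["25", "", "35", "28", "42"])
print(strategy, filled)  # median [25.0, 31.5, 35.0, 28.0, 42.0]
```

The median is used for numeric columns here because it is less sensitive to the same outliers the tool is trying to detect; whether the project makes the same choice is not stated in this README.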

📈 Analysis & Reporting

  • Detailed Statistics - Mean, median, standard deviation, quartiles, and more
  • Visual Data Preview - Clean tabular display of original and cleaned data
  • Outlier Analysis - Detailed outlier indices and detection methods
  • Cleaning Audit Log - Complete history of all transformations applied

💾 Export & Storage

  • Download Cleaned CSV - Export processed data instantly
  • JSON Analysis Reports - Comprehensive cleaning summary
  • MongoDB Integration - Persistent storage of cleaning history
  • GridFS Support - Scalable file storage solution

🛠 Technology Stack

Frontend Layer

| Technology | Version | Purpose |
| --- | --- | --- |
| React | 18.2+ | Modern UI framework with hooks |
| Vite | 4.4+ | Fast build tool and dev server |
| Tailwind CSS | 3.3+ | Utility-first CSS framework |
| Axios | 1.5+ | HTTP client for API calls |
| React Hot Toast | 2.4+ | Beautiful notifications |

Backend Layer

| Technology | Version | Purpose |
| --- | --- | --- |
| Node.js | 16+ | Runtime environment |
| Express.js | 4.18+ | Web application framework |
| Multer | 1.4+ | File upload middleware |
| Mongoose | 7.4+ | MongoDB object modeling |
| CORS | 2.8+ | Cross-origin resource sharing |

AI/ML Service

| Technology | Version | Purpose |
| --- | --- | --- |
| Python | 3.8+ | Data processing runtime |
| FastAPI | 0.104+ | Modern Python web framework |
| Pandas | 2.1+ | Data manipulation and analysis |
| Scikit-learn | 1.3+ | Machine learning algorithms |
| NumPy | 1.25+ | Numerical computations |

Database

| Technology | Version | Purpose |
| --- | --- | --- |
| MongoDB | 4.4+ | NoSQL database |
| Mongoose ODM | 7.4+ | MongoDB object modeling |
| GridFS | - | Large file storage |

Development Tools

| Technology | Purpose |
| --- | --- |
| Concurrently | Run multiple commands simultaneously |
| Nodemon | Auto-restart Node.js server |
| Uvicorn | ASGI server for Python |
| Dotenv | Environment variable management |

📋 Prerequisites

Before you begin, ensure you have the following installed:

Required Software

  • Node.js (v16 or higher)
  • Python (v3.8 or higher)
  • MongoDB (v4.4 or higher)
  • npm or yarn package manager

Verify Installation

# Check Node.js version
node --version

# Check npm version
npm --version

# Check Python version
python --version

# Check MongoDB version (this confirms the install, not that the service is running)
mongod --version

🚀 Quick Start

Method 1: Automated Setup (Recommended)

  1. Clone the repository and run the setup script:
# Clone the repository
git clone <your-repo-url>
cd data-cleaner-analyzer

# Run automated setup script
npm run setup:project

Method 2: Manual Setup

Step 1: Clone Repository

git clone <your-repo-url>
cd data-cleaner-analyzer

Step 2: Install All Dependencies

# Install root and all sub-project dependencies
npm run install:all

Step 3: Python Environment Setup

cd server/python-service

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

cd ../..

Step 4: Database Setup

Option A: Local MongoDB

# Start MongoDB service (varies by OS)
# Ubuntu/Debian:
sudo systemctl start mongod
# macOS (with Homebrew, using the mongodb-community formula from the mongodb/brew tap):
brew services start mongodb-community
# Windows:
net start MongoDB

Step 5: Environment Configuration

Create server/node-server/.env file:

PORT=5000
MONGODB_URI=mongodb://localhost:27017/data_cleaner
PYTHON_SERVICE_URL=http://localhost:5001
NODE_ENV=development
UPLOAD_MAX_SIZE=10485760
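
Before starting the services, it can help to confirm that the `.env` file parses and contains the expected keys. The sketch below does this with the Python standard library only. Which keys count as required (`REQUIRED_KEYS`) is an assumption for illustration, and the helper names are hypothetical; the Node.js server itself reads these values through Dotenv.

```python
# Assumed set of keys the backend cannot run without; adjust to match server.js.
REQUIRED_KEYS = {"PORT", "MONGODB_URI", "PYTHON_SERVICE_URL"}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required keys that are absent, in sorted order."""
    return sorted(REQUIRED_KEYS - set(settings))


example = """PORT=5000
MONGODB_URI=mongodb://localhost:27017/data_cleaner
PYTHON_SERVICE_URL=http://localhost:5001
NODE_ENV=development
UPLOAD_MAX_SIZE=10485760
"""

settings = parse_env(example)
print(missing_keys(settings))                             # []
print(int(settings["UPLOAD_MAX_SIZE"]) / (1024 * 1024))  # 10.0 (upload limit in MiB)
```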

🎯 Running the Application

Development Mode (All Services)

Using Concurrently (Easiest):

# Run all three services simultaneously
npm run dev:all

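This command starts:

  • Node.js backend on http://localhost:5000
  • Python AI service on http://localhost:5001
  • React frontend on http://localhost:3000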

Running Services Individually

Terminal 1: Start Node.js Backend

cd server/node-server
npm run dev

✅ Backend running on: http://localhost:5000

Terminal 2: Start Python AI Service

cd server/python-service

# Activate virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Start FastAPI server with auto-reload
python run.py

✅ Python service running on: http://localhost:5001

Terminal 3: Start React Frontend

cd client
npm run dev

✅ Frontend running on: http://localhost:3000

📊 Usage Guide

1. Upload Data

  • Navigate to http://localhost:3000
  • Drag & drop a CSV file or click to browse
  • Supported: standard CSV files smaller than 10 MB (UPLOAD_MAX_SIZE is 10485760 bytes in the sample .env)

2. Preview Data

  • Review your dataset in the interactive table
  • Identify potential issues before processing
  • View first 10 rows for quick validation
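
The 10-row preview described above can be reproduced outside the UI for a quick sanity check. This is a small standard-library sketch, not the frontend's actual implementation; `preview_rows` is a hypothetical helper name.

```python
import csv
import io
from itertools import islice


def preview_rows(csv_text, limit=10):
    """Return the header and at most `limit` data rows as dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    header = reader.fieldnames
    return header, list(islice(reader, limit))


# Twelve data rows in, ten rows out.
sample = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(12))
header, rows = preview_rows(sample)
print(header)     # ['id', 'value']
print(len(rows))  # 10
```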

3. Clean & Analyze

  • Click "Clean & Analyze Data" to start processing
  • AI algorithms automatically:
    • Detect and impute missing values
    • Identify outliers using multiple methods
    • Generate comprehensive statistics

4. Review Results

Summary Tab:

  • Overview of cleaning operations
  • Missing values fixed count
  • Outliers detected
  • Applied strategies per column

Statistics Tab:

  • Detailed column statistics
  • Mean, median, standard deviation
  • Min/max values and quartiles
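
For reference, the statistics in this tab can be computed for a single numeric column as follows. This is a sketch with Python's `statistics` module rather than the project's Pandas code. The `describe` helper is hypothetical, and the `inclusive` quantile method is chosen here because it uses linear interpolation between data points; the values are the `performance` column from the sample CSV in this README.

```python
import statistics


def describe(values):
    """Summary statistics for one numeric column (sample standard deviation)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


performance = [85, 92, 78, 88, 65, 95, 91, 87, 89, 72]
stats = describe(performance)
print(stats["median"], stats["q1"], stats["q3"])  # 87.5 79.75 90.5
```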

Outliers Tab:

  • Outlier indices and detection methods
  • Isolation Forest results
  • Z-score analysis
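
The Z-score part of the outlier analysis can be sketched in a few lines. The threshold of 3.0 and the use of the population standard deviation are assumptions, and `zscore_outliers` is a hypothetical helper; the Isolation Forest method is not shown because it requires Scikit-learn.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Nineteen typical values and one extreme value at index 19.
values = [50.0] * 19 + [500.0]
print(zscore_outliers(values))  # [19]
```

One practical caveat: with the population standard deviation, a column of n values cannot contain a z-score larger than sqrt(n - 1), so a threshold of 3 will not flag anything in a dataset of 10 rows or fewer.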

Cleaning Log Tab:

  • Complete audit trail
  • Step-by-step transformation history
  • Timestamps and action details
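
An audit trail like the one shown in this tab can be modeled as an append-only list of timestamped entries. The sketch below is illustrative only: the `CleaningLog` class, its field names, and the example messages are assumptions, not the format the service actually stores.

```python
from datetime import datetime, timezone


class CleaningLog:
    """Append-only record of the transformations applied to a dataset."""

    def __init__(self):
        self.entries = []

    def record(self, action, column, detail):
        """Add one step with a UTC timestamp in ISO 8601 format."""
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "column": column,
            "detail": detail,
        })


log = CleaningLog()
log.record("impute", "age", "filled missing values with the column median")
log.record("detect_outliers", "income", "z-score method")

for step, entry in enumerate(log.entries, start=1):
    print(step, entry["timestamp"], entry["action"], entry["column"], entry["detail"])
```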

5. Export Results

  • Download cleaned CSV file
  • Export detailed JSON analysis report
  • All exports include timestamps and original filename
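
One way to combine a timestamp with the original filename in export names is sketched below. The naming pattern, the `export_name` and `build_report` helpers, and the report fields are assumptions for illustration; the application's real export format may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def export_name(original_filename, suffix, extension, now=None):
    """Build a name such as sales_cleaned_20240102T030405Z.csv."""
    now = now or datetime.now(timezone.utc)
    stem = Path(original_filename).stem
    return f"{stem}_{suffix}_{now:%Y%m%dT%H%M%SZ}.{extension}"


def build_report(original_filename, summary, now=None):
    """Serialize a cleaning summary to JSON alongside its provenance."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "original_filename": original_filename,
            "generated_at": now.isoformat(),
            "summary": summary,
        },
        indent=2,
    )


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(export_name("sales.csv", "cleaned", "csv", fixed))  # sales_cleaned_20240102T030405Z.csv
print(build_report("sales.csv", {"missing_values_fixed": 3}, fixed))
```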

🔧 API Documentation

Node.js Backend Endpoints

| Method | Endpoint | Description | Body |
| --- | --- | --- | --- |
| POST | /api/clean | Upload and clean CSV | multipart/form-data |
| GET | /api/health | Service health check | - |
| GET | /api/history | Get cleaning history | - |

Example Clean Request:

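# Replace data.csv with the path to your CSV file.
# The form field name "file" is an assumption; use the name the server's upload handler expects.
curl -X POST http://localhost:5000/api/clean \
  -F "file=@data.csv" \
  -H "Content-Type: multipart/form-data"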
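
The same upload can be scripted from Python without third-party packages. This is a sketch under stated assumptions: the form field name `file` is a guess, the response is assumed to be JSON, and `build_multipart` and `upload_csv` are hypothetical helpers. Check the Express route in `server.js` for the field name it actually expects.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, content, content_type="text/csv"):
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_csv(url, path, field_name="file"):
    """POST a CSV file and return the decoded JSON response (assumed format)."""
    with open(path, "rb") as handle:
        content = handle.read()
    body, header = build_multipart(field_name, os.path.basename(path), content)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example call, with the backend running locally:
# result = upload_csv("http://localhost:5000/api/clean", "data.csv")
```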

Python Service Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /clean | Clean CSV data (internal) |
| GET | /health | Python service health |
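
Both health endpoints can be polled with a short script, which is handy right after starting the services. The sketch below assumes that a healthy service answers with HTTP 200 (the README does not document the response) and uses the default local ports from this guide; `is_healthy` is a hypothetical helper.

```python
import urllib.error
import urllib.request

# Default local URLs from this README.
SERVICES = {
    "node-backend": "http://localhost:5000/api/health",
    "python-service": "http://localhost:5001/health",
}


def is_healthy(url, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if is_healthy(url) else 'down'}")
```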

🗂 Project Structure

data-cleaner-analyzer/
├── 📁 client/                     # React Frontend
│   ├── 📁 public/                 # Static assets
│   ├── 📁 src/
│   │   ├── 📁 components/         # React components
│   │   │   ├── DataCleaner.jsx    # Main orchestrator
│   │   │   ├── FileUpload.jsx     # Drag & drop upload
│   │   │   ├── DataPreview.jsx    # CSV preview table
│   │   │   └── AnalysisResults.jsx # Results display
│   │   ├── App.jsx               # Root component
│   │   ├── main.jsx              # Application entry
│   │   └── index.css             # Tailwind styles
│   ├── package.json
│   ├── vite.config.js
│   └── tailwind.config.js
├── 📁 server/
│   ├── 📁 node-server/           # Express Backend
│   │   ├── server.js             # Main server file
│   │   ├── package.json
│   │   └── .env                  # Environment variables
│   └── 📁 python-service/        # AI Cleaning Service
│       ├── main.py               # FastAPI application
│       ├── run.py                # Development server
│       └── requirements.txt      # Python dependencies
├── package.json                  # Root package file
└── README.md                     # This file

🧪 Testing with Sample Data

  1. Create a sample CSV file:
age,income,department,performance
25,50000,Engineering,85
,45000,Marketing,92
35,80000,Engineering,78
28,,Sales,88
42,120000,Engineering,65
29,55000,Marketing,95
,75000,Sales,91
31,60000,Engineering,87
26,48000,Marketing,89
45,110000,Engineering,72
  2. Upload the file and run the cleaning process
  3. Verify the results: the sample has two missing age values and one missing income value that should be imputed
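
To know what the cleaner should report for this file, the missing cells can be counted independently. The following standard-library sketch does that for the sample data above; `count_missing` is a hypothetical helper and treats only empty cells as missing.

```python
import csv
import io

SAMPLE = """age,income,department,performance
25,50000,Engineering,85
,45000,Marketing,92
35,80000,Engineering,78
28,,Sales,88
42,120000,Engineering,65
29,55000,Marketing,95
,75000,Sales,91
31,60000,Engineering,87
26,48000,Marketing,89
45,110000,Engineering,72
"""


def count_missing(csv_text):
    """Count empty cells per column."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            if not (row[name] or "").strip():
                counts[name] += 1
    return counts


print(count_missing(SAMPLE))
# {'age': 2, 'income': 1, 'department': 0, 'performance': 0}
```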

🐛 Troubleshooting

Common Issues & Solutions

1. Python Service Connection Refused

# Check if Python service is running
curl http://localhost:5001/health

# Restart Python service
cd server/python-service
source venv/bin/activate
python run.py

2. MongoDB Connection Issues

# Check MongoDB status
sudo systemctl status mongod

# Start MongoDB service
sudo systemctl start mongod

# Or, if MongoDB runs in a Docker container named "mongodb"
docker start mongodb

3. File Upload Errors

  • Ensure file is valid CSV format
  • Check file size (< 10MB limit)
  • Verify file is not corrupted
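
The three checks above can be run locally before uploading. This is a minimal sketch, not the server's validation logic: it assumes UTF-8 input, uses the 10 MB limit from the sample `.env`, and the `validate_csv` helper and its messages are hypothetical.

```python
import csv
import io

MAX_BYTES = 10 * 1024 * 1024  # mirrors UPLOAD_MAX_SIZE=10485760


def validate_csv(data, max_bytes=MAX_BYTES):
    """Return a list of problems found in raw CSV bytes; an empty list means none."""
    problems = []
    if len(data) > max_bytes:
        problems.append("file exceeds size limit")
    try:
        text = data.decode("utf-8-sig")  # also strips a leading byte-order mark
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8 text"]
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return problems + ["file is empty"]
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if row and len(row) != width:  # blank lines are ignored
            problems.append(f"row {number} has {len(row)} fields, expected {width}")
    return problems


print(validate_csv(b"a,b\n1,2\n"))    # []
print(validate_csv(b"a,b\n1,2,3\n"))  # ['row 2 has 3 fields, expected 2']
```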

4. Module Not Found Errors

# Reinstall dependencies
npm run install:all

# Clear npm cache
npm cache clean --force

5. Port Already in Use

# Find and kill the process using each port (macOS/Linux)
lsof -ti:3000 | xargs kill -9  # React
lsof -ti:5000 | xargs kill -9  # Node.js
lsof -ti:5001 | xargs kill -9  # Python
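
On systems without `lsof`, or to check before killing anything, a port can be probed from Python. This sketch only reports whether something is listening on the three default ports from this guide; `port_in_use` is a hypothetical helper and it does not identify or stop the owning process.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


for name, port in [("React", 3000), ("Node.js", 5000), ("Python", 5001)]:
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```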

📝 Available Scripts

Root Level Scripts

npm run dev:all          # Start all services
npm run install:all      # Install all dependencies
npm run setup:project    # Complete setup automation

Client Scripts

npm run dev              # Start React dev server
npm run build            # Build for production
npm run preview          # Preview production build

Server Scripts

npm run dev              # Start Node.js with nodemon
npm start               # Start Node.js in production

Python Service

python run.py           # Start FastAPI with auto-reload

🌐 Production Deployment

Build for Production

# Build React frontend
cd client
npm run build

# The build output will be in client/dist/

Environment Variables for Production

NODE_ENV=production
MONGODB_URI=your_production_mongodb_uri
PYTHON_SERVICE_URL=your_python_service_url
PORT=5000

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Pandas - For powerful data manipulation capabilities
  • Scikit-learn - For robust machine learning algorithms
  • FastAPI - For high-performance Python web framework
  • React Team - For the amazing frontend library
  • Tailwind CSS - For the utility-first CSS framework

Ready to clean your data? 🚀 Start by running npm run dev:all

For questions or support, please open an issue in the repository.