A web application that automatically cleans and analyzes CSV files. It imputes missing values, detects outliers, and produces per-column statistics, all through a modern browser interface.
- Smart Missing Value Imputation - Automatic handling using appropriate strategies
- Multi-method Outlier Detection - Isolation Forest & Z-score algorithms
- Automatic Data Type Detection - Intelligent column type classification
- Comprehensive Data Validation - Robust error handling and data integrity checks
- Detailed Statistics - Mean, median, standard deviation, quartiles, and more
- Visual Data Preview - Clean tabular display of original and cleaned data
- Outlier Analysis - Detailed outlier indices and detection methods
- Cleaning Audit Log - Complete history of all transformations applied
- Download Cleaned CSV - Export processed data instantly
- JSON Analysis Reports - Comprehensive cleaning summary
- MongoDB Integration - Persistent storage of cleaning history
- GridFS Support - Scalable file storage solution
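As a rough illustration of the automatic data type detection listed above, the sketch below classifies a column from its raw string values. It uses only the standard library; the function name `infer_column_type` and the four labels are assumptions made for this example, not the project's actual implementation.

```python
from datetime import datetime


def infer_column_type(values):
    """Classify a column as 'numeric', 'datetime', 'categorical', or 'empty'.

    Hypothetical helper: blank strings count as missing and are ignored.
    """
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "empty"

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    def is_iso_date(text):
        try:
            datetime.fromisoformat(text)
            return True
        except ValueError:
            return False

    # Numbers are tested first so that digit-only strings stay numeric.
    if all(is_number(v) for v in present):
        return "numeric"
    if all(is_iso_date(v) for v in present):
        return "datetime"
    return "categorical"
```

A column is only labelled numeric or datetime when every non-blank value parses, so a single stray string makes the whole column categorical.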
| Technology | Version | Purpose |
|---|---|---|
| React | 18.2+ | Modern UI framework with hooks |
| Vite | 4.4+ | Fast build tool and dev server |
| Tailwind CSS | 3.3+ | Utility-first CSS framework |
| Axios | 1.5+ | HTTP client for API calls |
| React Hot Toast | 2.4+ | Toast notifications |
| Technology | Version | Purpose |
|---|---|---|
| Node.js | 16+ | Runtime environment |
| Express.js | 4.18+ | Web application framework |
| Multer | 1.4+ | File upload middleware |
| Mongoose | 7.4+ | MongoDB object modeling |
| CORS | 2.8+ | Cross-origin resource sharing |
| Technology | Version | Purpose |
|---|---|---|
| Python | 3.8+ | Data processing runtime |
| FastAPI | 0.104+ | Modern Python web framework |
| Pandas | 2.1+ | Data manipulation and analysis |
| Scikit-learn | 1.3+ | Machine learning algorithms |
| NumPy | 1.25+ | Numerical computations |
| Technology | Version | Purpose |
|---|---|---|
| MongoDB | 4.4+ | NoSQL database |
| Mongoose ODM | 7.4+ | MongoDB object modeling |
| GridFS | - | Large file storage |
| Technology | Purpose |
|---|---|
| Concurrently | Run multiple commands simultaneously |
| Nodemon | Auto-restart Node.js server |
| Uvicorn | ASGI server for Python |
| Dotenv | Environment variable management |
Before you begin, ensure you have the following installed:
- Node.js (v16 or higher)
- Python (v3.8 or higher)
- MongoDB (v4.4 or higher)
- npm or yarn package manager
# Check Node.js version
node --version
# Check npm version
npm --version
# Check Python version
python --version
# Check MongoDB (ensure service is running)
mongod --version

- Clone and set up with one command:
# Clone the repository
git clone <your-repo-url>
cd data-cleaner-analyzer
# Run automated setup script
npm run setup:project

git clone <your-repo-url>
cd data-cleaner-analyzer

# Install root and all sub-project dependencies
npm run install:all

cd server/python-service
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
cd ../..

Option A: Local MongoDB
# Start MongoDB service (varies by OS)
# Ubuntu/Debian:
sudo systemctl start mongod
# macOS (with Homebrew):
brew services start mongodb-community
# Windows:
net start MongoDB

Create server/node-server/.env file:
PORT=5000
MONGODB_URI=mongodb://localhost:27017/data_cleaner
PYTHON_SERVICE_URL=http://localhost:5001
NODE_ENV=development
UPLOAD_MAX_SIZE=10485760

Using Concurrently (Easiest):
# Run all three services simultaneously
npm run dev:all

This command starts:
- React frontend on http://localhost:3000
- Node.js backend on http://localhost:5000
- Python service on http://localhost:5001
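To confirm the two backend services are answering after startup, a small health probe can help. The sketch below assumes the health endpoints `/api/health` (Node.js) and `/health` (Python) respond with HTTP 200 when healthy; the helper names and the injectable `fetch` parameter are illustrative choices, not part of the project.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Health endpoints of the Node.js backend and the Python service.
SERVICES = {
    "node-backend": "http://localhost:5000/api/health",
    "python-service": "http://localhost:5001/health",
}


def is_healthy(url, fetch=urlopen, timeout=3):
    """Return True if a GET on the URL answers with HTTP 200."""
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def check_all(services=SERVICES, fetch=urlopen):
    """Map each service name to True (up) or False (down)."""
    return {name: is_healthy(url, fetch) for name, url in services.items()}
```

Calling `check_all()` while the services are running should return a dictionary with both values set to `True`; any `False` entry points at the service to restart.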
cd server/node-server
npm run dev

Backend running on: http://localhost:5000
cd server/python-service
# Activate virtual environment
source venv/bin/activate # or venv\Scripts\activate on Windows
# Start FastAPI server with auto-reload
python run.py

Python service running on: http://localhost:5001
cd client
npm run dev

Frontend running on: http://localhost:3000
- Navigate to http://localhost:3000
- Drag & drop a CSV file or click to browse
- Supported: All standard CSV files (< 10MB)
- Review your dataset in the interactive table
- Identify potential issues before processing
- View first 10 rows for quick validation
- Click "Clean & Analyze Data" to start processing
- AI algorithms automatically:
- Detect and impute missing values
- Identify outliers using multiple methods
- Generate comprehensive statistics
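The imputation step can be pictured with a minimal standard-library sketch. It assumes one plausible pair of strategies, median for numeric columns and most frequent value for everything else; the README does not say which strategies the service actually applies, and `impute_missing` is a hypothetical name.

```python
import csv
import io
from statistics import median, mode


def impute_missing(rows):
    """Fill blank cells and report which strategy was used per column.

    `rows` is a list of dicts as produced by csv.DictReader.
    """
    filled = [dict(row) for row in rows]
    strategies = {}
    columns = list(rows[0].keys()) if rows else []
    for column in columns:
        present = [r[column] for r in rows if (r[column] or "").strip()]
        if not present or len(present) == len(rows):
            continue  # nothing to learn from, or nothing to fill
        try:
            fill = str(median(float(v) for v in present))
            strategies[column] = "median"
        except ValueError:
            # At least one value is not a number: treat as categorical.
            fill = mode(present)
            strategies[column] = "mode"
        for row in filled:
            if not (row[column] or "").strip():
                row[column] = fill
    return filled, strategies


sample = io.StringIO("age,department\n25,Engineering\n,Marketing\n35,\n31,Engineering\n")
rows, applied = impute_missing(list(csv.DictReader(sample)))
print(applied)  # {'age': 'median', 'department': 'mode'}
```

The returned `strategies` mapping is the kind of information a summary of applied strategies per column could be built from.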
Summary Tab:
- Overview of cleaning operations
- Missing values fixed count
- Outliers detected
- Applied strategies per column
Statistics Tab:
- Detailed column statistics
- Mean, median, standard deviation
- Min/max values and quartiles
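For reference, the statistics named above can be reproduced with Python's `statistics` module. This is a sketch for checking results by hand, not the service's own code; `describe` is a hypothetical helper, and the inclusive quartile method is an assumption chosen because it uses linear interpolation between data points.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Summary statistics for a list of numbers (needs at least two values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


print(describe([85, 92, 78, 88, 65, 95, 91, 87, 89, 72]))
```

Quartile definitions differ between tools, so small discrepancies in `q1` and `q3` against another calculator usually come from the interpolation method rather than from an error.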
Outliers Tab:
- Outlier indices and detection methods
- Isolation Forest results
- Z-score analysis
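The Z-score half of the outlier analysis is simple enough to sketch with the standard library. The threshold of 3 and the use of the population standard deviation are assumptions for this example; the Isolation Forest half is not shown here.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices whose absolute Z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


print(zscore_outliers([10] * 20 + [100]))  # [20]
```

One consequence worth knowing: with the population standard deviation, no Z-score in a column of n values can exceed sqrt(n - 1). On a 10-row file the maximum is exactly 3, so a strict threshold of 3 can never flag anything, which is one reason to pair Z-scores with a second method on small datasets.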
Cleaning Log Tab:
- Complete audit trail
- Step-by-step transformation history
- Timestamps and action details
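An audit trail like this is essentially an append-only list of timestamped records. The sketch below shows one possible shape; the field names (`action`, `column`, `details`, `timestamp`) and both class names are assumptions for illustration and may not match the log the application stores.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEntry:
    action: str   # e.g. "impute_missing" (hypothetical action name)
    column: str
    details: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class CleaningLog:
    """Append-only record of the transformations applied to a dataset."""

    def __init__(self):
        self.entries = []

    def record(self, action, column, details):
        entry = LogEntry(action, column, details)
        self.entries.append(entry)
        return entry

    def to_dicts(self):
        """Return the log as plain dictionaries, ready for JSON serialization."""
        return [asdict(entry) for entry in self.entries]
```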
- Download cleaned CSV file
- Export detailed JSON analysis report
- All exports include timestamps and original filename
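A file name that carries both the original name and a timestamp could be built as follows. The exact pattern is an assumption, as are the helper name `export_name` and the `cleaned`/`report` labels; only the two ingredients (timestamp and original filename) come from the list above.

```python
from datetime import datetime
from pathlib import Path


def export_name(original_filename, kind, when=None):
    """Build an export file name from the upload's name and a timestamp.

    `kind` is "cleaned" (CSV export) or "report" (JSON export).
    """
    when = when or datetime.now()
    stem = Path(original_filename).stem
    extension = {"cleaned": "csv", "report": "json"}[kind]
    return f"{stem}_{kind}_{when:%Y%m%d_%H%M%S}.{extension}"
```

For example, `export_name("sales.csv", "cleaned", datetime(2024, 1, 5, 14, 30))` returns `sales_cleaned_20240105_143000.csv`.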
| Method | Endpoint | Description | Body |
|---|---|---|---|
| POST | /api/clean | Upload and clean CSV | multipart/form-data |
| GET | /api/health | Service health check | - |
| GET | /api/history | Get cleaning history | - |
Example Clean Request:
curl -X POST http://localhost:5000/api/clean \
-F "[email protected]" \
-H "Content-Type: multipart/form-data"

| Method | Endpoint | Description |
|---|---|---|
| POST | /clean | Clean CSV data (internal) |
| GET | /health | Python service health |
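The upload endpoint can also be called from a script. The sketch below builds a multipart/form-data request with the standard library only. The form field name `file` is an assumption: use whichever field name the server's upload middleware expects. Both function names are hypothetical.

```python
import uuid
from urllib.request import Request, urlopen


def build_upload_request(url, filename, content, field_name="file"):
    """Build a multipart/form-data POST carrying one CSV file.

    `field_name` is an assumed default, not confirmed by the project.
    """
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"'
    )
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        disposition.encode(),
        b"Content-Type: text/csv",
        b"",                        # blank line separates headers from content
        content,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return Request(url, data=body, headers=headers, method="POST")


def upload_csv(path, url="http://localhost:5000/api/clean"):
    """Send a CSV file to the clean endpoint and return the response body."""
    with open(path, "rb") as handle:
        request = build_upload_request(url, path, handle.read())
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

The boundary is generated per request and repeated in the `Content-Type` header, which is what lets the server split the body into parts.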
data-cleaner-analyzer/
├── client/                          # React Frontend
│   ├── public/                      # Static assets
│   ├── src/
│   │   ├── components/              # React components
│   │   │   ├── DataCleaner.jsx      # Main orchestrator
│   │   │   ├── FileUpload.jsx       # Drag & drop upload
│   │   │   ├── DataPreview.jsx      # CSV preview table
│   │   │   └── AnalysisResults.jsx  # Results display
│   │   ├── App.jsx                  # Root component
│   │   ├── main.jsx                 # Application entry
│   │   └── index.css                # Tailwind styles
│   ├── package.json
│   ├── vite.config.js
│   └── tailwind.config.js
├── server/
│   ├── node-server/                 # Express Backend
│   │   ├── server.js                # Main server file
│   │   ├── package.json
│   │   └── .env                     # Environment variables
│   └── python-service/              # AI Cleaning Service
│       ├── main.py                  # FastAPI application
│       ├── run.py                   # Development server
│       └── requirements.txt         # Python dependencies
├── package.json                     # Root package file
└── README.md                        # This file
- Create a sample CSV file:
age,income,department,performance
25,50000,Engineering,85
,45000,Marketing,92
35,80000,Engineering,78
28,,Sales,88
42,120000,Engineering,65
29,55000,Marketing,95
,75000,Sales,91
31,60000,Engineering,87
26,48000,Marketing,89
45,110000,Engineering,72
- Upload and test the cleaning process
- Verify all features are working correctly
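Before uploading, it helps to know what the cleaner should find. The sketch below counts the blank cells per column in the sample file above, giving a number to compare against the "missing values fixed" count in the Summary tab. `count_missing` is a hypothetical helper written for this check.

```python
import csv
import io

SAMPLE_CSV = """age,income,department,performance
25,50000,Engineering,85
,45000,Marketing,92
35,80000,Engineering,78
28,,Sales,88
42,120000,Engineering,65
29,55000,Marketing,95
,75000,Sales,91
31,60000,Engineering,87
26,48000,Marketing,89
45,110000,Engineering,72
"""


def count_missing(csv_text):
    """Count blank cells per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            if not (row[name] or "").strip():
                counts[name] += 1
    return counts


print(count_missing(SAMPLE_CSV))
# {'age': 2, 'income': 1, 'department': 0, 'performance': 0}
```

The sample therefore contains three missing values in total, two in `age` and one in `income`.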
1. Python Service Connection Refused
# Check if Python service is running
curl http://localhost:5001/health
# Restart Python service
cd server/python-service
source venv/bin/activate
python run.py

2. MongoDB Connection Issues
# Check MongoDB status
sudo systemctl status mongod
# Start MongoDB service
sudo systemctl start mongod
# Or using Docker
docker start mongodb

3. File Upload Errors
- Ensure file is valid CSV format
- Check file size (< 10MB limit)
- Verify file is not corrupted
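The three checks above can be run locally before uploading. This is a sketch under stated assumptions: the size limit mirrors the `UPLOAD_MAX_SIZE` value of 10485760 bytes, the file is expected to be UTF-8, and `validate_csv` is a hypothetical helper rather than the validation the server performs.

```python
import csv
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10485760, same value as UPLOAD_MAX_SIZE


def validate_csv(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > max_bytes:
        problems.append(f"file is {size} bytes, limit is {max_bytes}")
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
    except (UnicodeDecodeError, csv.Error) as exc:
        problems.append(f"could not parse as UTF-8 CSV: {exc}")
        return problems
    # Rows with differing field counts usually indicate a damaged file.
    widths = {len(row) for row in rows if row}
    if len(widths) > 1:
        problems.append(f"rows have inconsistent column counts: {sorted(widths)}")
    return problems
```

An empty list from `validate_csv("your-file.csv")` rules out the most common causes; any remaining upload error is then more likely on the server side.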
4. Module Not Found Errors
# Reinstall dependencies
npm run install:all
# Clear npm cache
npm cache clean --force

5. Port Already in Use
# Find and kill process using port
lsof -ti:3000 | xargs kill -9 # React
lsof -ti:5000 | xargs kill -9 # Node.js
lsof -ti:5001 | xargs kill -9 # Python

npm run dev:all # Start all services
npm run install:all # Install all dependencies
npm run setup:project # Complete setup automation

npm run dev # Start React dev server
npm run build # Build for production
npm run preview # Preview production build

npm run dev # Start Node.js with nodemon
npm start # Start Node.js in production

python run.py # Start FastAPI with auto-reload

# Build React frontend
cd client
npm run build
# The build output will be in client/dist/

NODE_ENV=production
MONGODB_URI=your_production_mongodb_uri
PYTHON_SERVICE_URL=your_python_service_url
PORT=5000

- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Pandas - For powerful data manipulation capabilities
- Scikit-learn - For robust machine learning algorithms
- FastAPI - For high-performance Python web framework
- React Team - For the amazing frontend library
- Tailwind CSS - For the utility-first CSS framework
Ready to clean your data? Start by running npm run dev:all
For questions or support, please open an issue in the repository.