In this project, I cleaned and transformed a real-world layoffs dataset using MySQL. The raw dataset contained duplicate records, inconsistent text formatting, null values, and incorrect date formats.
The goal was to prepare the dataset for analysis by making it clean, structured, and ready for further use in dashboards or data analysis.
- MySQL Workbench
- MySQL 8+
- SQL (CTE, Window Functions, Joins)
- CSV Dataset
The dataset includes the following columns:
- Company
- Location
- Industry
- Total Laid Off
- Percentage Laid Off
- Date
- Stage
- Country
- Funds Raised (Millions)
The original raw dataset is stored inside the /data folder.
-
Created staging tables to avoid modifying the original dataset.
-
Removed duplicate records using the ROW_NUMBER() window function and deleted rows where row_num > 1.
-
Standardized text data:
- Trimmed extra spaces from company names
- Standardized industry values (e.g., "Crypto%" to "Crypto")
- Cleaned country names (removed trailing periods like "United States.")
-
Converted the date column from text format to proper DATE format using STR_TO_DATE().
-
Handled null and blank values:
- Updated missing industry values using self joins
- Removed rows where both total_laid_off and percentage_laid_off were NULL
-
Dropped helper columns used during cleaning.
- Common Table Expressions (CTE)
- Window Functions (ROW_NUMBER)
- Self Joins
- String Functions (TRIM, LIKE)
- Date Conversion (STR_TO_DATE)
- NULL Handling
After cleaning, the dataset is:
- Free from duplicate records
- Properly formatted
- Standardized
- Ready for data analysis or visualization
Your Name Aspiring Data Analyst | SQL | Data Cleaning | Analytics