Title: Data Cleaning and Transformation using Power Query...
Project Type: Data Cleaning & Transformation
Duration: Jun 2025 – Jun 2025
Skills Used: Microsoft Advanced Excel | Power Query
- 📖 Project Overview
- 🛠 Problem Statement
- 🔍 Data Issues Identified
- 🛠 Step-by-Step Data Cleaning Process
- 🛠️ Tools & Technologies Used
- 📁 Files Included in This Repository
- 💡 Key Learning Outcomes
- 🚀 How to Use This Project
- 🌟 Future Enhancements
- 📬 Connect with Me
- 📷 Project Snapshot
This project showcases a real-world data cleaning and transformation process using Microsoft Power Query and Advanced Excel.
The primary objective was to convert raw, unstructured data into an analysis-ready format, ensuring accuracy, clarity, and consistency across all fields.
The dataset had multiple data quality issues, including:
- Duplicate columns
- Inconsistent casing (Uppercase, lowercase, mixed)
- Merged columns containing multiple values
- Numerous null or blank values
- Fragmented city names across columns
- Improper data type assignments
By leveraging Power Query’s robust features, I systematically transformed this messy dataset into a clean and structured one — ready for accurate analysis and reporting.
Modern business decisions rely on clean, accurate, and consistent data.
However, raw datasets from multiple sources often contain errors, inconsistencies, and structural problems.
If left unaddressed, these issues can lead to:
- Wrong analysis results
- Poor decision-making
- Increased time in manual corrections
This project aimed to automate and streamline the cleaning process using Power Query — reducing manual effort while improving efficiency.
- Multiple duplicate and unnecessary columns.
- Inconsistent text formatting, making it difficult to group or sort values.
- Merged cells with unrelated information combined in a single field.
- Blank and null values across important columns.
- City names split incorrectly into multiple columns.
- Columns placed in random, non-logical order.
- Incorrect data types (e.g., text stored as numbers or vice versa).
- Converted the dataset into an Excel table for structured processing.
- Imported the table into Power Query Editor.
- Applied Capitalize Each Word transformation to improve readability and uniformity.
- Used Split Column by Delimiter to separate merged values.
- Deleted blank or duplicate columns to simplify the dataset.
- Combined separated city name columns (e.g.,
"New"
+"York"
→"New York"
).
- Filled blank cells with
"N/A"
or other relevant placeholders to avoid null-related issues in analysis.
- Extracted values from complex columns without manual splitting.
- Arranged fields in a logical order (e.g., IDs → Names → Contact → Location → Metrics).
- Used Detect Data Type to assign correct formats (Text, Number, Date).
- Converted Working ID and Email fields to lowercase for uniformity.
- Loaded the transformed data back into Excel for further analysis.
- Microsoft Excel (Advanced) — For table creation and exporting.
- Power Query — For automating data cleaning and transformation.
- Data Profiling Tools — For detecting nulls, duplicates, and inconsistent formats.
File Name | Description |
---|---|
Cleaned & Refined Messy Rows Dataset.pdf |
Detailed explanation of the cleaning and transformation process. |
Messy Employee Dataset 2000.csv |
Raw messy dataset in CSV format. |
Messy Employee Dataset 2000.xlsx |
Raw dataset in Excel format. |
Data Cleaning.png |
Snapshot showing the cleaned dataset in Excel. |
- Practical application of Power Query for large dataset cleaning.
- Experience in text standardization, null handling, and data type management.
- Skills in automating repetitive tasks to improve data quality.
- Ability to convert messy, unstructured data into a clean, analysis-ready format.
- Download or Clone the repository.
- Open the
.csv
or.xlsx
file to explore the raw dataset. - Review the PDF documentation to understand each transformation step.
- Open the cleaned Excel file to view the final, analysis-ready dataset.
- Automating the process entirely using Power BI Dataflows.
- Adding custom validation rules for incoming raw data.
- Building a Power BI dashboard to visualize before-and-after comparisons.
- 📧 Email: [email protected]
- 💻 GitHub: github.com/rajeevgit8055hub
- 🔗 LinkedIn: linkedin.com/in/rajeev-tiwari123
- 🌐 Website: rajeevgit8055hub.github.io/rajeevtiwari.github.io
🤝 Thanks for visiting my profile!