HR Analytics | IBM HR Dataset
This project analyzes employee attrition patterns to identify key factors associated with employees leaving an organization. The analysis is performed using Python for exploratory data analysis and machine learning, and insights are presented through an interactive Power BI dashboard.
The focus of the project is to derive clear, data-driven insights that can help HR teams understand attrition drivers and support informed workforce decisions.
- Python: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Jupyter Notebook: Data analysis & modeling
- Power BI: Interactive dashboards & KPIs
- GitHub: Version control & project sharing
- Source: IBM HR Analytics Employee Attrition Dataset (Kaggle)
- Records: 1,470 employees
- Target Variable: Attrition (Yes / No)
- Removed irrelevant columns
- Created attrition flag and meaningful buckets (age, salary, experience, tenure)
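The cleaning steps above could be sketched in pandas roughly as follows. This is a minimal illustration on a four-row stand-in for the Kaggle file, not the notebook's actual code: the README does not say which columns were removed or where the bucket edges fall, so the constant-column rule, the bin edges, and the new column names (`AttritionFlag`, `AgeGroup`, `SalaryBucket`, `TenureBucket`) are all assumptions.

```python
import pandas as pd

# Tiny stand-in for the Kaggle CSV (the real file has 1,470 rows).
df = pd.DataFrame({
    "Age": [22, 35, 41, 58],
    "Attrition": ["Yes", "No", "No", "Yes"],
    "MonthlyIncome": [2100, 5200, 9800, 15500],
    "YearsAtCompany": [1, 6, 12, 30],
    "EmployeeCount": [1, 1, 1, 1],
    "StandardHours": [80, 80, 80, 80],
    "Over18": ["Y", "Y", "Y", "Y"],
})

# Assumption: "irrelevant" means columns that hold a single constant value.
constant_cols = [col for col in df.columns if df[col].nunique() == 1]
df = df.drop(columns=constant_cols)

# Numeric attrition flag: 1 = left, 0 = stayed.
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Hypothetical bucket edges; pd.cut uses right-closed intervals by default.
df["AgeGroup"] = pd.cut(
    df["Age"], bins=[17, 25, 35, 45, 60],
    labels=["18-25", "26-35", "36-45", "46-60"],
)
df["SalaryBucket"] = pd.cut(
    df["MonthlyIncome"], bins=[0, 3000, 6000, 10000, float("inf")],
    labels=["Very Low", "Low", "Medium", "High"],
)
df["TenureBucket"] = pd.cut(
    df["YearsAtCompany"], bins=[-1, 2, 5, 10, float("inf")],
    labels=["0-2", "3-5", "6-10", "11+"],
)
```

Keeping the bucket edges in one place makes it easier to reuse the same definitions in the Power BI model.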
- Demographic analysis (Age, Gender, Marital Status, Education)
- Compensation analysis (Monthly Income, Salary Buckets)
- Job & role analysis (Department, Job Role, Job Level)
- Experience & engagement analysis (Experience, Years at Company, Satisfaction, Work-Life Balance)
- Work conditions (Overtime, Business Travel, Distance from Home)
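Each of the analyses listed above comes down to comparing attrition rates across the categories of one column. A minimal sketch of that calculation, using a six-row toy frame and a hypothetical helper named `attrition_rate_by` (the notebook may compute this differently):

```python
import pandas as pd

# Toy data; column names follow the Kaggle dataset.
df = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

def attrition_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the attrition rate (%) for each category of `column`."""
    left = frame["Attrition"] == "Yes"
    # The mean of a boolean Series is the share of True values.
    return (left.groupby(frame[column]).mean() * 100).round(1)

rates = attrition_rate_by(df, "OverTime")
print(rates)
```

The same helper can be called with `"Department"`, `"JobRole"`, or any bucket column to produce the other breakdowns.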
- Bucket creation for continuous variables
- Encoding categorical variables
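One common way to encode categorical variables is to map binary columns to 0/1 and one-hot encode the rest. The sketch below assumes that approach; the README does not say which encoder the notebook uses, so treat the choice of `pd.get_dummies` and `drop_first=True` as illustrative.

```python
import pandas as pd

# Toy data; column names and category values follow the Kaggle dataset.
df = pd.DataFrame({
    "OverTime": ["Yes", "No", "No"],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Non-Travel"],
    "MonthlyIncome": [2100, 5200, 9800],
})

# Binary column: map directly to 0/1.
df["OverTime"] = df["OverTime"].map({"Yes": 1, "No": 0})

# Multi-category column: one-hot encode, dropping the first level
# so the dummy columns are not perfectly collinear.
encoded = pd.get_dummies(df, columns=["BusinessTravel"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

Dropping one level mainly matters for Logistic Regression; tree-based models are not affected by the redundancy.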
- Logistic Regression
- Random Forest Classifier
- Model evaluation using Accuracy, ROC-AUC, Precision & Recall
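The two models and four metrics above can be compared with a short scikit-learn loop. This is a sketch under stated assumptions, not the notebook's code: the data is synthetic (generated with `make_classification` at roughly the same size and class imbalance as the real dataset), and the hyperparameters, the scaling step, and `class_weight="balanced"` are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1,470 rows with a minority positive class.
X, y = make_classification(
    n_samples=1470, n_features=12, n_informative=6,
    weights=[0.84, 0.16], random_state=42,
)
# Stratify so train and test keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Because the positive class is a minority, accuracy alone can look good for a model that rarely predicts attrition, which is why recall and ROC-AUC are reported alongside it.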
- 3-page Power BI dashboard with slicers and KPIs
- Attrition is highest among younger employees and those with little prior work experience
- Employees in the lowest salary buckets show markedly higher attrition
- Employees working overtime have a much higher attrition rate
- Entry-level job roles and lower job levels experience more attrition
- Higher job involvement, satisfaction, and work-life balance are associated with lower attrition
- Employees with frequent business travel are more likely to leave
Exploratory machine learning models were built to understand the feasibility of predicting employee attrition based on historical HR data.
Logistic Regression:
- Accuracy: ~77%
- ROC-AUC: ~0.80
- Showed strong recall on attrition cases, which makes it useful for flagging employees at higher risk of leaving.
Random Forest:
- Accuracy: ~85%
- Highlighted important drivers such as Monthly Income, Age, Experience, Overtime, and Satisfaction levels.
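Driver rankings like the one above are typically read from a tree ensemble's impurity-based feature importances. A minimal sketch, assuming a Random Forest and a synthetic dataset in which the target is driven by overtime alone (the feature names mirror the Kaggle columns, but the data and the `Noise` column are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "MonthlyIncome": rng.integers(1000, 20000, n),
    "Age": rng.integers(18, 60, n),
    "OverTime": rng.integers(0, 2, n),
    "Noise": rng.normal(size=n),
})
# Synthetic target: about 90% of overtime workers leave, nobody else does.
y = ((X["OverTime"] == 1) & (rng.random(n) < 0.9)).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across features.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances tend to favor high-cardinality numeric features, so a ranking like this is best treated as a starting point for further checks rather than a causal statement.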
The Power BI dashboard consists of three pages covering the following KPIs and breakdowns:
- Total Employees
- Employees Left vs Stayed
- Attrition Rate (%)
- Attrition by Age Group
- Attrition by Salary Range
- Attrition by Overtime
- Attrition by Department
- Attrition by Job Role
- Attrition by Job Level
- Attrition by Gender & Marital Status
- Attrition by Experience Bucket
- Attrition by Years at Company
- Attrition by Job Satisfaction
- Attrition by Job Involvement
- Attrition by Work-Life Balance
Employee-Attrition-EDA-ML-PowerBI/
├── Data/ # Raw & cleaned datasets
├── notebooks/ # Jupyter notebook (EDA + ML)
├── power_bi/ # Power BI dashboard (.pbix)
├── Screenshots/ # Dashboard Screenshots
└── README.md
The insights from this project can help HR teams:
- Identify employee segments with higher attrition risk
- Understand the impact of compensation, workload, and engagement factors
- Support data-informed retention and workforce planning decisions
Gaurav Singh
Data Analyst | Python | SQL | Power BI | Machine Learning
LinkedIn: https://www.linkedin.com/in/gaurav-singh-604492340/
Email: gaurav510610@gmail.com


