Use analytics to understand the main causes of employee resignation and predict future attrition.
- Imported and explored the HR dataset (1,470 employee records with 35 variables).
- Verified there were no missing values.
- Removed non-informative columns: `EmployeeCount`, `EmployeeNumber`, `StandardHours`, `Over18`.
- Converted the target variable `Attrition` into binary format (`Yes = 1`, `No = 0`).
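The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper name `clean_hr_data` and the toy rows are hypothetical, and the real data would come from the CSV file rather than an inline frame.

```python
import pandas as pd

# Columns the write-up lists as non-informative.
DROP_COLS = ["EmployeeCount", "EmployeeNumber", "StandardHours", "Over18"]


def clean_hr_data(df: pd.DataFrame) -> pd.DataFrame:
    """Drop non-informative columns and encode Attrition as 0/1.

    Hypothetical helper for illustration only.
    """
    # Fail loudly if the "no missing values" check does not hold.
    missing = int(df.isna().sum().sum())
    if missing:
        raise ValueError(f"dataset contains {missing} missing values")

    out = df.drop(columns=DROP_COLS, errors="ignore")
    out["Attrition"] = out["Attrition"].map({"Yes": 1, "No": 0}).astype(int)
    return out


# Toy rows standing in for the real file (NOT real records); in practice
# `raw` would be loaded with pd.read_csv on the HR dataset.
raw = pd.DataFrame(
    {
        "EmployeeNumber": [1, 2, 3],
        "EmployeeCount": [1, 1, 1],
        "StandardHours": [80, 80, 80],
        "Over18": ["Y", "Y", "Y"],
        "Age": [41, 49, 37],
        "Attrition": ["Yes", "No", "Yes"],
    }
)
clean = clean_hr_data(raw)
print(clean.columns.tolist())
print(clean["Attrition"].tolist())
```

Mapping through an explicit dictionary means any unexpected label (for example a stray `"yes"`) becomes `NaN` and makes the integer cast fail, which surfaces data problems early.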
- Department-wise attrition: Highest number of leavers in Research & Development, lowest in Human Resources.
- Salary bands: Attrition was higher in the low salary band than in the medium and high bands.
- Promotions: Employees with fewer or no promotions in recent years were more likely to leave.
- Tenure: Shorter-tenure employees had higher attrition.
- Overtime: Employees working overtime were more prone to attrition.
- Marital Status: Single employees showed higher attrition than married or divorced employees.
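Group-level comparisons like these can be computed with a small helper. The sketch below assumes `Attrition` is already encoded as 0/1; the function name `attrition_rate_by`, the column name `OverTime`, and the toy rows are illustrative, not taken from the project code.

```python
import pandas as pd


def attrition_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return leaver count, headcount and attrition rate per group."""
    summary = df.groupby(column)["Attrition"].agg(leavers="sum", headcount="count")
    summary["rate"] = summary["leavers"] / summary["headcount"]
    return summary.sort_values("rate", ascending=False)


# Toy data (not the real results); Attrition is already 0/1.
toy = pd.DataFrame(
    {
        "OverTime": ["Yes", "Yes", "No", "No", "No", "Yes"],
        "Attrition": [1, 1, 0, 0, 1, 0],
    }
)
rates = attrition_rate_by(toy, "OverTime")
print(rates)
```

Reporting both the count and the rate is useful because the two can rank groups differently: a large department can have the most leavers while a small one has the higher rate.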
- Created Income Bands → grouped into Low, Medium, High.
- Derived a Tenure feature → `YearsAtCompany / TotalWorkingYears`.
- Added a Promotion Gap → measured promotion delays for modeling.
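One possible way to build these three features is sketched below. Several details are assumptions, since the write-up does not give them: income bands are taken as tertiles of `MonthlyIncome`, the promotion gap is taken as `YearsSinceLastPromotion / YearsAtCompany`, and ratios with a zero denominator are set to 0. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def safe_ratio(num: pd.Series, den: pd.Series) -> pd.Series:
    """Divide two columns, returning 0 where the denominator is 0."""
    return (num / den.replace(0, np.nan)).fillna(0.0)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # ASSUMPTION: bands are tertiles of MonthlyIncome; the original
    # cut points are not stated in the write-up.
    out["IncomeBand"] = pd.qcut(
        out["MonthlyIncome"], q=3, labels=["Low", "Medium", "High"]
    )
    # Tenure ratio as described: YearsAtCompany / TotalWorkingYears.
    out["Tenure"] = safe_ratio(out["YearsAtCompany"], out["TotalWorkingYears"])
    # ASSUMPTION: one plausible definition of the promotion gap.
    out["PromotionGap"] = safe_ratio(
        out["YearsSinceLastPromotion"], out["YearsAtCompany"]
    )
    return out


# Toy rows for illustration only; the second row has zero experience.
toy = pd.DataFrame(
    {
        "MonthlyIncome": [2000, 3000, 5000, 7000, 12000, 19000],
        "YearsAtCompany": [1, 0, 5, 10, 8, 20],
        "TotalWorkingYears": [2, 0, 10, 10, 16, 25],
        "YearsSinceLastPromotion": [0, 0, 4, 5, 2, 10],
    }
)
feats = add_features(toy)
print(feats[["IncomeBand", "Tenure", "PromotionGap"]])
```

The zero-denominator guard matters because a new hire with no prior experience can have `TotalWorkingYears` equal to 0, which would otherwise produce `NaN` or infinite values in the ratio.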
- Split dataset into training and testing (stratified by attrition).
- Preprocessing:
  - Standardized numerical variables.
  - One-hot encoded categorical variables.
- Built two models:
  - Logistic Regression → strong performance, identified key risk factors like Overtime, Low Salary, Job Role type.
  - Decision Tree → interpretable rules, key drivers included Total Working Years, Overtime, Daily Rate, Age.
- Logistic Regression → better predictive accuracy and generalization.
- Decision Tree → slightly less accurate, but easier interpretability for HR decision-making.
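The split, preprocessing, and two-model comparison described above could look roughly like this with scikit-learn. It is a sketch under stated assumptions: the data is synthetic (generated below, not the HR dataset), the column names and hyperparameters such as `max_depth=4` are illustrative, and `build_pipeline` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned HR data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame(
    {
        "Age": rng.integers(18, 60, n),
        "MonthlyIncome": rng.integers(1000, 20000, n),
        "OverTime": rng.choice(["Yes", "No"], n),
        "MaritalStatus": rng.choice(["Single", "Married", "Divorced"], n),
    }
)
# Toy target: overtime and low income raise the chance of leaving.
p_leave = 0.1 + 0.3 * (df["OverTime"] == "Yes") + 0.2 * (df["MonthlyIncome"] < 5000)
df["Attrition"] = (rng.random(n) < p_leave).astype(int)

X, y = df.drop(columns="Attrition"), df["Attrition"]
# Stratify so train and test keep the same share of leavers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

numeric = ["Age", "MonthlyIncome"]
categorical = ["OverTime", "MaritalStatus"]


def build_pipeline(model):
    """Standardize numeric columns, one-hot encode categoricals, then fit model."""
    pre = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ]
    )
    return Pipeline([("pre", pre), ("model", model)])


models = {
    "logistic_regression": build_pipeline(LogisticRegression(max_iter=1000)),
    "decision_tree": build_pipeline(
        DecisionTreeClassifier(max_depth=4, random_state=42)
    ),
}
scores = {}
for name, pipe in models.items():
    pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    print(name, round(scores[name], 3))
```

Keeping the scaler and encoder inside the pipeline means they are fitted on the training split only, so no information from the test split leaks into preprocessing. Because leavers are a minority class, accuracy alone can flatter a model; recall or precision on the leaver class may be worth checking as well.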
Top predictors of attrition:
- Total Working Years
- Overtime
- Monthly Income
- Daily Rate
- Age
- Categorical factors → Single employees, Job Role (Research Scientist)
- Actual dataset → `HR Employee Attrition.csv`
- Transformed dataset → `cleaned_hr_data.csv`
  - Encoded + scaled features
  - Used for machine learning modeling
- Original dataset → `hr_data_cleaned_original.csv`
  - Retained categorical + numerical values
  - Used for Power BI dashboards and business-friendly reporting
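Producing the two exported files from one cleaned frame might look like the sketch below. The function `export_datasets` is hypothetical, and the encoding and scaling choices (`pd.get_dummies` plus z-scoring, with the target left unscaled) are assumptions rather than the project's confirmed method; the demo writes to a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_datasets(df: pd.DataFrame, out_dir: Path) -> tuple[Path, Path]:
    """Write a human-readable copy and an encoded + scaled copy of df."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Readable version: categories kept as text, numbers in original units.
    original_path = out_dir / "hr_data_cleaned_original.csv"
    df.to_csv(original_path, index=False)

    # Modeling version: one-hot encode text columns, z-score numeric ones.
    # ASSUMPTION: the target column is left as 0/1 and not scaled.
    encoded = pd.get_dummies(df, dtype=int)
    numeric = df.select_dtypes("number").columns.drop("Attrition", errors="ignore")
    encoded[numeric] = (df[numeric] - df[numeric].mean()) / df[numeric].std(ddof=0)
    encoded_path = out_dir / "cleaned_hr_data.csv"
    encoded.to_csv(encoded_path, index=False)
    return original_path, encoded_path


# Toy rows for illustration only.
toy = pd.DataFrame(
    {
        "MonthlyIncome": [2000, 6000, 10000],
        "OverTime": ["Yes", "No", "Yes"],
        "Attrition": [1, 0, 0],
    }
)
with tempfile.TemporaryDirectory() as tmp:
    original_path, encoded_path = export_datasets(toy, Path(tmp))
    encoded = pd.read_csv(encoded_path)
    readable = pd.read_csv(original_path)

print(encoded.columns.tolist())
print(readable.columns.tolist())
```

Keeping a separate readable file is convenient for dashboards, since scaled values and one-hot columns are hard to interpret in charts and slicers.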
- Overall Workforce & Attrition: Most employees are still active; the overall attrition rate is about 16%.
- Department-wise: R&D and Sales show highest attrition counts; HR shows proportionally high attrition despite smaller size.
- Income & Attrition: Lower-income employees leave most often; high attrition also seen among high-income Sales Executives.
- Job Role Trends: Laboratory Technicians, Research Scientists, and Sales Representatives show higher attrition at lower incomes; Sales Executives show attrition across income levels.
- Career Stage: Younger employees are the most prone to leave (especially in Sales & HR); attrition decreases steadily with seniority.
- Other Key Drivers: Overtime and lack of promotions strongly linked with resignations.
- EDA insights on attrition by department, salary, promotions, overtime, demographics.
- Classification models (Logistic Regression & Decision Tree) with comparison.
- Feature importance analysis for HR policy recommendations.
- Power BI dashboard with attrition insights.
- Exported datasets for ML and visualization.