This project uses the Kaggle competition dataset from the "ML with Python Course Project" to predict campus recruitment outcomes. It includes preprocessing, EDA, feature engineering, and model training using classification algorithms such as Logistic Regression, Decision Tree, and K-Nearest Neighbors.
- Source: Kaggle – Campus Recruitment Dataset by Rashid Minhas (2020)
- Size: 215 records
- Target Variable: `status` (Placed / Not Placed)
Key features include:
- Academic scores: `ssc_p`, `hsc_p`, `degree_p`, `mba_p`
- Categorical attributes: `gender`, `ssc_b`, `hsc_b`, `hsc_s`, `degree_t`, `workex`, `specialisation`
- Test scores: `etest_p`
- Outcome: `status`, with optional `salary` for placed students
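To make the schema concrete, here is a minimal two-row sample with the columns listed above; the values are illustrative stand-ins, not actual records from the dataset.

```python
import pandas as pd

# Illustrative sample matching the dataset's columns; values are made up.
sample = pd.DataFrame({
    "gender": ["M", "F"],
    "ssc_p": [67.0, 79.3], "ssc_b": ["Others", "Central"],
    "hsc_p": [91.0, 78.3], "hsc_b": ["Others", "Others"], "hsc_s": ["Commerce", "Science"],
    "degree_p": [58.0, 77.5], "degree_t": ["Sci&Tech", "Comm&Mgmt"],
    "workex": ["No", "Yes"], "etest_p": [55.0, 86.5],
    "specialisation": ["Mkt&HR", "Mkt&Fin"], "mba_p": [58.8, 66.3],
    "status": ["Placed", "Placed"],
    "salary": [270000.0, 200000.0],  # populated only for placed students
})
print(sample.dtypes)
```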
- Exploratory Data Analysis (EDA):
  - Checked feature distributions, missing values, and data types
  - Analyzed target imbalance and correlation with predictors
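The EDA checks above can be sketched as follows. A small synthetic frame stands in for the real data here (on Kaggle the real CSV would be loaded with `pd.read_csv`); column names match the dataset, values do not.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 215
# Synthetic stand-in: same column names, random values.
df = pd.DataFrame({
    "ssc_p": rng.uniform(40, 90, n),
    "mba_p": rng.uniform(50, 80, n),
    "status": rng.choice(["Placed", "Not Placed"], n, p=[0.69, 0.31]),
})
df["salary"] = rng.uniform(200_000, 400_000, n)
df.loc[df["status"] == "Not Placed", "salary"] = np.nan  # salary only for placed

print(df.dtypes)                                    # data types
print(df.isna().sum())                              # missing values per column
print(df["status"].value_counts(normalize=True))    # target (im)balance
# Correlation of numeric predictors with the binary target
target = (df["status"] == "Placed").astype(int)
print(df[["ssc_p", "mba_p"]].corrwith(target))
```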
- Preprocessing:
  - Encoded categorical variables
  - Split into train/test sets
  - Scaled numeric features post-split to avoid leakage
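A minimal sketch of that pipeline, with the leakage-avoiding ordering made explicit (scaler fit on the training split only). The frame below is a synthetic stand-in with matching column names, and one-hot encoding is used here; label encoding is an equally valid choice.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in for the real dataframe (column names match, values don't).
df = pd.DataFrame({
    "ssc_p": rng.uniform(40, 90, 215),
    "workex": rng.choice(["Yes", "No"], 215),
    "specialisation": rng.choice(["Mkt&HR", "Mkt&Fin"], 215),
    "status": rng.choice(["Placed", "Not Placed"], 215),
})

# 1) Encode categorical variables (one-hot here).
X = pd.get_dummies(df.drop(columns="status"), drop_first=True)
y = (df["status"] == "Placed").astype(int)

# 2) Split first ...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 3) ... then scale, fitting only on the training split to avoid leakage.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse training statistics only
```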
- Model Training: Six classifiers were trained with balanced class weights where needed:
  - Logistic Regression
  - Decision Tree Classifier
  - Random Forest Classifier
  - Support Vector Machine (SVM)
  - K-Nearest Neighbors (KNN)
  - XGBoost Classifier
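A sketch of the training loop, using synthetic data in place of the preprocessed features. The five scikit-learn estimators are shown; `class_weight="balanced"` is passed where the estimator supports it (KNN does not), and `xgboost.XGBClassifier` would slot into the same dict, handling imbalance via `scale_pos_weight` instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed feature matrix (215 rows, mild imbalance).
X, y = make_classification(n_samples=215, n_features=10,
                           weights=[0.31, 0.69], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(class_weight="balanced", random_state=42),
    "Random Forest": RandomForestClassifier(class_weight="balanced", random_state=42),
    "SVM": SVC(class_weight="balanced", random_state=42),
    "KNN": KNeighborsClassifier(),  # no class_weight parameter
    # "XGBoost": xgboost.XGBClassifier(scale_pos_weight=...)  # analogous
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # held-out accuracy
    print(f"{name}: {scores[name]:.3f}")
```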
- Model Evaluation:
  - Evaluated with Accuracy, Precision, Recall, and F1-Score
  - Cross-validation was applied
  - Confusion matrices and classification reports were used for comparison
  - All models performed decently, with ensemble models (Random Forest, XGBoost) outperforming linear ones
  - XGBoost and SVM achieved a strong balance between precision and recall
  - Feature importance analysis showed `mba_p`, `etest_p`, and `degree_p` as influential predictors
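The evaluation steps above can be sketched as follows, again on synthetic stand-in data, with Random Forest as the example model; feature importances here come from the forest's impurity-based `feature_importances_` attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Synthetic stand-in for the preprocessed data.
X, y = make_classification(n_samples=215, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# 5-fold cross-validated F1 as a stability check.
cv_f1 = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("CV F1:", round(cv_f1.mean(), 3))

# Held-out confusion matrix and per-class precision/recall/F1.
pred = clf.predict(X_te)
cm = confusion_matrix(y_te, pred)
print(cm)
print(classification_report(y_te, pred))

# Features ranked by impurity-based importance (synthetic indices here).
order = np.argsort(clf.feature_importances_)[::-1]
print("Most influential feature indices:", order[:3])
```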
- Clone the repository:

  ```bash
  git clone https://github.com/chawthinn/campus-placement-prediction.git
  cd campus-placement-prediction
  ```
- Open the notebook in Jupyter or run it on Kaggle.
- The dataset is already embedded in the Kaggle competition environment: `/kaggle/input/ml-with-python-course-project/`
- Python 3.10+
- Pandas
- Scikit-learn
- XGBoost
- Seaborn
- Matplotlib
- Jupyter Notebook