
sharanyaprasad/SCT_DS_03


SCT_DS_03

🏦 Bank Marketing Decision Tree Classifier

Python Scikit-learn Pandas

📋 Task Description

Build a decision tree classifier to predict whether a customer will purchase a product or service based on their demographic and behavioral data.

πŸ” How was it done?

The Bank Marketing dataset from the UCI Machine Learning Repository, which contains 41,188 customer records, was used. The decision tree classifier was built in the following steps:

  1. Data Loading & Exploration - Loaded the dataset and analyzed basic statistics
  2. Data Preprocessing - Encoded categorical variables (job, marital status, education, etc.) using Label Encoding
  3. Train-Test Split - Split data into 80% training and 20% testing sets
  4. Model Building - Created a Decision Tree Classifier with max_depth=5 to limit overfitting
  5. Evaluation - Assessed model performance using accuracy, confusion matrix, and classification report
  6. Feature Analysis - Identified the most important features influencing customer decisions
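The steps above can be sketched roughly as follows. This is a minimal sketch, not the repository's notebook: it builds a small hypothetical DataFrame using a few of the dataset's real column names (`age`, `job`, `marital`, `duration`, `y`) so it runs without the CSV. The file name in the comment, `stratify=y` and `random_state=42` are assumptions, not details stated in this README.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Step 1 (hypothetical toy data): a few rows using real column names.
# With the actual UCI download you would load it instead, e.g. (assumed file name)
#   df = pd.read_csv("bank-additional-full.csv", sep=";")
rows = [
    (25, "admin.", "single", 120, "no"),
    (38, "services", "married", 300, "no"),
    (52, "retired", "married", 650, "yes"),
    (41, "technician", "divorced", 90, "no"),
    (30, "admin.", "single", 410, "yes"),
    (60, "retired", "married", 720, "yes"),
    (47, "services", "married", 150, "no"),
    (33, "technician", "single", 80, "no"),
    (29, "student", "single", 500, "yes"),
    (55, "retired", "divorced", 260, "no"),
]
df = pd.DataFrame(rows * 10, columns=["age", "job", "marital", "duration", "y"])

# Step 2: label-encode each categorical column (target too: "no" -> 0, "yes" -> 1)
for col in ["job", "marital", "y"]:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns="y")
y = df["y"]

# Step 3: 80/20 split; stratify=y (an assumption) keeps the class ratio equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 4: depth-limited tree
clf = DecisionTreeClassifier(max_depth=5, random_state=42)
clf.fit(X_train, y_train)

# Step 5: evaluate on the held-out rows
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("test accuracy:", acc)
print(confusion_matrix(y_test, y_pred))
```

scikit-learn's documentation describes `LabelEncoder` as intended for target labels and suggests `OrdinalEncoder` for input features; `LabelEncoder` is used here only to mirror step 2 as written.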

Python libraries including pandas, scikit-learn, matplotlib, and seaborn were used for data processing, modeling, and visualization.

📊 Dataset Source

UCI Machine Learning Repository - Bank Marketing Dataset

💡 Key Findings

  • Test Accuracy: ~90% on unseen data, a modest gain over the 88.7% that always predicting "no" would achieve
  • Class Imbalance: Only 11.3% of customers subscribed to term deposits
  • Top Important Features:
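    • Duration of last contact (the UCI dataset description notes this is only known after the call ends, so it suits benchmarking more than real-time prediction)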
    • Economic indicators (euribor3m, emp.var.rate)
    • Number of previous contacts
    • Customer age
  • Decision Tree Depth: 5 levels effectively capture patterns without overfitting
  • Model Generalization: Similar training and testing accuracy indicates good generalization
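Given the class imbalance, the majority-class baseline is the natural yardstick for the accuracy figure. The arithmetic below uses only the class counts from the Results Summary table; it is a back-of-the-envelope check, not output from the notebook.

```python
# Class counts taken from the Results Summary table
n_yes = 4_640
n_no = 36_548
n_total = n_yes + n_no

# Share of customers who subscribed (the positive class)
positive_rate = n_yes / n_total

# A trivial "model" that always predicts "no" is right on every negative record
majority_baseline = n_no / n_total

print(f"total records: {n_total}")
print(f"positive rate: {positive_rate:.1%}")
print(f"always-'no' accuracy: {majority_baseline:.1%}")
```

It prints a total of 41188 records, a positive rate of 11.3% and an always-"no" accuracy of 88.7%.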

🎨 Visualizations

Confusion Matrix

Shows the model's prediction accuracy across both classes (subscribed vs not subscribed)
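A confusion matrix can be reduced to per-class precision and recall, which matter here because the "yes" class is rare. This is a minimal pure-Python sketch on hypothetical toy labels (not this project's predictions); `confusion_counts` and `precision_recall` are illustrative helpers, and `sklearn.metrics.confusion_matrix` and `classification_report` report the same quantities.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="yes"):
    """Return (tn, fp, fn, tp), the same order as
    sklearn.metrics.confusion_matrix(...).ravel() with labels [negative, positive]."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        # Key is (is actually positive, is predicted positive)
        counts[(t == positive, p == positive)] += 1
    return (counts[(False, False)], counts[(False, True)],
            counts[(True, False)], counts[(True, True)])


def precision_recall(tn, fp, fn, tp):
    """Precision and recall for the positive class (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels, not results from this project
y_true = ["no", "no", "no", "no", "no", "no", "yes", "yes", "no", "yes"]
y_pred = ["no", "no", "no", "no", "no", "yes", "yes", "no", "no", "no"]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tn, fp, fn, tp)
accuracy = (tn + tp) / len(y_true)
print(tn, fp, fn, tp, accuracy, precision, recall)
```

On these toy labels accuracy is 0.7 while recall on "yes" is only 1/3, which is the kind of gap an imbalanced dataset can hide.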

Feature Importance

Identifies which customer attributes most influence subscription decisions
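Feature importances can be read directly from a fitted tree. This is a minimal sketch on hypothetical synthetic data whose columns merely borrow dataset names; the label is a toy rule that depends only on `duration`, so that feature should rank first.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500

# Hypothetical synthetic features named after dataset columns
X = pd.DataFrame({
    "duration": rng.integers(0, 1000, n),
    "euribor3m": rng.uniform(0.6, 5.0, n),
    "age": rng.integers(18, 90, n),
})
# Toy rule: the label depends only on duration
y = (X["duration"] > 600).astype(int)

clf = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X, y)

# Impurity-based importances sum to 1; sort them to rank the features
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(importances)
```

Impurity-based importances can favour features with many distinct values; `sklearn.inspection.permutation_importance` on the test set is a common cross-check.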

Decision Tree Structure

Visual representation of the decision-making process at each node
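The tree's structure can also be inspected as text. This sketch uses hypothetical synthetic data from `make_classification` with placeholder feature names, and `max_depth=3` rather than the project's 5 only to keep the printout short; `sklearn.tree.plot_tree` draws the same structure with matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical synthetic data; the feature names are placeholders
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
feature_names = ["duration", "euribor3m", "age", "pdays"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# One line per split or leaf, indented by depth
rules = export_text(clf, feature_names=feature_names)
print(rules)
```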

🔧 Tech Stack

  • Python 3.8+
  • Pandas - Data manipulation
  • Scikit-learn - Machine learning model
  • Matplotlib - Data visualization
  • Seaborn - Enhanced visualizations
  • NumPy - Numerical operations
  • Jupyter Notebook - Interactive analysis

📈 Results Summary

| Metric | Value | Description |
|---|---|---|
| Total Records | 41,188 | Complete dataset size |
| Training Set | 32,950 (80%) | Data used for training |
| Testing Set | 8,238 (20%) | Data used for evaluation |
| Test Accuracy | ~90% | Model performance on the test set |
| Subscribed (Yes) | 4,640 (11.3%) | Positive class |
| Not Subscribed (No) | 36,548 (88.7%) | Negative class |
| Tree Depth | 5 levels | Complexity control |
| Important Features | Duration, economic indicators | Top predictors |
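The training and testing set sizes follow from the 80/20 split. This short calculation assumes the split was made with scikit-learn's `train_test_split` and a fractional `test_size=0.2`, which rounds the test share up and gives the remainder to training.

```python
import math

n_samples = 41_188
test_size = 0.2

# Fractional test_size: test count is rounded up, training gets the rest
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 32950 8238
```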
