This project applies a Decision Tree Classifier to a dataset loaded from a CSV file (`Social_Network_Ads.csv`) to predict whether a user will purchase a product based on social network ad data. The dataset contains user attributes such as age and estimated salary, and the binary target variable indicates whether the product was purchased (1) or not (0).
The goal is to:
- Understand how Decision Trees partition feature space.
- Train and evaluate a model on real-world user behavior data.
- Visualize decision boundaries to explain classification.
The notebook follows a well-structured classification pipeline:

1. **Data Loading:** read the dataset from `Social_Network_Ads.csv`.
2. **Preprocessing:** extract features and labels; apply `StandardScaler` for feature normalization.
3. **Train-Test Split:** 75% training / 25% test split using `train_test_split`.
4. **Model Training:** fit a `DecisionTreeClassifier` with the `'entropy'` criterion.
5. **Prediction:** test prediction for a sample input (`[30, 87000]`).
6. **(Optional):** visualizations and interpretability analysis.
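The pipeline above can be sketched end to end as follows. Since the real CSV is not included here, the snippet uses synthetic `[Age, EstimatedSalary]` data with a made-up purchase rule as a stand-in; the split ratio, scaler, criterion, and sample input match the notebook's description.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for Social_Network_Ads.csv: columns [Age, EstimatedSalary]
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(18, 60, 400),
                     rng.integers(15000, 150000, 400)]).astype(float)
y = (X[:, 0] + X[:, 1] / 2000 > 60).astype(int)  # toy "purchased" rule

# 75% training / 25% test split, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the scaler on training data only, then transform both sets
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Entropy-based decision tree, matching the notebook's criterion
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X_train, y_train)

# Predict for a single sample; it must be scaled with the same fitted scaler
sample = sc.transform([[30, 87000]])
pred = clf.predict(sample)
print(pred[0])
```

Note that the sample input `[30, 87000]` is passed through the already-fitted `StandardScaler` before prediction; feeding raw values to a model trained on scaled features would give misleading results.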
- Python 3
- `pandas`, `numpy` – data manipulation
- `scikit-learn` – model training, preprocessing, evaluation
- `matplotlib`, `seaborn` – plotting and visualization

Installation: not explicitly provided, but typically requires:

```bash
pip install numpy pandas matplotlib scikit-learn
```

Execution:

- Ensure `Social_Network_Ads.csv` is present in the working directory.
- Run the notebook `Decision Tree (Classification).ipynb` sequentially.
- A confusion matrix was used to evaluate performance.
- No additional accuracy metrics (e.g., precision, recall, F1) were calculated.
- Visualizations suggest decent separation of classes by the decision tree.
- A Decision Tree model was successfully trained using the entropy criterion.
- A sample test prediction was performed for `[Age=30, Salary=87000]`.
- Feature scaling significantly improved decision boundary sharpness.
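Since the notebook stops at the confusion matrix, the missing metrics could be added with a few lines of scikit-learn. The label arrays below are hypothetical stand-ins for the notebook's `y_test` and `y_pred`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Hypothetical ground-truth and predicted labels standing in for y_test / y_pred
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([0, 0, 1, 0, 0, 1, 1, 1])

cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted class
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))
# classification_report adds precision, recall, and F1 per class
print(classification_report(y_test, y_pred, digits=3))
```

`classification_report` covers precision, recall, and F1 in one call, which would close the gap noted above.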
Plots included:

- Decision boundaries for the training set
- Decision boundaries for the test set
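A decision-boundary plot of the kind listed above is typically produced by evaluating the classifier on a dense grid and shading the predicted regions. This is a minimal sketch on toy 2-D data (not the notebook's exact plotting code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

# Toy 2-D data standing in for the scaled [Age, EstimatedSalary] features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, y)

# Evaluate the classifier on a grid covering the feature space
xx, yy = np.meshgrid(np.linspace(-3, 3, 300), np.linspace(-3, 3, 300))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')  # shaded decision regions
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', s=15, edgecolors='k')
plt.xlabel('Age (scaled)')
plt.ylabel('EstimatedSalary (scaled)')
plt.title('Decision Tree decision boundary (toy data)')
plt.savefig('decision_boundary.png', dpi=100)
```

The axis-aligned, rectangular regions in such a plot are characteristic of decision trees, since every split thresholds a single feature.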
- Learned how decision trees split based on information gain (entropy).
- Emphasized the role of feature scaling in decision-based models.
- Understood basic interpretability of Decision Trees in classification tasks.
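The entropy-based splitting mentioned above can be made concrete with a small worked example: a tree chooses the split that maximizes information gain, i.e., the drop from the parent node's entropy to the weighted average entropy of its children.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 50/50 mix -> entropy 1.0 bit
left   = np.array([0, 0, 0, 1])              # mostly class 0
right  = np.array([0, 1, 1, 1])              # mostly class 1
print(entropy(parent))                        # 1.0
print(information_gain(parent, left, right))  # ~0.189 bits gained
```

A split that produced two pure children (all 0s on one side, all 1s on the other) would achieve the maximum gain of 1.0 bit here, which is why trees prefer splits that separate the classes cleanly.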
- File: `Social_Network_Ads.csv`
- Features:
  - `Age`: numerical
  - `EstimatedSalary`: numerical
- Target: `Purchased`: binary label (0 or 1)
- Goal: predict purchase behavior from demographic features.
1. Clone the repository or download the `.ipynb` file.
2. Install dependencies:

   ```bash
   pip install pandas numpy scikit-learn matplotlib seaborn
   ```

3. Make sure `Social_Network_Ads.csv` is in the same directory.
4. Launch the notebook:

   ```bash
   jupyter notebook "Decision Tree (Classification).ipynb"
   ```
Email: [email protected]
GitHub: https://github.com/imehranasgari
This project is licensed under the Apache 2.0 License – see the LICENSE file for details.
💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.