# 🌲 Forest Cover Type Classification

## 📌 Project Overview

This project focuses on predicting the forest cover type from cartographic and environmental variables using machine learning.

It trains and compares three classification models (Decision Tree, Random Forest, and XGBoost) and further improves performance with hyperparameter tuning.

The project demonstrates the full machine learning pipeline: data cleaning, preprocessing, training, evaluation, visualization, and tuning.


## 📂 Dataset

  • Source: UCI Machine Learning Repository — Covertype Dataset
  • Format: .data (converted to .csv for processing)
  • Number of Instances: 581,012
  • Number of Features: 54 cartographic and environmental variables (10 continuous, such as elevation, slope, and distance to hydrology, plus 44 binary wilderness-area and soil-type indicators)
  • Target Variable: Cover_Type (multi-class, 7 forest cover categories)
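
For reference, a minimal loading sketch, assuming the raw file is saved as `covtype.data` in the working directory (the filename is an assumption; column names and order follow the UCI description):

```python
import pandas as pd

# Column names per the UCI covtype description: 10 continuous features,
# 4 binary wilderness-area flags, 40 binary soil-type flags, and the target.
columns = (
    ["Elevation", "Aspect", "Slope",
     "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
     "Horizontal_Distance_To_Roadways",
     "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
     "Horizontal_Distance_To_Fire_Points"]
    + [f"Wilderness_Area{i}" for i in range(1, 5)]
    + [f"Soil_Type{i}" for i in range(1, 41)]
    + ["Cover_Type"]
)

# The raw .data file is comma-separated with no header row.
df = pd.read_csv("covtype.data", header=None, names=columns)
df.to_csv("covtype.csv", index=False)  # the .csv used in later steps
```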

## 🎯 Classes

The target variable Cover_Type has 7 categories representing different types of forest cover:

  1. Spruce/Fir
  2. Lodgepole Pine
  3. Ponderosa Pine
  4. Cottonwood/Willow
  5. Aspen
  6. Douglas-fir
  7. Krummholz

## 🧠 Models Used

### 1. Decision Tree

  • A tree-like model where decisions are made by splitting features into branches.
  • Easy to interpret, but can overfit if the tree is too deep.
  • Good baseline model for classification.
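
A minimal scikit-learn instantiation; the `max_depth` value is an illustrative guard against the overfitting noted above, not necessarily the repo's setting:

```python
from sklearn.tree import DecisionTreeClassifier

# Capping tree depth limits how finely the tree can partition the data.
dt = DecisionTreeClassifier(max_depth=20, random_state=42)
```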

### 2. Random Forest

  • An ensemble of multiple decision trees (a "forest").
  • Each tree is trained on a random subset of data and features.
  • More accurate and robust than a single Decision Tree because it reduces overfitting.
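
A sketch of the ensemble setup (settings illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

# Each of the n_estimators trees sees a bootstrap sample of rows and a
# random subset of features at every split; predictions are majority-voted.
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
```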

### 3. XGBoost (Extreme Gradient Boosting)

  • A boosting algorithm that builds trees sequentially.
  • Each new tree corrects the errors of the previous ones.
  • Very powerful for structured/tabular data and often achieves state-of-the-art results.
  • Requires careful hyperparameter tuning for best performance.
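
A sketch using the scikit-learn wrapper; these settings are illustrative starting points, not the tuned values:

```python
from xgboost import XGBClassifier

# XGBClassifier expects labels starting at 0, so Cover_Type (1-7) must be
# shifted to 0-6 before fitting (handled in the train-test split below).
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=8,
                    n_jobs=-1, random_state=42)
```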

## 🛠️ Steps Followed

  1. Data Cleaning & Preprocessing

    • Added column names from the dataset description.
    • Checked for missing values (none were found).
    • Converted the categorical features (wilderness areas, soil types) into a model-usable numeric format.
    • (Optional) Outlier detection using Z-score.
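
A sketch of the optional Z-score check, assuming `df` from the loading snippet in the Dataset section; the threshold of 3 is a common convention, not a value confirmed by the repo:

```python
import numpy as np
from scipy.stats import zscore

# Flag rows where any of the 10 continuous features lies more than
# 3 standard deviations from its column mean.
numeric_cols = df.columns[:10]
z_scores = np.abs(zscore(df[numeric_cols]))
outlier_mask = (z_scores > 3).any(axis=1)
print(f"{outlier_mask.sum()} candidate outlier rows")
```
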
  2. Train-Test Split

    • 80% training, 20% testing.
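
For example (stratification is an assumption; the notebook may use a plain random split):

```python
from sklearn.model_selection import train_test_split

# Stratifying keeps the 7-class proportions identical in both splits.
X = df.drop(columns="Cover_Type")
y = df["Cover_Type"] - 1  # shift labels 1-7 to 0-6 for XGBoost
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```
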
  3. Model Training

    • Trained Decision Tree, Random Forest, and XGBoost classifiers.
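
A sketch that fits the three estimators from the Models Used section, assuming `dt`, `rf`, and `xgb` from those snippets and the split above:

```python
models = {"Decision Tree": dt, "Random Forest": rf, "XGBoost": xgb}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} trained.")
```
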
  4. Model Evaluation

    • Accuracy Score
    • Precision, Recall, F1-Score
    • Confusion Matrix (heatmap visualization)
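
A sketch of these metrics, using the fitted Random Forest as an example:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

y_pred = rf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # per-class precision/recall/F1

# Heatmap of the confusion matrix: rows = true classes, columns = predictions.
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```
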
  5. Feature Importance

    • Visualized the most important features for tree-based models.
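
For example, with the impurity-based importances of the fitted Random Forest:

```python
import pandas as pd
import matplotlib.pyplot as plt

# feature_importances_ has one score per column of X_train.
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
importances.nlargest(10).sort_values().plot(kind="barh")
plt.title("Top 10 Feature Importances")
plt.tight_layout()
plt.show()
```
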
  6. Hyperparameter Tuning

    • Used GridSearchCV and RandomizedSearchCV to optimize hyperparameters.
    • Reduced overfitting and improved accuracy.
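
An illustrative RandomizedSearchCV sketch for the Random Forest; the actual search spaces used in the notebook may differ:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical search space sampled n_iter times with 3-fold CV.
param_dist = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42), param_dist,
    n_iter=20, cv=3, scoring="accuracy", n_jobs=-1, random_state=42)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```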

## 📊 Results (Accuracy Scores)

| Model | Accuracy Score |
| --- | --- |
| Decision Tree (Default) | 0.9059 |
| Random Forest (Default) | 0.9308 |
| XGBoost (Default) | 0.8711 |
| Decision Tree (Tuned) | 0.9124 |
| Random Forest (Tuned) | 0.9524 |
| XGBoost (Tuned) | 0.9581 |

Best Model: XGBoost (Tuned) with 95.8% accuracy.


## 📈 Visualizations

  • Confusion Matrix Heatmap → to check per-class predictions.
  • Feature Importance Bar Chart → to identify top predictive features (e.g., Elevation, Horizontal Distance to Roadways).

## 🚀 How to Run

  • Clone this repository:

```bash
git clone https://github.com/Adeeba-Shahzadi/ForestCoverClassification-MultiClassificationModel.git
```

  • Navigate to the project folder:

```bash
cd ForestCoverClassification-MultiClassificationModel
```

  • Install dependencies:

```bash
pip install -r requirements.txt
```

  • Run the notebook:

```bash
jupyter notebook ForestCoverTypeClassification.ipynb
```

OR run the script:

```bash
python forestcovertypeclassification.py
```
