Classical Feature-Based Image Classification Pipeline

A classical (non-deep-learning) machine learning pipeline for image classification built on the Intel Image Classification dataset. The pipeline covers the full lifecycle — raw data → preprocessing → HOG feature extraction → classifier training → evaluation → deployment & simulation.



Overview

This project explores how far classical machine learning can go on a real-world image classification task without any deep learning. It implements three classifier families (SVM and Logistic Regression via scikit-learn, plus a from-scratch KNN), all fed with HOG (Histogram of Oriented Gradients) feature descriptors extracted from preprocessed images.


Demo

*(demo screenshot: interactive single-image prediction in simulate.ipynb)*

Dataset

Intel Image Classification — available on Kaggle

| Split   | Folder                | Images  | Labels        |
|---------|-----------------------|---------|---------------|
| Train   | `data/raw/seg_train/` | ~14,000 | ✅ 6 classes  |
| Test    | `data/raw/seg_test/`  | ~3,000  | ✅ 6 classes  |
| Predict | `data/raw/seg_pred/`  | ~7,300  | ❌ unlabelled |

Classes: buildings · forest · glacier · mountain · sea · street
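The labelled splits follow the standard Kaggle layout of one subfolder per class. As a rough sketch (the function name `scan_split` is hypothetical; the project's actual loader lives in `src/preprocessing/data_loader.py` and may differ), scanning a split might look like:

```python
from pathlib import Path

# The six Intel Image Classification classes, in a fixed order
CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]

def scan_split(root):
    """Collect (image_path, class_id) pairs from a labelled split folder.

    Expects one subfolder per class, e.g. data/raw/seg_train/forest/*.jpg.
    """
    pairs = []
    for class_id, name in enumerate(CLASSES):
        for path in sorted((Path(root) / name).glob("*.jpg")):
            pairs.append((path, class_id))
    return pairs
```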


Project Structure

├── config.yaml                  # Central configuration (image size, HOG params, classifiers)
├── requirements.txt
│
├── data/
│   └── raw/
│       ├── seg_train/           # Labelled training images (class subfolders)
│       ├── seg_test/            # Labelled test images (class subfolders)
│       └── seg_pred/            # Unlabelled images for inference
│
├── src/
│   ├── preprocessing/
│   │   ├── image_preprocessor.py   # Resize → grayscale → normalise
│   │   └── data_loader.py          # Scan splits, train/val split
│   ├── features/
│   │   └── hog_extractor.py        # HOG feature extraction (skimage)
│   ├── classifiers/
│   │   ├── svm_classifier.py       # SVM (linear / RBF) via sklearn
│   │   ├── logistic_regression.py  # Logistic Regression via sklearn
│   │   └── knn_classifier.py       # KNN from scratch (euclidean / manhattan)
│   └── evaluation/
│       └── evaluator.py            # Accuracy, classification report, confusion matrix
│
├── model/
│   ├── model.ipynb              # ⭐ Preprocess + train final SVM-RBF → saves svm_rbf.pkl
│   └── svm_rbf.pkl              # ⚠️ gitignored — generate locally by running model.ipynb
│
├── notebooks/
│   └── modelComparing.ipynb    # Compare all classifiers on held-out test set
│
└── experiments/
    ├── simulate.ipynb           # ⭐ Interactive single-image prediction simulator
    ├── predict.py               # Batch inference on all seg_pred/ images → predictions.csv
    └── predictions.csv          # ⚠️ gitignored — generate locally by running predict.py

Pipeline

Raw Images (seg_train + seg_test)
        │
        ▼
ImagePreprocessor
  • Resize to 128×128
  • Convert to grayscale
  • Normalise pixels to [0, 1]
        │
        ▼
HOGExtractor
  • Orientations : 9
  • Pixels/cell  : 8×8
  • Cells/block  : 2×2
  • Output dim   : 3,969 features per image
        │
        ▼
Classifier (SVM RBF — best performer)
        │
        ▼
Predictions / Evaluation
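The grayscale and normalisation steps of `ImagePreprocessor` can be sketched in plain numpy (the resize step is omitted here, since the project presumably uses an image library for it, and the exact luminance weights are an assumption):

```python
import numpy as np

def preprocess(img):
    """Grayscale + normalise an RGB uint8 image to float values in [0, 1].

    Resizing to 128x128 is omitted in this sketch; only the
    grayscale conversion and pixel normalisation are shown.
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 3:  # RGB -> luminance (ITU-R BT.601 weights, an assumption)
        img = img @ np.array([0.299, 0.587, 0.114])
    return img / 255.0  # scale pixel values to [0, 1]
```

The resulting 128×128 array would then be passed to `skimage.feature.hog` with the parameters listed above.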

Classifiers

| Classifier          | Variant                                 | Accuracy | Notes                                     |
|---------------------|-----------------------------------------|----------|-------------------------------------------|
| SVM                 | `kernel='rbf'`, `C=1.2`                 | 76.23%   | ⭐ Best overall — selected as final model |
| SVM                 | `kernel='linear'`, `C=1.2`              | 64.83%   | Good baseline                             |
| Logistic Regression | `C=0.01`                                | 70.97%   | Fast, interpretable                       |
| KNN + PCA           | `k=21`, euclidean, 22 components        | 70.13%   | Dimensionality-reduced variant            |
| KNN                 | `k=5`, manhattan                        | 49.40%   | Custom from-scratch implementation        |

Full comparison is in notebooks/modelComparing.ipynb.
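The idea behind the from-scratch KNN (`src/classifiers/knn_classifier.py`) can be illustrated in a few lines of numpy; this is a minimal sketch, not the project's actual implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5, metric="manhattan"):
    """Minimal from-scratch KNN: majority vote over the k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        diff = X_train - q
        if metric == "euclidean":
            dists = np.sqrt((diff ** 2).sum(axis=1))  # L2 distance
        else:
            dists = np.abs(diff).sum(axis=1)          # manhattan (L1) distance
        nearest = y_train[np.argsort(dists)[:k]]      # labels of k closest points
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])       # majority vote
    return np.array(preds)
```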


Results

The SVM with RBF kernel (C=1.2) achieved the best performance and was selected as the final deployed model.
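Training the winning configuration with scikit-learn is straightforward. The sketch below uses a tiny synthetic feature matrix in place of the real ~4k-dimensional HOG features; `probability=True` is an assumption, included because the simulator displays class probabilities:

```python
import numpy as np
from sklearn.svm import SVC

# Hyperparameters reported as the best in the comparison table
clf = SVC(kernel="rbf", C=1.2, probability=True)

# Tiny synthetic stand-in for the HOG feature matrix (two separable clusters)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
y = np.array([0] * 20 + [1] * 20)
clf.fit(X, y)
```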


Setup

1. Clone the repository

git clone https://github.com/ghosteater1311/Classical-Feature-Based-Image-Classification-Pipeline.git
cd Classical-Feature-Based-Image-Classification-Pipeline

2. Install dependencies

pip install -r requirements.txt

3. Download the dataset

Download from Kaggle and place the folders so the structure matches data/raw/seg_train/, data/raw/seg_test/, data/raw/seg_pred/.


Usage

Step 1 — Train the final model

Open and run all cells in model/model.ipynb.

This will:

  • Scan all images from seg_train/ + seg_test/
  • Preprocess and extract HOG features
  • Train SVM(kernel='rbf', C=1.2) on the full labelled dataset
  • Save the model to model/svm_rbf.pkl
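The save/load step amounts to serialising the fitted classifier to disk. Whether the notebook uses `pickle` or `joblib` is an assumption; a stdlib sketch:

```python
import pickle
from pathlib import Path

def save_model(model, path="model/svm_rbf.pkl"):
    """Serialise a trained classifier to a .pkl file (pickle is an assumption)."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path="model/svm_rbf.pkl"):
    """Load the serialised classifier back for inference."""
    with open(path, "rb") as f:
        return pickle.load(f)
```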

Step 2 — Batch predict on seg_pred/

python experiments/predict.py

Outputs experiments/predictions.csv with columns: filename, predicted_label, predicted_class_id.
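Writing the output file in that column layout takes only the stdlib `csv` module; a sketch of the final step of `predict.py` (the helper name `write_predictions` is hypothetical):

```python
import csv

CLASSES = ["buildings", "forest", "glacier", "mountain", "sea", "street"]

def write_predictions(rows, out_path="experiments/predictions.csv"):
    """Write (filename, class_id) pairs using predict.py's column layout."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "predicted_label", "predicted_class_id"])
        for filename, class_id in rows:
            writer.writerow([filename, CLASSES[class_id], class_id])
```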

Step 3 — Interactive simulation

Open experiments/simulate.ipynb, set IMAGE_NAME to any filename from seg_pred/, and run the cells to see:

  • The original and preprocessed image side by side
  • The predicted class with confidence percentage
  • A probability bar chart across all 6 classes

License

This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

You are free to share and adapt this work, even commercially, as long as you give appropriate credit and distribute any derivative works under the same license.
