A classical (non-deep-learning) machine learning pipeline for image classification built on the Intel Image Classification dataset. The pipeline covers the full lifecycle — raw data → preprocessing → HOG feature extraction → classifier training → evaluation → deployment & simulation.
This project explores how far classical machine learning can go on a real-world image classification task without any deep learning. It implements three classifiers (SVM and Logistic Regression via scikit-learn; KNN from scratch), all fed with HOG (Histogram of Oriented Gradients) feature descriptors extracted from preprocessed images.
Intel Image Classification — available on Kaggle
| Split | Folder | Images | Labels |
|---|---|---|---|
| Train | data/raw/seg_train/ | ~14,000 | ✅ 6 classes |
| Test | data/raw/seg_test/ | ~3,000 | ✅ 6 classes |
| Predict | data/raw/seg_pred/ | ~7,300 | ❌ unlabelled |
Classes: buildings · forest · glacier · mountain · sea · street
├── config.yaml # Central configuration (image size, HOG params, classifiers)
├── requirements.txt
│
├── data/
│ └── raw/
│ ├── seg_train/ # Labelled training images (class subfolders)
│ ├── seg_test/ # Labelled test images (class subfolders)
│ └── seg_pred/ # Unlabelled images for inference
│
├── src/
│ ├── preprocessing/
│ │ ├── image_preprocessor.py # Resize → grayscale → normalise
│ │ └── data_loader.py # Scan splits, train/val split
│ ├── features/
│ │ └── hog_extractor.py # HOG feature extraction (skimage)
│ ├── classifiers/
│ │ ├── svm_classifier.py # SVM (linear / RBF) via sklearn
│ │ ├── logistic_regression.py # Logistic Regression via sklearn
│ │ └── knn_classifier.py # KNN from scratch (euclidean / manhattan)
│ └── evaluation/
│ └── evaluator.py # Accuracy, classification report, confusion matrix
│
├── model/
│ ├── model.ipynb # ⭐ Preprocess + train final SVM-RBF → saves svm_rbf.pkl
│ └── svm_rbf.pkl # ⚠️ gitignored — generate locally by running model.ipynb
│
├── notebooks/
│ └── modelComparing.ipynb # Compare all classifiers on held-out test set
│
└── experiments/
├── simulate.ipynb # ⭐ Interactive single-image prediction simulator
├── predict.py # Batch inference on all seg_pred/ images → predictions.csv
└── predictions.csv # ⚠️ gitignored — generate locally by running predict.py
Raw Images (seg_train + seg_test)
│
▼
ImagePreprocessor
• Resize to 128×128
• Convert to grayscale
• Normalise pixels to [0, 1]
│
▼
HOGExtractor
• Orientations : 9
• Pixels/cell : 8×8
• Cells/block : 2×2
• Output dim : 3,969 features per image
│
▼
Classifier (SVM RBF — best performer)
│
▼
Predictions / Evaluation
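Under the hood, the first two stages of the diagram amount to roughly the following (a minimal sketch using scikit-image; the actual ImagePreprocessor and HOGExtractor classes in src/ wrap equivalent logic and read their parameters from config.yaml):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize

def preprocess(img, size=(128, 128)):
    """Resize -> grayscale -> normalise to [0, 1]."""
    if img.ndim == 3:
        img = rgb2gray(img)            # float grayscale, already in [0, 1]
    img = resize(img, size, anti_aliasing=True)
    return np.clip(img, 0.0, 1.0).astype(np.float32)

def extract_hog(gray):
    """HOG descriptor with the parameters listed in the diagram."""
    return hog(
        gray,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        feature_vector=True,           # flatten to a single 1-D vector
    )
```

The exact length of the resulting feature vector follows from the image size and the HOG cell/block settings, so changing any of them in config.yaml changes the classifier's input dimension.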
| Classifier | Variant | Accuracy | Notes |
|---|---|---|---|
| SVM | kernel='rbf', C=1.2 | 76.23% | ⭐ Best overall — selected as final model |
| SVM | kernel='linear', C=1.2 | 64.83% | Good baseline |
| Logistic Regression | C=0.01 | 70.97% | Fast, interpretable |
| KNN + PCA | k=21, euclidean, 22 components | 70.13% | Dimensionality-reduced variant |
| KNN | k=5, manhattan | 49.40% | Custom from-scratch implementation |
Full comparison is in notebooks/modelComparing.ipynb.
The SVM with RBF kernel (C=1.2) achieved the best performance and was selected as the final deployed model.
1. Clone the repository

```bash
git clone https://github.com/ghosteater1311/Classical-Feature-Based-Image-Classification-Pipeline.git
cd Classical-Feature-Based-Image-Classification-Pipeline
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Download the dataset

Download from Kaggle and place the folders so the structure matches data/raw/seg_train/, data/raw/seg_test/, and data/raw/seg_pred/.
Open and run all cells in model/model.ipynb.
This will:
- Scan all images from seg_train/ + seg_test/
- Preprocess and extract HOG features
- Train SVM(kernel='rbf', C=1.2) on the full labelled dataset
- Save the model to model/svm_rbf.pkl
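The training step boils down to roughly the following (a hedged sketch: X here is random stand-in data, whereas the notebook fits on the real HOG matrix; probability=True is an assumption that enables the confidence scores used later by the simulator):

```python
import joblib
import numpy as np
from sklearn.svm import SVC

# Stand-in data: in model.ipynb, X is the (n_samples, n_features) HOG matrix
# built from seg_train/ + seg_test/, and y holds integer class ids 0..5.
rng = np.random.default_rng(0)
X, y = rng.random((60, 100)), rng.integers(0, 6, 60)

model = SVC(kernel='rbf', C=1.2, probability=True)
model.fit(X, y)

joblib.dump(model, 'svm_rbf.pkl')   # saved as model/svm_rbf.pkl in the repo layout
```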
```bash
python experiments/predict.py
```

Outputs experiments/predictions.csv with columns: filename, predicted_label, predicted_class_id.
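The batch-inference script presumably does something along these lines (a sketch; the real predict.py may take its paths and column names from config.yaml, and the `*.jpg` glob is an assumption about the seg_pred/ file format):

```python
import csv
from pathlib import Path

import joblib
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize

CLASSES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']

def extract_features(path):
    """Resize -> grayscale -> normalise -> HOG, mirroring the pipeline above."""
    img = imread(path)
    if img.ndim == 3:
        img = rgb2gray(img)
    img = resize(img, (128, 128), anti_aliasing=True)
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def predict_folder(model_path='model/svm_rbf.pkl',
                   pred_dir='data/raw/seg_pred',
                   out_csv='experiments/predictions.csv'):
    """Run the saved model over every image in pred_dir and write a CSV."""
    model = joblib.load(model_path)
    with open(out_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['filename', 'predicted_label', 'predicted_class_id'])
        for path in sorted(Path(pred_dir).glob('*.jpg')):
            class_id = int(model.predict([extract_features(path)])[0])
            writer.writerow([path.name, CLASSES[class_id], class_id])
```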
Open experiments/simulate.ipynb, set IMAGE_NAME to any filename from seg_pred/, and run the cells to see:
- The original and preprocessed image side by side
- The predicted class with confidence percentage
- A probability bar chart across all 6 classes
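In essence, the simulator reduces to the following once the model is loaded (a sketch; it assumes the pickled SVC was trained with probability=True so that predict_proba is available, and it omits the image display and matplotlib bar chart):

```python
import numpy as np

CLASSES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']

def simulate(model, features):
    """Return (predicted_label, confidence, per-class probabilities)."""
    proba = model.predict_proba([features])[0]          # one probability per class
    class_id = int(model.classes_[np.argmax(proba)])    # map back to a class id 0..5
    return CLASSES[class_id], float(proba.max()), proba
```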
This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
You are free to share and adapt this work, even commercially, as long as you give appropriate credit and distribute any derivative works under the same license.