
🦅 Bird Species Classifier - Deep Learning & Computer Vision

A comprehensive computer vision project implementing multiple approaches for bird species classification, from traditional feature extraction to state-of-the-art deep learning models. Achieves 96.51% test accuracy using transfer learning with ResNet-50.



🎯 Overview

This project explores 5 different approaches for bird species classification, progressing from traditional computer vision techniques to modern deep learning architectures:

  1. Traditional Feature Extraction + ML (HOG, SIFT, Color Histograms + SVM, Random Forest)
  2. Dimensionality Reduction (PCA + Feature Selection)
  3. Transfer Learning (Fine-tuned Pretrained CNNs: ResNet, VGG)
  4. Training from Scratch (Same CNNs with random initialization)
  5. Custom CNN Architectures (Built from ground up)

πŸ† Best Performance

  • Model: ResNet-50 (Pretrained + Fine-tuned)
  • Test Accuracy: 96.51%
  • Validation Accuracy: 95.97%
  • F1-Score: 96.52%

📊 Dataset

Indian Birds Species Classification Dataset

Source: Kaggle - Indian Birds Species Image Classification

Statistics:

  • Total Images: 37,500 high-resolution images
  • Bird Species: 25 different Indian bird species
  • Original Split: 1,200 train + 300 validation images per species
  • Project Split: 22,498 images used
    • Training: 15,000 images
    • Validation: 3,749 images
    • Test: 3,749 images
  • Image Resolution: ~1 MP (typically 1024x768)
  • Format: JPEG

🐦 Bird Species List

The dataset includes 25 species of Indian birds:

  • Asian Green Bee Eater
  • Brown Headed Barbet
  • Cattle Egret
  • Common Kingfisher
  • Common Myna
  • Common Rosefinch
  • Coppersmith Barbet
  • Forest Wagtail
  • Gray Wagtail
  • Hoopoe
  • House Crow
  • Indian Grey Hornbill
  • Indian Peacock
  • Indian Pitta
  • Indian Roller
  • Jungle Babbler
  • Northern Lapwing
  • Red Wattled Lapwing
  • Ruddy Shelduck
  • Rufous Treepie
  • Sarus Crane
  • White Breasted Kingfisher
  • White Breasted Waterhen
  • White Wagtail
  • Yellow Footed Green Pigeon

🚫 Dataset Not Included

Due to size constraints (~3-5 GB), the dataset is not included in this repository.

📥 How to Obtain the Dataset

Option 1: Download from Kaggle (Recommended)

# Install Kaggle CLI
pip install kaggle

# Download dataset
kaggle datasets download -d ichhadhari/indian-birds

# Unzip
unzip indian-birds.zip -d Birds_25/

Option 2: Manual Download

  1. Visit: https://www.kaggle.com/datasets/ichhadhari/indian-birds/data
  2. Click "Download" button
  3. Extract to project directory as Birds_25/

Option 3: For academic/research use, contact canmehmetoguz@gmail.com

πŸ“ Expected Directory Structure

Place the dataset in the following structure:

project_root/
├── Birds_25/
│   ├── train/
│   │   ├── ASIAN GREEN BEE EATER/
│   │   ├── BROWN HEADED BARBET/
│   │   └── ... (23 more species)
│   └── valid/
│       ├── ASIAN GREEN BEE EATER/
│       ├── BROWN HEADED BARBET/
│       └── ... (23 more species)
├── bird_species_classifier_cnn.ipynb
└── README.md

Note: The notebook automatically splits the original validation set in half to create the validation and test sets (10% + 10% of the data).
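The validation/test split described above can be sketched with PyTorch's random_split. This is a minimal sketch using a placeholder dataset; in the notebook the split is applied to an ImageFolder over Birds_25/valid, and the seed value here is an assumption for reproducibility.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset: a tensor of indices stands in for the 7,498 images
# in Birds_25/valid (the notebook uses an ImageFolder instead).
full_valid = TensorDataset(torch.arange(7498))

# Split the original validation set in half: validation + test
n_test = len(full_valid) // 2
n_valid = len(full_valid) - n_test
valid_set, test_set = random_split(
    full_valid, [n_valid, n_test],
    generator=torch.Generator().manual_seed(42)  # fixed seed: reproducible split
)
print(len(valid_set), len(test_set))  # 3749 3749
```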

πŸ—οΈ Project Structure

bird-species-classifier-cnn/
│
├── bird_species_classifier_cnn.ipynb   # Main Jupyter notebook (80 cells)
├── README.md                   # This file
├── requirements.txt            # Python dependencies
├── LICENSE                     # MIT License
├── .gitignore                  # Git ignore file
│
└── Birds_25/                   # Dataset (not included - download separately)
    ├── train/                  # Training images
    └── valid/                  # Validation images

🧠 Implementation Approaches

Part 1: Traditional Feature Extraction + ML Classifiers

Feature Extraction Methods:

  • Color Features:
    • Color Histograms (RGB, HSV)
    • Color Moments (mean, std, skewness)
  • Texture Features:
    • HOG (Histogram of Oriented Gradients)
    • GLCM (Gray-Level Co-occurrence Matrix)
    • LBP (Local Binary Patterns)
  • Shape Features:
    • Geometric features
    • Edge-based features
  • Keypoint Features:
    • SIFT (Scale-Invariant Feature Transform)
    • Gabor Filters

ML Algorithms:

  • Support Vector Machines (SVM)
  • Random Forest
  • Naive Bayes
  • Multilayer Perceptron (MLP)
  • Logistic Regression
  • K-Nearest Neighbors (KNN)

Results: Best accuracy 57.62% (combined features + Random Forest)
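The Part 1 pipeline can be sketched roughly as follows. Random arrays stand in for real bird images, and the labels are arbitrary; the HOG parameters match those listed under Technical Details, while image size and classifier settings here are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 20 grayscale 64x64 "images" with 2 dummy classes
rng = np.random.default_rng(0)
images = rng.random((20, 64, 64))
labels = rng.integers(0, 2, size=20)

# HOG features (9 orientations, 8x8 cells, 2x2 blocks, as in the README)
features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])

# Random Forest on the extracted feature vectors
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(features, labels)
print(features.shape)  # (20, 1764): 7x7 blocks * 2*2 cells * 9 orientations
```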

Part 2: Dimensionality Reduction

Techniques:

  • PCA (Principal Component Analysis): Feature transformation
  • Feature Selection: Feature elimination methods

Results: Improved computational efficiency; accuracy 54.15%
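A minimal PCA sketch over handcrafted feature vectors, assuming a 95% explained-variance target (the exact component count used in the notebook may differ):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder feature matrix: 200 samples, 500-dim handcrafted features
rng = np.random.default_rng(0)
X = rng.random((200, 500))

# Keep the smallest number of components explaining 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # fewer dimensions than the 500-dim input
```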

Part 3: Pretrained CNNs (Transfer Learning)

Models Fine-tuned:

  1. ResNet-18 (18 layers)

    • Trainable parameters: 11,189,337
    • Test accuracy: 96.05%
    • Epochs trained: ~15
  2. ResNet-50 (50 layers) 🏆

    • Trainable parameters: 23,559,257
    • Test accuracy: 96.51% ⭐
    • Epochs trained: ~15
  3. VGG-16 (16 layers)

    • Trainable parameters: 134,362,969
    • Test accuracy: 88.96%
    • Epochs trained: ~12

Approach:

  • Pretrained weights from ImageNet
  • Modified final layer for 25 classes
  • Fine-tuned entire network
  • Learning rate: 0.001
  • Optimizer: Adam
  • Loss: CrossEntropyLoss

Part 4: CNNs Trained from Scratch

Same architectures as Part 3, but with random weight initialization:

  1. ResNet-50 (Random):

    • Test accuracy: 94.43%
    • Longer training time required (~20 epochs)
  2. ResNet-18 (Random):

    • Test accuracy: 93.09%
  3. VGG-16 (Random):

    • Test accuracy: 4.00% (failed to converge)
    • Without pretrained weights (and with no residual connections or batch normalization), VGG-16 is very hard to train from scratch on a dataset this small

Key Insight: Transfer learning provides significant advantages!

Part 5: Custom CNN Architectures

Built 3 custom architectures:

  1. SimpleCNN_v1:

    • 3 conv blocks + 2 FC layers
    • Parameters: 25,790,041
    • Test accuracy: 69.97%
  2. SimpleCNN_v2:

    • 4 conv blocks + deeper architecture
    • Parameters: 26,217,753
    • Test accuracy: 70.64%
  3. SimpleCNN_v3 (Best Custom):

    • Lightweight architecture
    • Parameters: 701,017 (35x fewer!)
    • Test accuracy: 85.28%
    • Most efficient custom model
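For illustration, a lightweight CNN in the spirit of SimpleCNN_v3. This is a sketch, not the notebook's actual architecture: global average pooling instead of large fully connected layers is one common way to keep the parameter count small.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Illustrative lightweight CNN; the real SimpleCNN_v3 may differ."""
    def __init__(self, num_classes=25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling: no giant FC layer
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyCNN()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # well under 1M parameters
```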

πŸš€ Installation

Prerequisites

  • Python 3.9 or higher
  • CUDA-capable GPU (recommended)
  • 8+ GB RAM
  • ~5 GB disk space (for dataset)

Setup

# Clone the repository
git clone https://github.com/memo-13-byte/bird-species-classifier-cnn.git
cd bird-species-classifier-cnn

# Create virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download dataset (see Dataset section)
# Option 1: Kaggle
kaggle datasets download -d ichhadhari/indian-birds
unzip indian-birds.zip -d Birds_25/

# Option 2: Manual download from Kaggle website

Requirements

torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
opencv-python>=4.8.0
scikit-learn>=1.3.0
scikit-image>=0.21.0
Pillow>=10.0.0
tqdm>=4.65.0
jupyter>=1.0.0

For CUDA support:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

💻 Usage

Running the Complete Pipeline

# Launch Jupyter Notebook
jupyter notebook bird_species_classifier_cnn.ipynb

Quick Start - Inference Example

import torch
from torchvision import models, transforms
from PIL import Image

# Load fine-tuned model (weights=None: we load our own checkpoint below)
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 25)
model.load_state_dict(torch.load('best_resnet50_model.pth', map_location='cpu'))
model.eval()

# Preprocessing (ImageNet normalization, matching training)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load and predict
image = Image.open('bird_image.jpg').convert('RGB')  # force 3 channels
image_tensor = transform(image).unsqueeze(0)

with torch.no_grad():
    output = model(image_tensor)
    _, predicted = torch.max(output, 1)

# bird_species: list of the 25 class names, in training label order
print(f"Predicted species: {bird_species[predicted.item()]}")

📈 Results

Performance Comparison

Approach                  Best Model      Test Accuracy  F1-Score  Parameters
Part 1: Traditional ML    Random Forest   57.62%         57.63%    N/A
Part 2: PCA + Selection   Random Forest   54.15%         54.08%    N/A
Part 3: Pretrained CNNs   ResNet-50       96.51% ⭐      96.52%    23.5M
Part 3: Pretrained CNNs   ResNet-18       96.05%         96.06%    11.2M
Part 3: Pretrained CNNs   VGG-16          88.96%         89.18%    134.4M
Part 4: From Scratch      ResNet-50       94.43%         94.47%    23.5M
Part 4: From Scratch      ResNet-18       93.09%         93.16%    11.2M
Part 5: Custom CNN        SimpleCNN_v3    85.28%         85.44%    0.7M

Detailed Results - Best Model (ResNet-50 Pretrained)

Overall Metrics:

  • Test Accuracy: 96.51%
  • Validation Accuracy: 95.97%
  • Precision: 96.52%
  • Recall: 96.51%
  • F1-Score: 96.52%

Training Details:

  • Epochs: 15
  • Best epoch: 13
  • Final train loss: 0.0726
  • Final valid loss: 0.1648
  • Training time: ~2.4 minutes (on GPU)

Per-Class Performance (Sample):

Top Performers:
- Indian Peacock: 99.8% F1
- Common Kingfisher: 99.2% F1
- Hoopoe: 98.7% F1

Challenging Classes:
- Forest Wagtail: 89.3% F1
- Gray Wagtail: 91.2% F1

πŸ” Model Comparisons

Transfer Learning vs Training from Scratch

Model      Pretrained (Part 3)  From Scratch (Part 4)  Difference
ResNet-50  96.51%               94.43%                 -2.08%
ResNet-18  96.05%               93.09%                 -2.96%
VGG-16     88.96%               4.00%                  -84.96% ⚠️

Key Insights:

  • Transfer learning provides 2-3% improvement for ResNet models
  • VGG-16 fails to converge when trained from scratch (too deep)
  • Pretrained models converge faster (fewer epochs)

Model Complexity vs Performance

SimpleCNN_v3:    701K params → 85.28% accuracy (Best efficiency!)
ResNet-18:      11.2M params → 96.05% accuracy
ResNet-50:      23.5M params → 96.51% accuracy (Best performance!)
VGG-16:        134.4M params → 88.96% accuracy (Overfitting)

Conclusion: ResNet-50 offers the best accuracy-complexity tradeoff.

🎓 Technical Details

Data Augmentation

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

Training Configuration

Hyperparameters:

  • Optimizer: Adam
  • Learning rate: 0.001
  • Batch size: 64
  • Epochs: 15-20
  • Loss function: CrossEntropyLoss
  • Device: CUDA (GPU)

Hardware Used:

  • GPU: NVIDIA GPU with CUDA support
  • RAM: 16 GB
  • Storage: ~5 GB for dataset

Feature Extraction Details (Part 1)

HOG Parameters:

  • Orientations: 9
  • Pixels per cell: (8, 8)
  • Cells per block: (2, 2)

SIFT Parameters:

  • Max keypoints: 100
  • Feature vector size: 128

Color Histogram:

  • Bins: 32 per channel
  • Channels: RGB + HSV

Early Stopping Strategy

# Stop training if validation loss doesn't improve for 3 consecutive epochs
patience = 3
best_val_loss = float('inf')
epochs_no_improve = 0

for epoch in range(num_epochs):
    # ... training and validation pass, producing val_loss ...
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1

    if epochs_no_improve >= patience:
        print('Early stopping!')
        break

📊 Visualizations

Training Progress

The notebook includes comprehensive visualizations:

  1. Training/Validation Loss Curves

    • Epoch-wise loss tracking
    • Convergence analysis
    • Overfitting detection
  2. Accuracy Evolution

    • Train vs validation accuracy
    • Learning progression
    • Performance plateaus
  3. Confusion Matrix

    • Per-class predictions
    • Misclassification patterns
    • Species confusion analysis
  4. Sample Predictions

    • Correctly classified examples
    • Misclassified examples with analysis
    • Attention maps (Grad-CAM)
  5. Dataset Distribution

    • Class balance visualization
    • Train/val/test split
    • Sample images per species
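The confusion-matrix step can be sketched with scikit-learn; the placeholder labels below stand in for predictions collected from the test loader.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Placeholder labels: in the notebook these come from the test loader
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(f1_score(y_true, y_pred, average='macro'))  # per-class F1, averaged
```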

Error Analysis

Common Misclassifications:

  • Wagtail species confused with each other (similar appearance)
  • Different kingfisher species sometimes mixed
  • Juveniles vs adults of same species

Reasons for Errors:

  • Similar plumage colors
  • Similar body shapes
  • Occlusion in images
  • Different poses/angles

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

How to Contribute

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Guidelines

  • Maintain code quality and comments
  • Add tests for new features
  • Update documentation
  • Follow PEP 8 style guide

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Contact

Mehmet Oğuz Kocadere

πŸ™ Acknowledgments

Academic Context

  • Institution: Hacettepe University - Computer Engineering Department
  • Course: BBM 409: Machine Learning Laboratory (Spring 2025)
  • Instructor: Prof. Dr. Ahmet Burak Can
  • Teaching Assistant: R.A. Görkem Akyıldız
  • Project: Assignment 4 - Bird Species Classification

Dataset

  • Indian Birds Species Image Classification by ichhadhari on Kaggle
Frameworks & Libraries

  • PyTorch: Deep learning framework
  • torchvision: Pretrained models and transforms
  • OpenCV: Image processing
  • scikit-learn: Traditional ML algorithms
  • scikit-image: Feature extraction

📚 References

  1. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. CVPR.
  2. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ICLR.
  3. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. CVPR.
  4. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV.
  5. Kaggle Dataset: https://www.kaggle.com/datasets/ichhadhari/indian-birds/data

🎯 Future Work

  • Implement more recent architectures (EfficientNet, Vision Transformers)
  • Add ensemble methods combining multiple models
  • Deploy as web application with Flask/Streamlit
  • Implement real-time bird detection with object detection models
  • Expand to more bird species
  • Add mobile deployment (TensorFlow Lite, ONNX)

⭐ If you found this project helpful, please give it a star!


Made with ❤️ and 🐦 by Mehmet Oğuz Kocadere
