A comprehensive computer vision project implementing multiple approaches for bird species classification, from traditional feature extraction to state-of-the-art deep learning models. Achieves 96.51% test accuracy using transfer learning with ResNet-50.
- Overview
- Dataset
- Project Structure
- Implementation Approaches
- Installation
- Usage
- Results
- Model Comparisons
- Technical Details
- Visualizations
- Contributing
- License
- Contact
- Acknowledgments
This project explores 5 different approaches for bird species classification, progressing from traditional computer vision techniques to modern deep learning architectures:
- Traditional Feature Extraction + ML (HOG, SIFT, Color Histograms + SVM, Random Forest)
- Dimensionality Reduction (PCA + Feature Selection)
- Transfer Learning (Fine-tuned Pretrained CNNs: ResNet, VGG)
- Training from Scratch (Same CNNs with random initialization)
- Custom CNN Architectures (Built from ground up)
- Model: ResNet-50 (Pretrained + Fine-tuned)
- Test Accuracy: 96.51%
- Validation Accuracy: 95.97%
- F1-Score: 96.52%
Source: Kaggle - Indian Birds Species Image Classification
Statistics:
- Total Images: 37,500 high-resolution images
- Bird Species: 25 different Indian bird species
- Original Split: 1,200 train + 300 validation per species
- Project Split: 80-10-10 (train-validation-test)
- Training: 15,000 images
- Validation: 3,749 images
- Test: 3,749 images
- Image Resolution: ~1 MP (approximately 1024x768)
- Format: JPEG
The dataset includes 25 species of Indian birds:
- Asian Green Bee Eater
- Brown Headed Barbet
- Cattle Egret
- Common Kingfisher
- Common Myna
- Common Rosefinch
- Coppersmith Barbet
- Forest Wagtail
- Gray Wagtail
- Hoopoe
- House Crow
- Indian Grey Hornbill
- Indian Peacock
- Indian Pitta
- Indian Roller
- Jungle Babbler
- Northern Lapwing
- Red Wattled Lapwing
- Ruddy Shelduck
- Rufous Treepie
- Sarus Crane
- White Breasted Kingfisher
- White Breasted Waterhen
- White Wagtail
- Yellow Footed Green Pigeon
Due to size constraints (~3-5 GB), the dataset is not included in this repository.
Option 1: Download from Kaggle (Recommended)
```bash
# Install Kaggle CLI
pip install kaggle

# Download dataset
kaggle datasets download -d ichhadhari/indian-birds

# Unzip
unzip indian-birds.zip -d Birds_25/
```

Option 2: Manual Download
- Visit https://www.kaggle.com/datasets/ichhadhari/indian-birds/data
- Click the "Download" button
- Extract into the project directory as `Birds_25/`

Option 3: Academic/Research Use
Contact: canmehmetoguz@gmail.com
Place the dataset in the following structure:
Place the dataset in the following structure:

```text
project_root/
├── Birds_25/
│   ├── train/
│   │   ├── ASIAN GREEN BEE EATER/
│   │   ├── BROWN HEADED BARBET/
│   │   └── ... (23 more species)
│   └── valid/
│       ├── ASIAN GREEN BEE EATER/
│       ├── BROWN HEADED BARBET/
│       └── ... (23 more species)
├── bird_species_classifier_cnn.ipynb
└── README.md
```
Note: The notebook automatically splits the original validation set into equal validation and test halves (10% + 10%).
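That 50/50 validation-to-test split can be sketched with `torch.utils.data.random_split` (an illustrative helper; the notebook's own cell may differ):

```python
import torch
from torch.utils.data import random_split

# Split a validation dataset into equal validation and test halves.
# A fixed generator seed keeps the split reproducible across runs.
def split_valid_test(dataset, seed=42):
    n_test = len(dataset) // 2
    n_valid = len(dataset) - n_test
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_valid, n_test], generator=generator)
```

Applied to the 7,498 images in `Birds_25/valid`, this yields the 3,749 + 3,749 validation/test counts listed above.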
```text
bird-species-classifier-cnn/
│
├── bird_species_classifier_cnn.ipynb   # Main Jupyter notebook (80 cells)
├── README.md                           # This file
├── requirements.txt                    # Python dependencies
├── LICENSE                             # MIT License
├── .gitignore                          # Git ignore file
│
└── Birds_25/                           # Dataset (not included - download separately)
    ├── train/                          # Training images
    └── valid/                          # Validation images
```
Feature Extraction Methods:
- Color Features:
- Color Histograms (RGB, HSV)
- Color Moments (mean, std, skewness)
- Texture Features:
- HOG (Histogram of Oriented Gradients)
- GLCM (Gray-Level Co-occurrence Matrix)
- LBP (Local Binary Patterns)
- Shape Features:
- Geometric features
- Edge-based features
- Keypoint Features:
- SIFT (Scale-Invariant Feature Transform)
- Gabor Filters
ML Algorithms:
- Support Vector Machines (SVM)
- Random Forest
- Naive Bayes
- Multilayer Perceptron (MLP)
- Logistic Regression
- K-Nearest Neighbors (KNN)
Results: Best accuracy ~57.62% (combined features + Random Forest)
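A minimal sketch of one such pipeline (HOG descriptors feeding a Random Forest), using the HOG settings listed under Technical Details. The demo images below are synthetic; the real pipeline runs on grayscale bird crops:

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier

# Extract HOG descriptors from grayscale images
# (orientations/cell/block values match the project's HOG parameters)
def hog_features(images):
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
        for img in images
    ])

# Tiny synthetic demo: noise vs. vertical-stripe "textures"
rng = np.random.default_rng(0)
noise = rng.random((20, 32, 32))
stripes = (np.tile((np.arange(32) % 8 < 4).astype(float), (20, 32, 1))
           + 0.1 * rng.random((20, 32, 32)))
X_imgs = np.concatenate([noise, stripes])
y = np.array([0] * 20 + [1] * 20)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(hog_features(X_imgs), y)
```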
Techniques:
- PCA (Principal Component Analysis): Feature transformation
- Feature Selection: Feature elimination methods
Results: Improved computational efficiency, accuracy ~54.15%
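The PCA step can be sketched as a scikit-learn pipeline. The 95% explained-variance threshold here is an assumption for illustration, not necessarily the notebook's setting:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features, then keep enough principal components
# to explain 95% of the variance (assumed threshold)
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95))

# Demo on a synthetic feature matrix: 200 samples, 64 features
X = np.random.default_rng(0).random((200, 64))
X_reduced = reducer.fit_transform(X)
```

The reduced matrix then feeds the same classifiers as Part 1, trading a small accuracy drop for faster training.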
Models Fine-tuned:

1. ResNet-18 (18 layers)
   - Trainable parameters: 11,189,337
   - Test accuracy: 96.05%
   - Training time: ~15 epochs

2. ResNet-50 (50 layers)
   - Trainable parameters: 23,559,257
   - Test accuracy: 96.51% (best overall)
   - Training time: ~15 epochs

3. VGG-16 (16 layers)
   - Trainable parameters: 134,362,969
   - Test accuracy: 88.96%
   - Training time: ~12 epochs
Approach:
- Pretrained weights from ImageNet
- Modified final layer for 25 classes
- Fine-tuned entire network
- Learning rate: 0.001
- Optimizer: Adam
- Loss: CrossEntropyLoss
Same architectures as Part 3, but with random weight initialization:

1. ResNet-50 (Random):
   - Test accuracy: 94.43%
   - Longer training required (~20 epochs)

2. ResNet-18 (Random):
   - Test accuracy: 93.09%

3. VGG-16 (Random):
   - Test accuracy: 4.00% (failed to converge)
   - Without residual connections or pretrained weights, VGG-16 is very hard to optimize on a dataset of this size

Key Insight: Transfer learning provides a significant advantage!
Built 3 custom architectures:

1. SimpleCNN_v1:
   - 3 conv blocks + 2 FC layers
   - Parameters: 25,790,041
   - Test accuracy: 69.97%

2. SimpleCNN_v2:
   - 4 conv blocks, deeper architecture
   - Parameters: 26,217,753
   - Test accuracy: 70.64%

3. SimpleCNN_v3 (Best Custom):
   - Lightweight architecture
   - Parameters: 701,017 (35x fewer)
   - Test accuracy: 85.28%
   - Most efficient custom model
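For illustration, here is a lightweight CNN in the spirit of SimpleCNN_v3; global average pooling instead of large fully connected layers is one way to keep the parameter count small. The layer sizes below are assumptions, not the notebook's exact architecture:

```python
import torch
from torch import nn

# Illustrative lightweight CNN (assumed layer sizes, not SimpleCNN_v3 itself)
class TinyBirdCNN(nn.Module):
    def __init__(self, num_classes=25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

Replacing a multi-million-parameter FC head with global pooling is the main reason such a model stays well under 1M parameters.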
- Python 3.9 or higher
- CUDA-capable GPU (recommended)
- 8+ GB RAM
- ~5 GB disk space (for dataset)
```bash
# Clone the repository
git clone https://github.com/memo-13-byte/bird-species-classifier-cnn.git
cd bird-species-classifier-cnn

# Create virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download dataset (see Dataset section)
# Option 1: Kaggle
kaggle datasets download -d ichhadhari/indian-birds
unzip indian-birds.zip -d Birds_25/
# Option 2: Manual download from the Kaggle website
```

requirements.txt:

```text
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
opencv-python>=4.8.0
scikit-learn>=1.3.0
scikit-image>=0.21.0
Pillow>=10.0.0
tqdm>=4.65.0
jupyter>=1.0.0
```

For CUDA support:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

```bash
# Launch Jupyter Notebook
jupyter notebook bird_species_classifier_cnn.ipynb
```

Inference with the trained model:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Rebuild the architecture, then load the fine-tuned weights
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 25)
model.load_state_dict(torch.load('best_resnet50_model.pth'))
model.eval()

# Preprocessing (same ImageNet normalization as in training)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load and predict (bird_species is the list of 25 class names)
image = Image.open('bird_image.jpg').convert('RGB')
image_tensor = transform(image).unsqueeze(0)
with torch.no_grad():
    output = model(image_tensor)
    _, predicted = torch.max(output, 1)
print(f"Predicted species: {bird_species[predicted.item()]}")
```

| Approach | Best Model | Test Accuracy | F1-Score | Parameters |
|---|---|---|---|---|
| Part 1: Traditional ML | Random Forest | 57.62% | 57.63% | N/A |
| Part 2: PCA + Selection | Random Forest | 54.15% | 54.08% | N/A |
| Part 3: Pretrained CNNs | ResNet-50 | **96.51%** | 96.52% | 23.5M |
| Part 3: Pretrained CNNs | ResNet-18 | 96.05% | 96.06% | 11.2M |
| Part 3: Pretrained CNNs | VGG-16 | 88.96% | 89.18% | 134.4M |
| Part 4: From Scratch | ResNet-50 | 94.43% | 94.47% | 23.5M |
| Part 4: From Scratch | ResNet-18 | 93.09% | 93.16% | 11.2M |
| Part 5: Custom CNN | SimpleCNN_v3 | 85.28% | 85.44% | 0.7M |
Overall Metrics:
- Test Accuracy: 96.51%
- Validation Accuracy: 95.97%
- Precision: 96.52%
- Recall: 96.51%
- F1-Score: 96.52%
Training Details:
- Epochs: 15
- Best epoch: 13
- Final train loss: 0.0726
- Final valid loss: 0.1648
- Training time: ~2.4 minutes (on GPU)
Per-Class Performance (Sample):
Top Performers:
- Indian Peacock: 99.8% F1
- Common Kingfisher: 99.2% F1
- Hoopoe: 98.7% F1
Challenging Classes:
- Forest Wagtail: 89.3% F1
- Gray Wagtail: 91.2% F1
| Model | Pretrained (Part 3) | From Scratch (Part 4) | Difference |
|---|---|---|---|
| ResNet-50 | 96.51% | 94.43% | -2.08% |
| ResNet-18 | 96.05% | 93.09% | -2.96% |
| VGG-16 | 88.96% | 4.00% | -84.96% |
Key Insights:
- Transfer learning provides a 2-3% improvement for the ResNet models
- VGG-16 fails to converge when trained from scratch; lacking residual connections, it is much harder to optimize from random initialization
- Pretrained models converge faster (fewer epochs)
```text
SimpleCNN_v3:   701K params  → 85.28% accuracy (best efficiency)
ResNet-18:     11.2M params  → 96.05% accuracy
ResNet-50:     23.5M params  → 96.51% accuracy (best performance)
VGG-16:       134.4M params  → 88.96% accuracy (prone to overfitting)
```

Conclusion: ResNet-50 offers the best accuracy-complexity tradeoff.
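The parameter counts used in this comparison can be reproduced with a one-line helper:

```python
from torch import nn

# Count trainable parameters of any PyTorch module
def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

For example, `count_params` on a `torchvision` ResNet-50 with a 25-class head yields the ~23.5M figure in the table.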
```python
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```

Hyperparameters:
- Optimizer: Adam
- Learning rate: 0.001
- Batch size: 64
- Epochs: 15-20
- Loss function: CrossEntropyLoss
- Device: CUDA (GPU)
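A single training step with these hyperparameters might look like this (a sketch; the notebook's full loop also tracks accuracy and validation loss):

```python
import torch
from torch import nn

# One optimization step: forward, loss, backward, update
def train_step(model, batch, labels, optimizer, criterion, device="cpu"):
    model.train()
    batch, labels = batch.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the listed settings, this would be called per batch with `torch.optim.Adam(model.parameters(), lr=0.001)` and `nn.CrossEntropyLoss()` on the CUDA device.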
Hardware Used:
- GPU: NVIDIA GPU with CUDA support
- RAM: 16 GB
- Storage: ~5 GB for dataset
HOG Parameters:
- Orientations: 9
- Pixels per cell: (8, 8)
- Cells per block: (2, 2)
SIFT Parameters:
- Max keypoints: 100
- Feature vector size: 128
Color Histogram:
- Bins: 32 per channel
- Channels: RGB + HSV
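The color-histogram feature can be sketched in a few lines (RGB only here; the project also concatenates HSV channels):

```python
import numpy as np

# 32-bin-per-channel RGB histogram, matching the settings above
def color_histogram(img, bins=32):
    # img: H x W x 3 array with pixel values in [0, 255]
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(feats).astype(np.float32)
```

Each image thus contributes a 96-dimensional vector (3 channels x 32 bins) to the combined feature set.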
```python
# Stop training if validation loss doesn't improve for 3 epochs
patience = 3
best_val_loss = float('inf')
epochs_no_improve = 0

for epoch in range(num_epochs):
    # ... training and validation pass ...
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1
        if epochs_no_improve == patience:
            print('Early stopping!')
            break
```

The notebook includes comprehensive visualizations:
1. Training/Validation Loss Curves
   - Epoch-wise loss tracking
   - Convergence analysis
   - Overfitting detection

2. Accuracy Evolution
   - Train vs validation accuracy
   - Learning progression
   - Performance plateaus

3. Confusion Matrix
   - Per-class predictions
   - Misclassification patterns
   - Species confusion analysis

4. Sample Predictions
   - Correctly classified examples
   - Misclassified examples with analysis
   - Attention maps (Grad-CAM)

5. Dataset Distribution
   - Class balance visualization
   - Train/val/test split
   - Sample images per species
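The matrix behind visualization 3 can be computed with scikit-learn (toy labels shown; the notebook builds the full 25x25 matrix from test-set predictions and plots it as a heatmap):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy example with 3 classes; rows = true labels, columns = predictions
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred)
```

Off-diagonal entries reveal which species pairs get confused, which is how the wagtail confusions below were identified.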
Common Misclassifications:
- Wagtail species confused with each other (similar appearance)
- Different kingfisher species sometimes mixed
- Juveniles vs adults of same species
Reasons for Errors:
- Similar plumage colors
- Similar body shapes
- Occlusion in images
- Different poses/angles
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
- Maintain code quality and comments
- Add tests for new features
- Update documentation
- Follow PEP 8 style guide
This project is licensed under the MIT License - see the LICENSE file for details.
Mehmet Oğuz Kocadere
- Email: canmehmetoguz@gmail.com
- LinkedIn: mehmet-oguz-kocadere
- GitHub: @memo-13-byte
- Institution: Hacettepe University - Computer Engineering Department
- Course: BBM 409: Machine Learning Laboratory (Spring 2025)
- Instructor: Prof. Dr. Ahmet Burak Can
- Teaching Assistant: R.A. Görkem Akyıldız
- Project: Assignment 4 - Bird Species Classification
- Source: Kaggle - Indian Birds Species Image Classification
- Creator: Ichhadhari (Kaggle)
- License: Dataset license as specified on Kaggle
- PyTorch: Deep learning framework
- torchvision: Pretrained models and transforms
- OpenCV: Image processing
- scikit-learn: Traditional ML algorithms
- scikit-image: Feature extraction
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. CVPR.
- Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ICLR.
- Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. CVPR.
- Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV.
- Kaggle Dataset: https://www.kaggle.com/datasets/ichhadhari/indian-birds/data
- Implement more recent architectures (EfficientNet, Vision Transformers)
- Add ensemble methods combining multiple models
- Deploy as web application with Flask/Streamlit
- Implement real-time bird detection with object detection models
- Expand to more bird species
- Add mobile deployment (TensorFlow Lite, ONNX)
⭐ If you found this project helpful, please give it a star!

Related Projects
- Decision Tree from Scratch - Financial Risk Assessment
- Naive Bayes Sentiment Analysis - Amazon Reviews Analysis
- RepoWise - A RAG based Repository Chat Bot
Made with ❤️ and 🐦 by Mehmet Oğuz Kocadere