A comprehensive computer vision project implementing multiple approaches for bird species classification, from traditional feature extraction to state-of-the-art deep learning models. Achieves 96.51% test accuracy using transfer learning with ResNet-50.
- Overview
- Dataset
- Project Structure
- Implementation Approaches
- Installation
- Usage
- Results
- Model Comparisons
- Technical Details
- Visualizations
- Contributing
- License
- Contact
- Acknowledgments
This project explores 5 different approaches for bird species classification, progressing from traditional computer vision techniques to modern deep learning architectures:
- Traditional Feature Extraction + ML (HOG, SIFT, Color Histograms + SVM, Random Forest)
- Dimensionality Reduction (PCA + Feature Selection)
- Transfer Learning (Fine-tuned Pretrained CNNs: ResNet, VGG)
- Training from Scratch (Same CNNs with random initialization)
- Custom CNN Architectures (Built from ground up)
- Model: ResNet-50 (Pretrained + Fine-tuned)
- Test Accuracy: 96.51%
- Validation Accuracy: 95.97%
- F1-Score: 96.52%
Source: Kaggle - Indian Birds Species Image Classification
Statistics:
- Total Images: 37,500 high-resolution images
- Bird Species: 25 different Indian bird species
- Original Split: 1,200 train + 300 validation per species
- Project Split: 80-10-10 (train-validation-test)
- Training: 15,000 images
- Validation: 3,749 images
- Test: 3,749 images
- Image Resolution: ~1 MP (approximately 1024x768)
- Format: JPEG
The dataset includes 25 species of Indian birds:
- Asian Green Bee Eater
- Brown Headed Barbet
- Cattle Egret
- Common Kingfisher
- Common Myna
- Common Rosefinch
- Coppersmith Barbet
- Forest Wagtail
- Gray Wagtail
- Hoopoe
- House Crow
- Indian Grey Hornbill
- Indian Peacock
- Indian Pitta
- Indian Roller
- Jungle Babbler
- Northern Lapwing
- Red Wattled Lapwing
- Ruddy Shelduck
- Rufous Treepie
- Sarus Crane
- White Breasted Kingfisher
- White Breasted Waterhen
- White Wagtail
- Yellow Footed Green Pigeon
Due to size constraints (~3-5 GB), the dataset is not included in this repository.
Option 1: Download from Kaggle (Recommended)
```bash
# Install Kaggle CLI
pip install kaggle

# Download dataset
kaggle datasets download -d ichhadhari/indian-birds

# Unzip
unzip indian-birds.zip -d Birds_25/
```

Option 2: Manual Download
- Visit https://www.kaggle.com/datasets/ichhadhari/indian-birds/data
- Click the "Download" button
- Extract into the project directory as `Birds_25/`

Option 3: Academic/Research Use
Contact: canmehmetoguz@gmail.com
Place the dataset in the following structure:
Place the dataset in the following structure:

```text
project_root/
├── Birds_25/
│   ├── train/
│   │   ├── ASIAN GREEN BEE EATER/
│   │   ├── BROWN HEADED BARBET/
│   │   └── ... (23 more species)
│   └── valid/
│       ├── ASIAN GREEN BEE EATER/
│       ├── BROWN HEADED BARBET/
│       └── ... (23 more species)
├── bird_species_classifier_cnn.ipynb
└── README.md
```
Note: The notebook automatically splits the original validation set into equal validation and test halves (10% + 10%).
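That 50/50 validation-to-test split can be sketched with `torch.utils.data.random_split` (an illustrative helper; the notebook's own cell may differ):

```python
import torch
from torch.utils.data import random_split

# Split a validation dataset into equal validation and test halves.
# A fixed generator seed keeps the split reproducible across runs.
def split_valid_test(dataset, seed=42):
    n_test = len(dataset) // 2
    n_valid = len(dataset) - n_test
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_valid, n_test], generator=generator)
```

Applied to the 7,498 images in `Birds_25/valid`, this yields the 3,749 + 3,749 validation/test counts listed above.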
```text
bird-species-classifier-cnn/
│
├── bird_species_classifier_cnn.ipynb   # Main Jupyter notebook (80 cells)
├── README.md                           # This file
├── requirements.txt                    # Python dependencies
├── LICENSE                             # MIT License
├── .gitignore                          # Git ignore file
│
└── Birds_25/                           # Dataset (not included - download separately)
    ├── train/                          # Training images
    └── valid/                          # Validation images
```
Feature Extraction Methods:
- Color Features:
- Color Histograms (RGB, HSV)
- Color Moments (mean, std, skewness)
- Texture Features:
- HOG (Histogram of Oriented Gradients)
- GLCM (Gray-Level Co-occurrence Matrix)
- LBP (Local Binary Patterns)
- Shape Features:
- Geometric features
- Edge-based features
- Keypoint Features:
- SIFT (Scale-Invariant Feature Transform)
- Gabor Filters
ML Algorithms:
- Support Vector Machines (SVM)
- Random Forest
- Naive Bayes
- Multilayer Perceptron (MLP)
- Logistic Regression
- K-Nearest Neighbors (KNN)
Results: Best accuracy ~57.62% (combined features + Random Forest)
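A minimal sketch of one such pipeline (HOG descriptors feeding a Random Forest), using the HOG settings listed under Technical Details. The demo images below are synthetic; the real pipeline runs on grayscale bird crops:

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier

# Extract HOG descriptors from grayscale images
# (orientations/cell/block values match the project's HOG parameters)
def hog_features(images):
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
        for img in images
    ])

# Tiny synthetic demo: noise vs. vertical-stripe "textures"
rng = np.random.default_rng(0)
noise = rng.random((20, 32, 32))
stripes = (np.tile((np.arange(32) % 8 < 4).astype(float), (20, 32, 1))
           + 0.1 * rng.random((20, 32, 32)))
X_imgs = np.concatenate([noise, stripes])
y = np.array([0] * 20 + [1] * 20)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(hog_features(X_imgs), y)
```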
Techniques:
- PCA (Principal Component Analysis): Feature transformation
- Feature Selection: Feature elimination methods
Results: Improved computational efficiency, accuracy ~54.15%
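The PCA step can be sketched as a scikit-learn pipeline. The 95% explained-variance threshold here is an assumption for illustration, not necessarily the notebook's setting:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features, then keep enough principal components
# to explain 95% of the variance (assumed threshold)
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95))

# Demo on a synthetic feature matrix: 200 samples, 64 features
X = np.random.default_rng(0).random((200, 64))
X_reduced = reducer.fit_transform(X)
```

The reduced matrix then feeds the same classifiers as Part 1, trading a small accuracy drop for faster training.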
Models Fine-tuned:

1. ResNet-18 (18 layers)
   - Trainable parameters: 11,189,337
   - Test accuracy: 96.05%
   - Training time: ~15 epochs

2. ResNet-50 (50 layers)
   - Trainable parameters: 23,559,257
   - Test accuracy: 96.51% (best overall)
   - Training time: ~15 epochs

3. VGG-16 (16 layers)
   - Trainable parameters: 134,362,969
   - Test accuracy: 88.96%
   - Training time: ~12 epochs
Approach:
- Pretrained weights from ImageNet
- Modified final layer for 25 classes
- Fine-tuned entire network
- Learning rate: 0.001
- Optimizer: Adam
- Loss: CrossEntropyLoss
Same architectures as Part 3, but with random weight initialization:

1. ResNet-50 (Random):
   - Test accuracy: 94.43%
   - Longer training required (~20 epochs)

2. ResNet-18 (Random):
   - Test accuracy: 93.09%

3. VGG-16 (Random):
   - Test accuracy: 4.00% (failed to converge)
   - Without residual connections or pretrained weights, VGG-16 is very hard to optimize on a dataset of this size

Key Insight: Transfer learning provides a significant advantage!
Built 3 custom architectures:

1. SimpleCNN_v1:
   - 3 conv blocks + 2 FC layers
   - Parameters: 25,790,041
   - Test accuracy: 69.97%

2. SimpleCNN_v2:
   - 4 conv blocks, deeper architecture
   - Parameters: 26,217,753
   - Test accuracy: 70.64%

3. SimpleCNN_v3 (Best Custom):
   - Lightweight architecture
   - Parameters: 701,017 (35x fewer)
   - Test accuracy: 85.28%
   - Most efficient custom model
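For illustration, here is a lightweight CNN in the spirit of SimpleCNN_v3; global average pooling instead of large fully connected layers is one way to keep the parameter count small. The layer sizes below are assumptions, not the notebook's exact architecture:

```python
import torch
from torch import nn

# Illustrative lightweight CNN (assumed layer sizes, not SimpleCNN_v3 itself)
class TinyBirdCNN(nn.Module):
    def __init__(self, num_classes=25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

Replacing a multi-million-parameter FC head with global pooling is the main reason such a model stays well under 1M parameters.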
- Python 3.9 or higher
- CUDA-capable GPU (recommended)
- 8+ GB RAM
- ~5 GB disk space (for dataset)
```bash
# Clone the repository
git clone https://github.com/memo-13-byte/bird-species-classifier-cnn.git
cd bird-species-classifier-cnn

# Create virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download dataset (see Dataset section)
# Option 1: Kaggle
kaggle datasets download -d ichhadhari/indian-birds
unzip indian-birds.zip -d Birds_25/
# Option 2: Manual download from the Kaggle website
```

requirements.txt:

```text
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
opencv-python>=4.8.0
scikit-learn>=1.3.0
scikit-image>=0.21.0
Pillow>=10.0.0
tqdm>=4.65.0
jupyter>=1.0.0
```

For CUDA support:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

```bash
# Launch Jupyter Notebook
jupyter notebook bird_species_classifier_cnn.ipynb
```

Inference with the trained model:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Rebuild the architecture, then load the fine-tuned weights
model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 25)
model.load_state_dict(torch.load('best_resnet50_model.pth'))
model.eval()

# Preprocessing (same ImageNet normalization as in training)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load and predict (bird_species is the list of 25 class names)
image = Image.open('bird_image.jpg').convert('RGB')
image_tensor = transform(image).unsqueeze(0)
with torch.no_grad():
    output = model(image_tensor)
    _, predicted = torch.max(output, 1)
print(f"Predicted species: {bird_species[predicted.item()]}")
```

| Approach | Best Model | Test Accuracy | F1-Score | Parameters |
|---|---|---|---|---|
| Part 1: Traditional ML | Random Forest | 57.62% | 57.63% | N/A |
| Part 2: PCA + Selection | Random Forest | 54.15% | 54.08% | N/A |
| Part 3: Pretrained CNNs | ResNet-50 | **96.51%** | 96.52% | 23.5M |
| Part 3: Pretrained CNNs | ResNet-18 | 96.05% | 96.06% | 11.2M |
| Part 3: Pretrained CNNs | VGG-16 | 88.96% | 89.18% | 134.4M |
| Part 4: From Scratch | ResNet-50 | 94.43% | 94.47% | 23.5M |
| Part 4: From Scratch | ResNet-18 | 93.09% | 93.16% | 11.2M |
| Part 5: Custom CNN | SimpleCNN_v3 | 85.28% | 85.44% | 0.7M |
Overall Metrics:
- Test Accuracy: 96.51%
- Validation Accuracy: 95.97%
- Precision: 96.52%
- Recall: 96.51%
- F1-Score: 96.52%
Training Details:
- Epochs: 15
- Best epoch: 13
- Final train loss: 0.0726
- Final valid loss: 0.1648
- Training time: ~2.4 minutes (on GPU)
Per-Class Performance (Sample):
Top Performers:
- Indian Peacock: 99.8% F1
- Common Kingfisher: 99.2% F1
- Hoopoe: 98.7% F1
Challenging Classes:
- Forest Wagtail: 89.3% F1
- Gray Wagtail: 91.2% F1
| Model | Pretrained (Part 3) | From Scratch (Part 4) | Difference |
|---|---|---|---|
| ResNet-50 | 96.51% | 94.43% | -2.08% |
| ResNet-18 | 96.05% | 93.09% | -2.96% |
| VGG-16 | 88.96% | 4.00% | -84.96% |
Key Insights:
- Transfer learning provides a 2-3% improvement for the ResNet models
- VGG-16 fails to converge when trained from scratch; lacking residual connections, it is much harder to optimize from random initialization
- Pretrained models converge faster (fewer epochs)
```text
SimpleCNN_v3:   701K params  → 85.28% accuracy (best efficiency)
ResNet-18:     11.2M params  → 96.05% accuracy
ResNet-50:     23.5M params  → 96.51% accuracy (best performance)
VGG-16:       134.4M params  → 88.96% accuracy (prone to overfitting)
```

Conclusion: ResNet-50 offers the best accuracy-complexity tradeoff.
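The parameter counts used in this comparison can be reproduced with a one-line helper:

```python
from torch import nn

# Count trainable parameters of any PyTorch module
def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```

For example, `count_params` on a `torchvision` ResNet-50 with a 25-class head yields the ~23.5M figure in the table.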
```python
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```

Hyperparameters:
- Optimizer: Adam
- Learning rate: 0.001
- Batch size: 64
- Epochs: 15-20
- Loss function: CrossEntropyLoss
- Device: CUDA (GPU)
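A single training step with these hyperparameters might look like this (a sketch; the notebook's full loop also tracks accuracy and validation loss):

```python
import torch
from torch import nn

# One optimization step: forward, loss, backward, update
def train_step(model, batch, labels, optimizer, criterion, device="cpu"):
    model.train()
    batch, labels = batch.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the listed settings, this would be called per batch with `torch.optim.Adam(model.parameters(), lr=0.001)` and `nn.CrossEntropyLoss()` on the CUDA device.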
Hardware Used:
- GPU: NVIDIA GPU with CUDA support
- RAM: 16 GB
- Storage: ~5 GB for dataset
HOG Parameters:
- Orientations: 9
- Pixels per cell: (8, 8)
- Cells per block: (2, 2)
SIFT Parameters:
- Max keypoints: 100
- Feature vector size: 128
Color Histogram:
- Bins: 32 per channel
- Channels: RGB + HSV
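The color-histogram feature can be sketched in a few lines (RGB only here; the project also concatenates HSV channels):

```python
import numpy as np

# 32-bin-per-channel RGB histogram, matching the settings above
def color_histogram(img, bins=32):
    # img: H x W x 3 array with pixel values in [0, 255]
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    return np.concatenate(feats).astype(np.float32)
```

Each image thus contributes a 96-dimensional vector (3 channels x 32 bins) to the combined feature set.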
```python
# Stop training if validation loss doesn't improve for 3 epochs
patience = 3
best_val_loss = float('inf')
epochs_no_improve = 0

for epoch in range(num_epochs):
    # ... training and validation pass ...
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1
        if epochs_no_improve == patience:
            print('Early stopping!')
            break
```

The notebook includes comprehensive visualizations:
1. Training/Validation Loss Curves
   - Epoch-wise loss tracking
   - Convergence analysis
   - Overfitting detection

2. Accuracy Evolution
   - Train vs validation accuracy
   - Learning progression
   - Performance plateaus

3. Confusion Matrix
   - Per-class predictions
   - Misclassification patterns
   - Species confusion analysis

4. Sample Predictions
   - Correctly classified examples
   - Misclassified examples with analysis
   - Attention maps (Grad-CAM)

5. Dataset Distribution
   - Class balance visualization
   - Train/val/test split
   - Sample images per species
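The matrix behind visualization 3 can be computed with scikit-learn (toy labels shown; the notebook builds the full 25x25 matrix from test-set predictions and plots it as a heatmap):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy example with 3 classes; rows = true labels, columns = predictions
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred)
```

Off-diagonal entries reveal which species pairs get confused, which is how the wagtail confusions below were identified.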
Common Misclassifications:
- Wagtail species confused with each other (similar appearance)
- Different kingfisher species sometimes mixed
- Juveniles vs adults of same species
Reasons for Errors:
- Similar plumage colors
- Similar body shapes
- Occlusion in images
- Different poses/angles
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
- Maintain code quality and comments
- Add tests for new features
- Update documentation
- Follow PEP 8 style guide
This project is licensed under the MIT License - see the LICENSE file for details.
Mehmet Oğuz Kocadere
- Email: canmehmetoguz@gmail.com
- LinkedIn: mehmet-oguz-kocadere
- GitHub: @memo-13-byte
- Institution: Hacettepe University - Computer Engineering Department
- Course: BBM 409: Machine Learning Laboratory (Spring 2025)
- Instructor: Prof. Dr. Ahmet Burak Can
- Teaching Assistant: R.A. Görkem Akyıldız
- Project: Assignment 4 - Bird Species Classification
- Source: Kaggle - Indian Birds Species Image Classification
- Creator: Ichhadhari (Kaggle)
- License: Dataset license as specified on Kaggle
- PyTorch: Deep learning framework
- torchvision: Pretrained models and transforms
- OpenCV: Image processing
- scikit-learn: Traditional ML algorithms
- scikit-image: Feature extraction
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. CVPR.
- Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ICLR.
- Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. CVPR.
- Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV.
- Kaggle Dataset: https://www.kaggle.com/datasets/ichhadhari/indian-birds/data
- Implement more recent architectures (EfficientNet, Vision Transformers)
- Add ensemble methods combining multiple models
- Deploy as web application with Flask/Streamlit
- Implement real-time bird detection with object detection models
- Expand to more bird species
- Add mobile deployment (TensorFlow Lite, ONNX)
⭐ If you found this project helpful, please give it a star!

Related Projects
- Decision Tree from Scratch - Financial Risk Assessment
- Naive Bayes Sentiment Analysis - Amazon Reviews Analysis
- RepoWise - A RAG based Repository Chat Bot
Made with ❤️ and 🐦 by Mehmet Oğuz Kocadere