This project addresses the classic computer vision problem of image classification using the CIFAR-10 dataset. The primary goal was not just to build a functional classifier, but to architect a robust Convolutional Neural Network (CNN) from scratch and systematically enhance its performance, training efficiency, and reliability.
The focus was on implementing and evaluating advanced deep learning techniques, including custom learning rate schedulers, model calibration, and uncertainty estimation, to create a model that is both accurate and trustworthy.
The solution follows a comprehensive pipeline, starting from data preparation and moving through advanced model architecture design, training optimization, and detailed post-hoc analysis.
A custom CNN was designed as a `keras.models.Sequential` model with a focus on deep architecture and robust regularization to prevent overfitting.
- Convolutional Blocks: The network consists of four sequential convolutional blocks with increasing filter depths (64 → 125 → 256 → 560) to capture features of increasing complexity.
- Regularization: To ensure generalization, the model incorporates multiple regularization techniques:
  - L2 Kernel Regularization (`1e-4`)
  - Kernel Constraint (`max_norm`)
  - Dropout layers with varying rates (0.1 to 0.3) after convolutional and dense layers.
- Normalization: `BatchNormalization` is used after every convolutional and dense layer to stabilize and accelerate training.
- Dense Head: A `GlobalAveragePooling2D` layer reduces the feature maps before they are passed to two fully connected (Dense) layers, which act as the classifier head.
- Initialization: `HeNormal` and `GlorotUniform` initializers are used for optimal weight initialization.
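A minimal sketch of this architecture in Keras, using the filter depths listed above. The kernel size, max-norm limit, pooling layers, dense-layer width, and per-block dropout rates here are illustrative assumptions; the exact values live in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers, constraints

def conv_block(filters, drop_rate):
    # Conv2D with L2 regularization, a max-norm kernel constraint, and
    # He-normal initialization, followed by BatchNormalization,
    # downsampling, and Dropout.
    return [
        layers.Conv2D(filters, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4),
                      kernel_constraint=constraints.max_norm(3.0),
                      kernel_initializer="he_normal"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(drop_rate),
    ]

model = models.Sequential(
    [layers.Input(shape=(32, 32, 3))]
    + conv_block(64, 0.1) + conv_block(125, 0.2)
    + conv_block(256, 0.2) + conv_block(560, 0.3)
    + [
        layers.GlobalAveragePooling2D(),
        # Two Dense layers act as the classifier head.
        layers.Dense(256, activation="relu",
                     kernel_initializer="he_normal"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(10, activation="softmax",
                     kernel_initializer="glorot_uniform"),
    ]
)
```

The head's `softmax` output yields one probability per CIFAR-10 class.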
Several state-of-the-art techniques were employed to improve training dynamics and model performance.
- Mixed Precision Training: Utilized TensorFlow's `mixed_float16` policy to leverage the Tensor Cores on my NVIDIA RTX 3070 Ti, significantly reducing training time and memory consumption without compromising accuracy.
- Data Augmentation: An `ImageDataGenerator` was implemented to augment the training data in real time. This included random rotations, shifts, and horizontal flips, which help the model learn invariant features and generalize better.
- Advanced Learning Rate Scheduling: I designed and implemented a custom `WarmUpCosineDecayReduceLROnPlateau` callback. This scheduler:
  - Warms up the learning rate over the first 10% of steps to prevent initial instability.
  - Applies a cosine decay for smooth convergence toward the minimum.
  - Integrates a reduce-on-plateau mechanism to cut the learning rate if validation loss stagnates after the initial decay phase is complete.
- Callbacks: Training was monitored using `EarlyStopping` to halt training when performance plateaus and `ModelCheckpoint` to save only the best-performing model based on validation loss.
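The warm-up and cosine-decay stages of that custom schedule can be sketched in plain Python. The reduce-on-plateau stage is omitted, and the parameter names and defaults here are illustrative, not the notebook's exact values:

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-3, warmup_frac=0.1, min_lr=1e-6):
    """Learning rate with linear warm-up over the first `warmup_frac`
    of steps, then cosine decay from `base_lr` down to `min_lr`."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp from near zero up to the full base rate.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay: progress runs from 0 (end of warm-up) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Plotting `lr_at_step` over all steps reproduces the characteristic ramp-then-cosine curve that the callback follows before any plateau-triggered cuts.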
Beyond standard accuracy metrics, I explored the model's reliability through calibration and uncertainty estimation.
- Uncertainty Quantification: Implemented Monte Carlo Dropout, a Bayesian approximation technique, to estimate model uncertainty. By performing multiple forward passes with dropout enabled at inference time, I could calculate the standard deviation of predictions as a measure of the model's confidence.
- Confidence Calibration: Used Temperature Scaling, a post-processing technique, to calibrate the model's output probabilities. An optimal temperature was found by minimizing the loss on the validation set, aligning the model's confidence scores more closely with its actual accuracy. This was visualized using a Reliability Diagram.
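Both techniques can be sketched briefly. The toy model and `n_samples` below are stand-ins; in the notebook, MC Dropout runs on the trained CNN, and the optimal temperature is found by minimizing the validation loss:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def mc_dropout_predict(model, x, n_samples=30):
    """Monte Carlo Dropout: run stochastic forward passes with dropout
    kept active (training=True); the std-dev across passes serves as a
    per-class uncertainty estimate."""
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

def temperature_scale(logits, temperature):
    """Temperature Scaling: dividing logits by T > 1 softens overconfident
    probabilities without changing the predicted class."""
    return tf.nn.softmax(logits / temperature, axis=-1)

# Toy model with dropout, standing in for the trained CNN.
toy = models.Sequential([
    layers.Input(shape=(8,)),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
mean_pred, std_pred = mc_dropout_predict(
    toy, np.random.rand(4, 8).astype("float32"))
```

A large `std_pred` for a sample flags a prediction the model is unsure about, which is exactly the "knows when it doesn't know" behavior discussed below.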
- Frameworks: TensorFlow, Keras
- Libraries: NumPy, Scikit-learn, Matplotlib, Seaborn
- Tools: Jupyter Notebook
The project utilizes the CIFAR-10 dataset, a standard benchmark for image classification.
- Content: 60,000 color images of size 32x32 pixels.
- Classes: 10 distinct classes (e.g., airplane, automobile, bird, cat).
- Split: The data was custom-split into training (72.25%), validation (12.75%), and test (15%) sets to ensure robust evaluation.
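The percentages follow from holding out 15% for test and then 15% of the remainder for validation (0.85 × 0.85 = 72.25% train, 0.85 × 0.15 = 12.75% validation). A sketch with `train_test_split`, using synthetic labels to avoid downloading the dataset here:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 60,000 CIFAR-10 examples, 10 balanced classes.
x = np.arange(60_000)
y = np.repeat(np.arange(10), 6_000)

# First carve off 15% for test, then 15% of the remainder for validation.
x_tmp, x_test, y_tmp, y_test = train_test_split(
    x, y, test_size=0.15, stratify=y, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_tmp, y_tmp, test_size=0.15, stratify=y_tmp, random_state=42)

print(len(x_train), len(x_val), len(x_test))  # 43350 7650 9000
```

Stratifying both splits keeps the 10 classes balanced across all three sets.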
- Clone the repository:

  ```bash
  git clone https://github.com/imehranasgari/your-repo-name.git
  cd your-repo-name
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  pip install -r requirements.txt
  ```

  Note: `requirements.txt` should contain: `tensorflow`, `numpy`, `scikit-learn`, `matplotlib`, `seaborn`.

- Run the Jupyter Notebook:

  ```bash
  jupyter notebook project.ipynb
  ```

  Execute the cells sequentially to load data, build, train, and evaluate the model.
The final custom-built model demonstrated strong performance and well-calibrated confidence.
- Test Accuracy: 85.28%
- Test Loss: 0.6719
- Best Validation Loss: 0.6831 (achieved at epoch 29)
- Training Time: 5024 seconds (~84 minutes) on an NVIDIA GeForce RTX 3070 Ti Laptop GPU.
- Model Calibration: The Reliability Diagram shows that Temperature Scaling successfully improved the model's calibration, making its confidence scores more indicative of its true accuracy.
The following plots are generated by the notebook and illustrate the model's training progression and final performance.
This project was a deep dive into the practical aspects of building and optimizing a production-ready CNN.
- Custom Callbacks: Implementing the `WarmUpCosineDecayReduceLROnPlateau` scheduler was a valuable exercise in controlling training dynamics at a granular level. It highlighted the significant impact that a well-designed learning rate schedule has on achieving stable and optimal convergence.
- Model Reliability: Exploring MC Dropout and Temperature Scaling underscored the importance of not just accuracy, but also model calibration and uncertainty awareness. A model that "knows when it doesn't know" is far more useful in real-world applications.
- Exploratory Models: While the main focus was the custom CNN, I also experimented with a pre-trained ResNet50 model. This exploration, though not fully trained to completion in the notebook, demonstrated my understanding of transfer learning principles and different fine-tuning strategies (feature extraction vs. partial unfreezing).
💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.
- Email: imehranasgari@gmail.com
- GitHub: https://github.com/imehranasgari
This project is licensed under the Apache 2.0 License – see the LICENSE file for details.