This project builds and trains a deep learning model to predict age and gender from face images using the UTKFace dataset.
The pipeline includes dataset handling, preprocessing, custom data generation, model building (using transfer learning), training, and evaluation.
The project uses the UTKFace dataset, which contains over 20,000 labeled face images.
Each filename encodes the following information: age_gender_race_date.jpg
This project extracts only:
- Age — integer
- Gender — 0 = male, 1 = female
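The label extraction above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `parse_age_gender` is hypothetical, and returning `None` for names that do not match the four-field pattern is an assumption about how malformed files might be handled.

```python
import os


def parse_age_gender(filename):
    """Return (age, gender) from an age_gender_race_date.jpg name, or None."""
    parts = os.path.basename(filename).split("_")
    if len(parts) < 4:
        return None  # a field is missing, so the labels cannot be trusted
    try:
        age, gender = int(parts[0]), int(parts[1])
    except ValueError:
        return None  # non-numeric age or gender field
    if gender not in (0, 1):
        return None  # only 0 = male and 1 = female are expected
    return age, gender
```

For example, `parse_age_gender("25_1_0_20170116174525125.jpg")` returns `(25, 1)`; the race and date fields are ignored.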
The dataset is downloaded automatically from Kaggle.
- Extract image files from the Kaggle ZIP archive.
- Parse labels from filenames.
- Resize images to 128×128.
- Normalize pixel values to [0, 1].
- Use a custom Keras Sequence to efficiently load data during training.
A train-validation split is performed using train_test_split.
A subclass of keras.utils.Sequence is used to:
- Load images batch-by-batch.
- Apply preprocessing.
- Shuffle training data each epoch.
- Output:
  - X = preprocessed face images
  - y_age = age (regression)
  - y_gender = gender (binary classification)
The model uses MobileNetV2 (pretrained on ImageNet) as the base feature extractor.
The top layers include:
- GlobalAveragePooling
- Dropout
- Dense layers
- Two output heads:
  - Age output → regression layer
  - Gender output → sigmoid classifier
The model is compiled with:
- mse loss for age
- binary_crossentropy for gender
- Adam optimizer
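The architecture and compile settings described above could be assembled as in the sketch below. Several details are assumptions rather than facts from the project: the dropout rate, the size of the hidden Dense layer, freezing the base network, the output names `"age"` and `"gender"`, and the extra `mae` and `accuracy` metrics.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(image_size=128, weights="imagenet"):
    """MobileNetV2 feature extractor with an age head and a gender head."""
    shape = (image_size, image_size, 3)
    base = keras.applications.MobileNetV2(
        input_shape=shape, include_top=False, weights=weights
    )
    base.trainable = False  # assumption: base frozen for transfer learning

    inputs = keras.Input(shape=shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)                    # assumed rate
    x = layers.Dense(128, activation="relu")(x)   # assumed width

    age = layers.Dense(1, name="age")(x)                           # regression
    gender = layers.Dense(1, activation="sigmoid", name="gender")(x)

    model = keras.Model(inputs, {"age": age, "gender": gender})
    model.compile(
        optimizer=keras.optimizers.Adam(),
        loss={"age": "mse", "gender": "binary_crossentropy"},
        metrics={"age": "mae", "gender": "accuracy"},
    )
    return model
```

Passing `weights=None` builds the same graph without downloading the ImageNet weights, which is handy for a quick shape check.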
The model is trained with:
- Batch size: 32
- Image size: 128×128
- Validation split defined earlier
During training:
- Losses for both outputs are tracked
- Combined loss is optimized
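To make the combined loss concrete, the sketch below recomputes it by hand with NumPy. When no loss weights are given, Keras adds the per-output losses with equal weight; the `age_weight` and `gender_weight` parameters here are hypothetical knobs that mirror the `loss_weights` argument of `compile`, and nothing above says the project sets them.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, as used for the age output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def binary_crossentropy(y_true, prob, eps=1e-7):
    """Binary cross-entropy, as used for the gender output."""
    y_true = np.asarray(y_true, dtype=float)
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(prob)
                           + (1 - y_true) * np.log(1 - prob))))


def combined_loss(age_true, age_pred, gender_true, gender_prob,
                  age_weight=1.0, gender_weight=1.0):
    """Weighted sum of the two per-output losses."""
    return (age_weight * mse(age_true, age_pred)
            + gender_weight * binary_crossentropy(gender_true, gender_prob))
```

Because squared errors in years are usually much larger than a cross-entropy value, the age term can dominate the sum; lowering the age weight is one possible way to rebalance the two tasks.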
After training, the model is evaluated on the validation dataset for:
- Age prediction error (MSE)
- Gender classification accuracy
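Given model predictions on the validation set, the two metrics above can be computed as in this sketch. The function name and the 0.5 decision threshold for the sigmoid output are assumptions; inputs may be flat lists or the (N, 1) arrays a Keras model returns.

```python
import numpy as np


def evaluate_predictions(age_true, age_pred, gender_true, gender_prob,
                         threshold=0.5):
    """Return age MSE and gender accuracy for a set of predictions."""
    age_true = np.asarray(age_true, dtype=float).ravel()
    age_pred = np.asarray(age_pred, dtype=float).ravel()
    gender_true = np.asarray(gender_true, dtype=int).ravel()
    gender_prob = np.asarray(gender_prob, dtype=float).ravel()

    # Probabilities at or above the threshold are labelled 1 (female).
    gender_pred = (gender_prob >= threshold).astype(int)

    return {
        "age_mse": float(np.mean((age_true - age_pred) ** 2)),
        "gender_accuracy": float(np.mean(gender_pred == gender_true)),
    }
```

Taking the square root of `age_mse` gives an error in years, which is often easier to interpret than the squared value.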
- Install dependencies: pip install tensorflow numpy opencv-python scikit-learn kaggle
- Place your kaggle.json API key in the working directory.
- Run the notebook cells in order.