Distinguishing real photographs from AI-generated images using state-of-the-art deep learning architectures. This project implements both custom Vision Transformers and transfer learning with ResNet50V2, achieving up to 95.43% accuracy on a balanced dataset of 200,000 images.
- Features
- Model Performance
- Demo
- Dataset
- Project Structure
- Installation
- Usage
- Model Architectures
- Results
- Methodology
- Notebooks
- Contributing
- Citation
- License
- Acknowledgments
- Vision Transformer (ViT): Custom implementation from scratch with patch embeddings and multi-head attention
- ResNet50V2: Transfer learning from ImageNet pre-trained weights with custom classification head
- Detailed confusion matrices and classification reports
- Training history visualization (loss, accuracy, precision, recall, AUC)
- Per-class performance metrics
- Model comparison and benchmarking
- Efficient TensorFlow data loading with prefetching
- Automated data splitting (train/val/test)
- Model checkpointing and early stopping
- Learning rate scheduling
- Multiple metrics: Accuracy, Precision, Recall, F1-Score, AUC-ROC
- Balanced test set evaluation (15,000 images per class)
- Specificity and sensitivity analysis
| Model | Test Accuracy | Test Precision | Test Recall | Test AUC | Parameters | Training Time/Epoch |
|---|---|---|---|---|---|---|
| ResNet50V2 | 95.43% | 95.34% | 95.53% | 99.03% | 558k (trainable) | ~12 min |
| Vision Transformer | 91.14% | 92.04% | 90.07% | 97.20% | 28.9M | ~43 min |
- Overall Accuracy: 95.43%
- Precision: 95.34% (low false positive rate)
- Recall: 95.53% (high detection rate)
- F1-Score: 95.43%
- AUC-ROC: 99.03% (excellent class separation)
- Specificity: 95.33%
- Test Loss: 0.1222
| Predicted → | Fake | Real |
|---|---|---|
| Actual Fake | 14,299 (TN) | 701 (FP) |
| Actual Real | 671 (FN) | 14,329 (TP) |
Total Errors: 1,372 / 30,000 = 4.57% error rate
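The headline metrics can be re-derived directly from this confusion matrix. A minimal plain-Python check (counts taken from the table above; Fake is treated as the negative class, Real as the positive class):

```python
# Confusion-matrix counts from the ResNet50V2 test run
TN, FP = 14_299, 701    # actual Fake: correctly / incorrectly classified
FN, TP = 671, 14_329    # actual Real: incorrectly / correctly classified

total = TN + FP + FN + TP

accuracy    = (TP + TN) / total     # overall correctness
precision   = TP / (TP + FP)        # low false-positive rate
recall      = TP / (TP + FN)        # high detection rate (sensitivity)
specificity = TN / (TN + FP)        # fake-detection rate
f1          = 2 * precision * recall / (precision + recall)

print(f"Accuracy:    {accuracy:.4f}")     # 0.9543
print(f"Precision:   {precision:.4f}")    # 0.9534
print(f"Recall:      {recall:.4f}")       # 0.9553
print(f"Specificity: {specificity:.4f}")  # 0.9533
print(f"F1-Score:    {f1:.4f}")           # 0.9543
```

The reported percentages are self-consistent: each one follows from the four cell counts.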
- Overall Accuracy: 91.14%
- Precision: 92.04%
- Recall: 90.07%
- F1-Score: 91.04%
- AUC-ROC: 97.20%
- Test Loss: 0.2198
```python
from tensorflow.keras.models import load_model

# Load trained model
model = load_model('models/ResNet_best_model.keras')

# Predict on new image
# (load_and_preprocess_image is a project helper: resize to 224x224, scale to [0, 1])
image = load_and_preprocess_image('suspicious_image.jpg')
prediction = model.predict(image)

if prediction[0][0] > 0.5:
    print(f"REAL IMAGE (Confidence: {prediction[0][0]*100:.2f}%)")
else:
    print(f"FAKE IMAGE (Confidence: {(1-prediction[0][0])*100:.2f}%)")
```

- ✅ Real photo detection: 95.53% success rate
- ✅ Fake image detection: 95.33% success rate
- ✅ Balanced performance: no bias toward either class
This project uses a carefully curated dataset combining real and AI-generated images:
| Dataset | Source | Type | Count |
|---|---|---|---|
| COCO 2017 | Microsoft | Real photographs | 100,000 |
| DiffusionDB | Part 0001-0100 | AI-generated (diffusion models) | 100,000 |
| Total | - | Balanced binary classification | 200,000 |
| Split | Fake Images | Real Images | Total | Percentage |
|---|---|---|---|---|
| Training | 70,000 | 70,000 | 140,000 | 70% |
| Validation | 15,000 | 15,000 | 30,000 | 15% |
| Testing | 15,000 | 15,000 | 30,000 | 15% |
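The split sizes in the table follow from a 70/15/15 partition of the 100,000 images in each class; a quick sketch of the arithmetic:

```python
# Per-class split sizes for a 70/15/15 split of 100,000 images per class
per_class = 100_000
fractions = {"train": 0.70, "val": 0.15, "test": 0.15}

splits = {name: int(round(per_class * frac)) for name, frac in fractions.items()}
totals = {name: n * 2 for name, n in splits.items()}  # fake + real combined

print(splits)  # {'train': 70000, 'val': 15000, 'test': 15000}
print(totals)  # {'train': 140000, 'val': 30000, 'test': 30000}
```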
- COCO 2017 (Real Images)
  - [Kaggle Dataset](https://www.kaggle.com/datasets/awsaf49/coco-2017-dataset)
  - Real-world photographs spanning 80+ object categories
  - Natural scenes, people, animals, objects
  - High-quality, professionally captured images
- DiffusionDB (Fake Images)
  - [Kaggle Dataset](https://www.kaggle.com/datasets/ammarali32/diffusiondb-2m-part-0001-to-0100-of-2000)
  - AI-generated images from Stable Diffusion and other diffusion models
  - Diverse prompts and styles
  - Represents state-of-the-art generative AI output
All images undergo standardized preprocessing:
- Resize: 224 × 224 pixels (standard for ResNet and ViT)
- Normalization: Pixel values scaled from [0, 255] to [0, 1]
- Format: RGB (3 channels)
- Batch Size: 32 images per batch
- Data Type: Float32
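These choices fix the memory footprint of one batch; a quick back-of-the-envelope check in plain Python:

```python
# Tensor shape after preprocessing: (batch, height, width, channels)
height, width, channels = 224, 224, 3
batch_size = 32
bytes_per_float32 = 4

values_per_image = height * width * channels
batch_bytes = values_per_image * bytes_per_float32 * batch_size

print(values_per_image)                  # 150528 floats per image
print(batch_bytes)                       # 19267584 bytes
print(f"{batch_bytes / 2**20:.1f} MiB")  # 18.4 MiB per batch
```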
```
AI-GENERATED-IMAGE-DETECTION/
│
├── notebooks/
│   ├── vision_transformer.ipynb      # Custom ViT implementation
│   └── resnet50v2_transfer.ipynb     # ResNet transfer learning
│
├── models/
│   ├── vit_best_model.keras          # Best ViT checkpoint
│   └── ResNet_best_model.keras       # Best ResNet checkpoint
│
├── results/
│   ├── vit_results/
│   │   ├── confusion_matrix.png
│   │   ├── training_history.png
│   │   └── classification_report.txt
│   ├── resnet_results/
│   │   ├── confusion_matrix.png
│   │   ├── training_history.png
│   │   └── classification_report.txt
│   └── Data_Visualization/
│       ├── fake_images_1.png
│       ├── fake_images_2.png
│       ├── real_images_1.png
│       ├── real_images_2.png
│       └── count.png
│
├── data/
│   ├── train/
│   │   ├── fake/                     # Symbolic links to training fake images
│   │   └── real/                     # Symbolic links to training real images
│   ├── val/
│   │   ├── fake/
│   │   └── real/
│   └── test/
│       ├── fake/
│       └── real/
│
├── src/
│   ├── data_preprocessing.py         # Data loading and preprocessing
│   ├── model_training.py             # Training utilities
│   ├── evaluation.py                 # Evaluation metrics
│   └── visualization.py              # Plotting functions
│
├── requirements.txt                  # Python dependencies
├── environment.yml                   # Conda environment
├── README.md                         # This file
├── LICENSE                           # MIT License
└── .gitignore                        # Git ignore rules
```
- Python 3.10+
- CUDA-compatible GPU (recommended, 8GB+ VRAM)
- 16GB+ RAM
- 50GB+ free disk space
**Option A: Conda**

```shell
# Clone repository
git clone https://github.com/Abdelhady-22/AI-Generated-vs-Real-Image-Detection.git
cd AI-Generated-vs-Real-Image-Detection

# Create conda environment
conda env create -f environment.yml
conda activate fake-image-detection

# Verify installation
python -c "import tensorflow as tf; print(f'TensorFlow {tf.__version__} | GPU:', tf.config.list_physical_devices('GPU'))"
```

**Option B: pip / venv**

```shell
# Clone repository
git clone https://github.com/Abdelhady-22/AI-Generated-vs-Real-Image-Detection.git
cd AI-Generated-vs-Real-Image-Detection

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

**Dataset download (programmatic)**

```shell
# Install kagglehub
pip install kagglehub

# Download datasets programmatically
python << EOF
import kagglehub

# Download COCO 2017
coco_path = kagglehub.dataset_download('awsaf49/coco-2017-dataset')
print(f'COCO downloaded to: {coco_path}')

# Download DiffusionDB
diffusion_path = kagglehub.dataset_download('ammarali32/diffusiondb-2m-part-0001-to-0100-of-2000')
print(f'DiffusionDB downloaded to: {diffusion_path}')
EOF
```

**Dataset download (manual)**

- Download COCO 2017: https://www.kaggle.com/datasets/awsaf49/coco-2017-dataset
- Download DiffusionDB: https://www.kaggle.com/datasets/ammarali32/diffusiondb-2m-part-0001-to-0100-of-2000
- Extract to the `data/raw/` directory
```shell
# Train ResNet50V2 (recommended)
jupyter nbconvert --to notebook --execute notebooks/resnet50v2_transfer.ipynb

# Or train Vision Transformer
jupyter nbconvert --to notebook --execute notebooks/vision_transformer.ipynb
```

```python
from src.data_preprocessing import create_splits

# Define paths
CLASS0_DIR = "data/raw/diffusiondb/"         # Fake images
CLASS1_DIR = "data/raw/coco2017/train2017/"  # Real images
OUTPUT_DIR = "data/processed/"

# Create train/val/test splits
create_splits(
    fake_dir=CLASS0_DIR,
    real_dir=CLASS1_DIR,
    output_dir=OUTPUT_DIR,
    train_size=70000,
    val_size=15000,
    test_size=15000,
    seed=42
)
```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50V2
from tensorflow.keras import layers

# Load data (labels inferred from the fake/ and real/ subdirectories)
train_ds = tf.keras.utils.image_dataset_from_directory(
    'data/processed/train',
    image_size=(224, 224),
    batch_size=32,
    label_mode='binary'
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    'data/processed/val',
    image_size=(224, 224),
    batch_size=32,
    label_mode='binary'
)

# Build model
base_model = ResNet50V2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the pre-trained backbone

model = tf.keras.Sequential([
    layers.Rescaling(1./255),  # scale pixels from [0, 255] to [0, 1]
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')
])

# Compile
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall(), tf.keras.metrics.AUC()]
)

# Train
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=20,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
        tf.keras.callbacks.ModelCheckpoint('models/best_model.keras', save_best_only=True)
    ]
)
```python
# Load test data
test_ds = tf.keras.utils.image_dataset_from_directory(
    'data/processed/test',
    image_size=(224, 224),
    batch_size=32,
    label_mode='binary'
)

# Evaluate
results = model.evaluate(test_ds)
print(f"Test Accuracy: {results[1]*100:.2f}%")

# Generate predictions and save detailed reports
from src.evaluation import evaluate_model
evaluate_model(model, test_ds, save_path='results/')
```
```python
from tensorflow.keras.models import load_model
from PIL import Image
import numpy as np

# Load trained model
model = load_model('models/ResNet_best_model.keras')

def predict_image(image_path, threshold=0.5):
    """
    Predict whether an image is real or AI-generated.

    Args:
        image_path: Path to the image file
        threshold: Classification threshold (default 0.5)

    Returns:
        dict: Prediction results
    """
    # Load and preprocess
    img = Image.open(image_path).convert('RGB')
    img = img.resize((224, 224))
    img_array = np.array(img) / 255.0
    img_array = np.expand_dims(img_array, axis=0)

    # Predict
    prediction = model.predict(img_array, verbose=0)[0][0]

    # Interpret
    is_real = prediction > threshold
    confidence = prediction if is_real else (1 - prediction)

    return {
        'prediction': 'REAL' if is_real else 'FAKE',
        'confidence': confidence * 100,
        'raw_score': prediction
    }

# Example usage
result = predict_image('test_image.jpg')
print(f"{result['prediction']} (Confidence: {result['confidence']:.2f}%)")
```
```
Input (224, 224, 3)
        │
┌──────────────────────────────┐
│  ResNet50V2 Base (Frozen)    │
│  - Pre-trained on ImageNet   │
│  - 23.5M parameters          │
│  - Feature extraction        │
└──────────────────────────────┘
        │
GlobalAveragePooling2D → (2048,)
        │
Dense(256, relu) → BatchNorm → Dropout(0.5)
        │
Dense(128, relu) → BatchNorm → Dropout(0.3)
        │
Dense(1, sigmoid) → [0, 1]
        │
Output: 0 = Fake, 1 = Real
```
Key Features:
- Transfer Learning: Leverages ImageNet knowledge
- Frozen Base: Only trains classification head (558k params)
- Regularization: BatchNorm + Dropout prevents overfitting
- Efficiency: 3× faster training than ViT
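The 558k-trainable-parameter figure can be sanity-checked by counting the head's weights layer by layer (the frozen base is excluded; this assumes BatchNorm contributes 2 trainable parameters per channel, gamma and beta):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_trainable(channels):
    """Gamma and beta are trainable; moving mean/variance are not."""
    return 2 * channels

head = (
    dense_params(2048, 256)      # GAP output (2048,) -> Dense(256)
    + batchnorm_trainable(256)
    + dense_params(256, 128)     # Dense(128)
    + batchnorm_trainable(128)
    + dense_params(128, 1)       # Dense(1, sigmoid)
)
print(head)  # 558337, i.e. the ~558k trainable parameters reported above
```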
```
Input Image (224, 224, 3)
        │
┌──────────────────────────────┐
│  Patch Embedding (16×16)     │
│  → 196 patches × 384 dim     │
└──────────────────────────────┘
        │
CLS Token + Positional Encoding
        │
┌──────────────────────────────┐
│  Transformer Encoder × 6     │
│  ┌────────────────────────┐  │
│  │ Layer Normalization    │  │
│  │ Multi-Head Attention   │  │
│  │ (6 heads)              │  │
│  │ Residual Connection    │  │
│  ├────────────────────────┤  │
│  │ Layer Normalization    │  │
│  │ MLP (384→1536→384)     │  │
│  │ Residual Connection    │  │
│  └────────────────────────┘  │
└──────────────────────────────┘
        │
Extract CLS Token → (384,)
        │
Dense(512, gelu) → Dropout(0.3)
        │
Dense(1, sigmoid) → [0, 1]
```
Key Features:
- Attention Mechanism: Learns spatial relationships
- Patch-Based: Processes 16×16 image patches
- Deep Architecture: 6 transformer blocks
- Large Capacity: 28.9M parameters
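The patch-embedding shapes in the diagram follow directly from the input and patch sizes; a sketch of the arithmetic (not the model code):

```python
image_size, patch_size, channels = 224, 16, 3
embed_dim = 384

patches_per_side = image_size // patch_size        # 14 patches along each axis
num_patches = patches_per_side ** 2                # 196 patches per image
patch_vector = patch_size * patch_size * channels  # 768 raw values per patch

# Linear projection of each flattened patch into the embedding space
projection_params = patch_vector * embed_dim + embed_dim

print(num_patches)        # 196
print(patch_vector)       # 768
print(projection_params)  # 295296
```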
ResNet50V2:
- Convergence: Best model at epoch 12
- Training Time: ~3.2 hours (15 epochs)
- Validation Accuracy: 95.32% (peak)
- Minimal Overfitting: Train-val gap < 0.5%

Vision Transformer:
- Convergence: Best model at epoch 10
- Training Time: ~9 hours (13 epochs)
- Validation Accuracy: 91.41% (peak)
- Slight Overfitting: Train-val gap ~1.5%
| Metric | ResNet50V2 | ViT | Winner |
|---|---|---|---|
| Accuracy | 95.43% | 91.14% | ResNet 🏆 |
| Precision | 95.34% | 92.04% | ResNet 🏆 |
| Recall | 95.53% | 90.07% | ResNet 🏆 |
| F1-Score | 95.43% | 91.04% | ResNet 🏆 |
| AUC-ROC | 99.03% | 97.20% | ResNet 🏆 |
| Specificity | 95.33% | 92.21% | ResNet 🏆 |
| Training Speed | 12 min/epoch | 43 min/epoch | ResNet 🏆 |
| Parameters | 24.1M (2.3% trainable) | 28.9M (100% trainable) | ResNet 🏆 |
- Transfer Learning Dominates: ResNet50V2 outperforms custom ViT by 4.3% accuracy
- Efficiency Matters: ResNet trains 3Γ faster with fewer parameters
- Balanced Performance: Both models show minimal class bias
- Excellent Generalization: High validation → test consistency
- Production Ready: ResNet achieves 95%+ accuracy with fast inference
- Real Images: 100k from COCO 2017 (natural photographs)
- Fake Images: 100k from DiffusionDB (AI-generated)
- Balanced Split: 70k train / 15k val / 15k test per class
- Resize: All images → 224×224 pixels
- Normalization: Pixel values [0, 255] → [0, 1]
- Batching: Groups of 32 images
- Prefetching: Overlap data loading with training
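Framework aside, the batching arithmetic above is simple; a plain-Python sketch of how the 140,000 training images group into batches of 32 (the actual pipeline does this with tf.data and prefetching):

```python
import math

def batch_indices(n_items, batch_size):
    """Yield (start, end) index pairs covering n_items in order."""
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)

n_train, batch_size = 140_000, 32
batches = list(batch_indices(n_train, batch_size))

print(len(batches))                     # 4375 batches per epoch
print(math.ceil(n_train / batch_size))  # same count, closed form
print(batches[-1])                      # (139968, 140000): the last batch is full
```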
- Transfer Learning (ResNet):
  - Freeze pre-trained ImageNet weights
  - Train only the custom classification head
  - Learning rate: 1e-4
- From Scratch (ViT):
  - Random weight initialization
  - Full model training
  - Learning rate: 1e-4
- Optimizer: Adam with default parameters
- Loss Function: Binary cross-entropy
- Callbacks:
- Early stopping (patience=3)
- Model checkpointing (save best)
- Learning rate reduction (factor=0.5, patience=2)
- Metrics: Accuracy, Precision, Recall, F1, AUC
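The learning-rate reduction callback behaves roughly like this (a simplified re-implementation for illustration only; the project uses the built-in Keras ReduceLROnPlateau):

```python
def schedule_lr(val_losses, lr=1e-4, factor=0.5, patience=2):
    """Halve the learning rate whenever val loss fails to improve for `patience` epochs."""
    best = float("inf")
    wait = 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                lr *= factor      # plateau reached: reduce LR
                wait = 0
    return lr

# Val loss stalls for two epochs -> one reduction: 1e-4 becomes 5e-5
print(schedule_lr([0.40, 0.31, 0.30, 0.305, 0.31]))  # 5e-05
```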
- Test Set: 30,000 unseen images
- Confusion Matrix: Detailed error analysis
- Consistent Splits: Fixed, seeded train/val/test partitions shared by both models
File: notebooks/vision_transformer.ipynb
Contents:
- Custom ViT architecture from scratch
- Patch embedding and positional encoding
- Multi-head self-attention mechanisms
- Training on 200k images
- Comprehensive evaluation
Key Results: 91.14% test accuracy, 97.20% AUC
File: notebooks/resnet50v2_transfer.ipynb
Contents:
- Transfer learning from ImageNet
- Custom classification head design
- Efficient training (3× faster than ViT)
- Superior performance metrics
Key Results: 95.43% test accuracy, 99.03% AUC
Contributions are welcome! Please follow these guidelines:
- Fork the repository and clone your fork:

```shell
git clone https://github.com/Abdelhady-22/AI-Generated-vs-Real-Image-Detection.git
cd AI-Generated-vs-Real-Image-Detection
```

- Create a feature branch:

```shell
git checkout -b feature/amazing-feature
```

- Make your changes
  - Add new models or improve existing ones
  - Enhance documentation
  - Fix bugs or optimize code
- Run tests:

```shell
python -m pytest tests/
```

- Commit your changes:

```shell
git commit -m "Add amazing feature"
```

- Push to your fork:

```shell
git push origin feature/amazing-feature
```

- Open a Pull Request
- Implement additional architectures (EfficientNet, ConvNeXt, Swin Transformer)
- Add ensemble methods for improved accuracy
- Implement Grad-CAM for model interpretability
- Create a web interface for easy inference
- Add support for video deepfake detection
- Extend to multi-class fake image detection
- Add unit tests and integration tests
- Improve documentation and tutorials
If you use this project in your research or work, please cite:
```bibtex
@software{ai_generated_image_detection_2025,
  author    = {Abdelhady Ali Mohamed},
  title     = {AI-Generated Image Detection using Deep Learning},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/Abdelhady-22/AI-Generated-vs-Real-Image-Detection},
  note      = {ResNet50V2 Transfer Learning achieving 95.43\% accuracy}
}
```

- Dosovitskiy et al. (2020). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" - Vision Transformer foundation
- He et al. (2016). "Deep Residual Learning for Image Recognition" - ResNet architecture
- Rombach et al. (2022). "High-Resolution Image Synthesis with Latent Diffusion Models" - Stable Diffusion background
This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2025 [Abdelhady Ali]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

[Full MIT License text...]
- COCO 2017: Microsoft COCO Team for high-quality real image dataset
- DiffusionDB: Researchers at Georgia Tech for the AI-generated image dataset
- TensorFlow/Keras: Deep learning framework
- Kaggle: Computational resources and platform
- scikit-learn: Machine learning utilities
- OpenCV: Image processing library
- Vision Transformer paper by Google Research
- ResNet architecture by Microsoft Research
- AI safety research community
- Stack Overflow and GitHub communities
- Kaggle discussion forums
- TensorFlow documentation contributors
Author: [Abdelhady Ali]
- Email: abdulhadi2322005@gmail.com
- LinkedIn: My LinkedIn
- GitHub: Abdelhady-22
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- General Inquiries: abdulhadi2322005@gmail.com
This project is intended for educational and research purposes only. The models are designed to detect AI-generated images but should not be used as the sole basis for:
- Legal proceedings or evidence authentication
- Journalistic verification without additional fact-checking
- Medical or scientific image validation
- Any decision with significant consequences
Important Notes:
- Model performance may vary on images from newer generative models
- Adversarial attacks can fool detection systems
- Always combine automated detection with human expertise
- Regular model updates are needed as generative AI evolves
- Add EfficientNetV2 and ConvNeXt models
- Implement Grad-CAM visualization
- Create Flask/FastAPI web interface
- Add Docker containerization
- Improve documentation with video tutorials
- Ensemble multiple models for 96%+ accuracy
- Support for video deepfake detection
- Multi-class detection (identify generation method)
- Mobile deployment (TensorFlow Lite)
- Continuous learning pipeline
- Real-time browser extension
- Integration with social media platforms
- Advanced adversarial robustness
- Multi-modal detection (image + metadata)
- Research paper publication
- Total Images Processed: 200,000
- Training Examples: 140,000
- Test Examples: 30,000
- Model Parameters: 24-29 million
- Training Time: 3-9 hours (GPU)
- Inference Speed: ~50 images/second (GPU)
- Project Stars: ⭐ (Star this repo!)

⭐ Star this repository if you find it helpful! ⭐

Made with ❤️ for AI safety and transparency