AI-generated skin condition images for training computer vision models, with comprehensive metadata for machine learning applications.
This project generates diverse synthetic skin condition images using Google's Gemini 2.0 Flash image generation model. Each image is paired with detailed JSON annotations containing training-ready metadata for ML model development.
- 10 Images Per Condition/Severity: Generates 10 diverse faces for each of the 21 condition/severity combinations
- Total Dataset: 210 images (7 conditions × 3 severities × 10 images)
- Demographic Diversity: Age groups, skin tones, and gender variations
- Training-Ready Metadata: Comprehensive annotations for ML model training
- Acne (mild/moderate/severe)
- Fine Lines & Wrinkles (mild/moderate/severe)
- Aging (early_signs/moderate/advanced)
- Hyperpigmentation (mild/moderate/severe)
- Textured Skin (slight/moderate/severe)
- Redness (mild/moderate/severe)
- Pore Size (mild/moderate/severe)
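The condition and severity lists above can be enumerated programmatically. The following is a minimal sketch, not the project's actual code: the `CONDITIONS` mapping and the `condition_severity_pairs` helper are hypothetical names, and the snake_case keys are taken from the folder and JSON key names used elsewhere in this README.

```python
# Severity labels per condition, copied from the list above.
# CONDITIONS and condition_severity_pairs are illustrative names only.
CONDITIONS = {
    "acne": ["mild", "moderate", "severe"],
    "fine_lines_wrinkles": ["mild", "moderate", "severe"],
    "aging": ["early_signs", "moderate", "advanced"],
    "hyperpigmentation": ["mild", "moderate", "severe"],
    "textured_skin": ["slight", "moderate", "severe"],
    "redness": ["mild", "moderate", "severe"],
    "pore_size": ["mild", "moderate", "severe"],
}


def condition_severity_pairs(conditions=CONDITIONS):
    """Return every (condition, severity) combination in a stable order."""
    return [(name, level) for name, levels in conditions.items() for level in levels]


pairs = condition_severity_pairs()
print(len(pairs))       # 21 combinations
print(len(pairs) * 10)  # 210 images at 10 images per combination
```

Note that Aging and Textured Skin use their own severity vocabularies, so downstream code should not assume every condition shares the mild/moderate/severe scale.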
Each image includes comprehensive metadata for ML training:
- Multi-label Classification Targets: Boolean flags for all 7 conditions
- Severity Targets: Specific severity level for each condition
- Demographics: Age group, skin tone, gender information
- Quality Metrics: Image resolution, lighting, pose quality
- Training Annotations: Confidence scores, affected percentages, feature counts
output_images/
├── dataset_summary_TIMESTAMP.json
├── acne/
│   ├── mild/
│   │   ├── acne_mild_TIMESTAMP_0000.png
│   │   ├── acne_mild_TIMESTAMP_0000.json
│   │   └── ... (10 images total)
│   ├── moderate/
│   └── severe/
├── fine_lines_wrinkles/
├── aging/
├── hyperpigmentation/
├── textured_skin/
├── redness/
└── pore_size/
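The layout above implies a simple naming rule for each image and its annotation file. Here is a rough sketch of that rule, assuming the `TIMESTAMP` placeholder follows the `YYYYMMDD_HHMMSS` pattern seen in the example filename `acne_mild_20250910_162703_0000.png`. The `sample_paths` helper is hypothetical and may not match how the generation script builds paths.

```python
from datetime import datetime
from pathlib import Path


def sample_paths(output_dir, condition, severity, timestamp, index):
    """Build the image and annotation paths for one generated sample.

    Hypothetical helper: the naming pattern is inferred from the tree above.
    """
    stem = f"{condition}_{severity}_{timestamp}_{index:04d}"
    folder = Path(output_dir) / condition / severity
    return folder / f"{stem}.png", folder / f"{stem}.json"


# Assumed timestamp format, matching the example filename in this README.
timestamp = datetime(2025, 9, 10, 16, 27, 3).strftime("%Y%m%d_%H%M%S")
image_path, annotation_path = sample_paths("output_images", "acne", "mild", timestamp, 0)

print(image_path.as_posix())
# output_images/acne/mild/acne_mild_20250910_162703_0000.png
print(annotation_path.name)
# acne_mild_20250910_162703_0000.json
```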
- Python 3.8+
- Google API key with Gemini access
- Clone the repository:
git clone https://github.com/sentilabs01/shine_training_data.git
cd shine_training_data
- Create a virtual environment:
python -m venv .venv
.venv\Scripts\activate # Windows
# or
source .venv/bin/activate # Linux/Mac
- Install dependencies:
pip install -U google-generativeai pillow requests
- Set environment variables:
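# Required (Windows cmd syntax shown; on Linux/Mac use export instead of set)
set GOOGLE_API_KEY=your_api_key_here
# Optional
set IMAGES_PER_CONDITION=10
set OUTPUT_DIR=./output_images
set GENERATION_DELAY_SECONDS=30
- Run the full generation:
python generate_skin_images_enhanced.py
- Run a smaller test generation:
python test_generation.py
- Customize a run with environment variables (Linux/Mac syntax shown):
# Generate 5 images per condition/severity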
export IMAGES_PER_CONDITION=5
# Use custom output directory
export OUTPUT_DIR="./my_dataset"
# Reduce delay between API calls
export GENERATION_DELAY_SECONDS=15
python generate_skin_images_enhanced.py
Each image is paired with a comprehensive JSON annotation file:
{
  "image_filename": "acne_mild_20250910_162703_0000.png",
  "skin_condition": "acne",
  "severity": "mild",
  "classification_targets": {
    "acne": true,
    "fine_lines_wrinkles": false,
    "aging": false,
    "hyperpigmentation": false,
    "textured_skin": false,
    "redness": false,
    "pore_size": false
  },
  "severity_targets": {
    "acne": "mild",
    "fine_lines_wrinkles": "none",
    "aging": "none",
    "hyperpigmentation": "none",
    "textured_skin": "none",
    "redness": "none",
    "pore_size": "none"
  },
  "demographics": {
    "age_group": "young_adult",
    "age_range": "18-25",
    "skin_tone": "very fair skin tone",
    "gender": "male"
  },
  "training_annotations": {
    "condition_detected": true,
    "severity_level": "mild",
    "confidence_score": 0.8,
    "affected_percentage": 0.05,
    "feature_count": 5
  }
}
| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Required | Google API key for Gemini |
| `IMAGES_PER_CONDITION` | 10 | Number of images per condition/severity |
| `OUTPUT_DIR` | `./output_images` | Output directory |
| `GENERATION_DELAY_SECONDS` | 30 | Delay between API calls, in seconds |
| `GEMINI_IMAGE_MODEL` | `models/gemini-2.0-flash-preview-image-generation` | Model to use |
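
As an illustration of how these variables and defaults might be consumed, the sketch below reads them into a small config object. `GenerationConfig` and `load_config` are hypothetical names and are not taken from the generation script. Only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass

DEFAULT_MODEL = "models/gemini-2.0-flash-preview-image-generation"


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the settings listed in the table."""
    api_key: str
    images_per_condition: int = 10
    output_dir: str = "./output_images"
    delay_seconds: float = 30.0
    model: str = DEFAULT_MODEL


def load_config(env=None):
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        # The only variable without a default; fail early with a clear message.
        raise RuntimeError("GOOGLE_API_KEY is required")
    return GenerationConfig(
        api_key=api_key,
        images_per_condition=int(env.get("IMAGES_PER_CONDITION", "10")),
        output_dir=env.get("OUTPUT_DIR", "./output_images"),
        delay_seconds=float(env.get("GENERATION_DELAY_SECONDS", "30")),
        model=env.get("GEMINI_IMAGE_MODEL", DEFAULT_MODEL),
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config({"GOOGLE_API_KEY": "example-key", "IMAGES_PER_CONDITION": "5"})
print(config.images_per_condition, config.delay_seconds)  # 5 30.0
```

Accepting a mapping argument is a design choice for this sketch: it makes the defaults easy to check without touching real environment variables.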
- Images per full run: 210 (7 conditions × 3 severities × 10 images)
- Cost per image: ~$0.03 (Google Gemini pricing)
- Total cost per run: ~$6.30
- Test run cost: ~$1.26 (42 images)
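The figures above follow from simple multiplication, which the sketch below reproduces. The `estimate_cost` function is a hypothetical helper, the per-image price is the approximate value quoted above, and the assumption that a test run produces 2 images per combination is inferred from 42 images across 21 combinations.

```python
COST_PER_IMAGE_USD = 0.03  # approximate per-image figure quoted above


def estimate_cost(combinations, images_per_combination, cost_per_image=COST_PER_IMAGE_USD):
    """Return (total_images, estimated_cost_usd) for one run."""
    total_images = combinations * images_per_combination
    return total_images, round(total_images * cost_per_image, 2)


print(estimate_cost(21, 10))  # (210, 6.3)  full run
print(estimate_cost(21, 2))   # (42, 1.26)  test run, assuming 2 images per combination
```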
The generated metadata is structured for easy integration with ML training pipelines:
- Multi-label Classification: Use `classification_targets` for training
- Severity Prediction: Use `severity_targets` for training
- Demographic Analysis: Use `demographics` for bias analysis
- Quality Control: Use `quality_metrics` for filtering
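
One way to turn an annotation into training labels is sketched below, using only the standard library. `CONDITION_ORDER`, `multi_hot_labels`, and `severity_labels` are hypothetical names, and the inline annotation is a trimmed copy of the example JSON shown earlier in this README. A real pipeline would load each `.json` file from disk instead.

```python
import json

# Fixed label order so every sample maps to the same vector positions.
CONDITION_ORDER = [
    "acne", "fine_lines_wrinkles", "aging", "hyperpigmentation",
    "textured_skin", "redness", "pore_size",
]


def multi_hot_labels(annotation, order=CONDITION_ORDER):
    """Convert the boolean classification_targets into a 0/1 vector."""
    targets = annotation["classification_targets"]
    return [1 if targets[name] else 0 for name in order]


def severity_labels(annotation, order=CONDITION_ORDER):
    """Return the severity string for each condition, in the same order."""
    return [annotation["severity_targets"][name] for name in order]


# Trimmed copy of the example annotation from this README.
annotation = json.loads("""
{
  "classification_targets": {
    "acne": true, "fine_lines_wrinkles": false, "aging": false,
    "hyperpigmentation": false, "textured_skin": false,
    "redness": false, "pore_size": false
  },
  "severity_targets": {
    "acne": "mild", "fine_lines_wrinkles": "none", "aging": "none",
    "hyperpigmentation": "none", "textured_skin": "none",
    "redness": "none", "pore_size": "none"
  }
}
""")

print(multi_hot_labels(annotation))  # [1, 0, 0, 0, 0, 0, 0]
print(severity_labels(annotation)[0])  # mild
```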
- Enhanced Generation Guide: Detailed documentation for the enhanced generation script
- Image Generation Strategy: Strategic approach to synthetic data generation
- ML Architecture Analysis: Comprehensive analysis and recommendations
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is open source. Please check the license file for details.
For issues and questions, please open an issue on GitHub or contact the maintainers.
Developed by: @sentilabs01
Website: shineskincollective.com
This synthetic data generation tool is part of the Shine Skincare App project, designed to create diverse, training-ready datasets for AI-powered skin condition analysis.