Shine Synthetic Face Dataset

AI-generated skin condition images for training computer vision models, with comprehensive metadata for machine learning applications.

Overview

This project generates diverse synthetic skin condition images using Google's Gemini 2.0 Flash image generation model. Each image is paired with detailed JSON annotations containing training-ready metadata for ML model development.

🎯 Key Features

Enhanced Data Generation

  • 10 Images Per Condition/Severity: Generates 10 diverse faces for each of the 21 condition/severity combinations
  • Total Dataset: 210 images (7 conditions × 3 severities × 10 images)
  • Demographic Diversity: Age groups, skin tones, and gender variations
  • Training-Ready Metadata: Comprehensive annotations for ML model training

7 Skin Condition Categories

  • Acne (mild/moderate/severe)
  • Fine Lines & Wrinkles (mild/moderate/severe)
  • Aging (early_signs/moderate/advanced)
  • Hyperpigmentation (mild/moderate/severe)
  • Textured Skin (slight/moderate/severe)
  • Redness (mild/moderate/severe)
  • Pore Size (mild/moderate/severe)
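
The 21 condition/severity combinations follow directly from the list above. A minimal sketch, assuming a hypothetical `CONDITIONS` mapping whose keys mirror the output folder names (how the generator script actually stores this is not shown in this README):

```python
# Hypothetical mapping of condition -> severity levels, transcribed from the
# category list above. The real script may store this differently.
CONDITIONS = {
    "acne": ["mild", "moderate", "severe"],
    "fine_lines_wrinkles": ["mild", "moderate", "severe"],
    "aging": ["early_signs", "moderate", "advanced"],
    "hyperpigmentation": ["mild", "moderate", "severe"],
    "textured_skin": ["slight", "moderate", "severe"],
    "redness": ["mild", "moderate", "severe"],
    "pore_size": ["mild", "moderate", "severe"],
}


def enumerate_combinations(conditions=CONDITIONS):
    """Return every (condition, severity) pair in a stable order."""
    return [(c, s) for c, levels in conditions.items() for s in levels]


def total_images(images_per_combination=10, conditions=CONDITIONS):
    """Total images for a run: combinations x images per combination."""
    return len(enumerate_combinations(conditions)) * images_per_combination
```

With the default of 10 images per combination this gives the 210-image total quoted above.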

Advanced Metadata Structure

Each image includes comprehensive metadata for ML training:

  • Multi-label Classification Targets: Boolean flags for all 7 conditions
  • Severity Targets: Specific severity level for each condition
  • Demographics: Age group, skin tone, gender information
  • Quality Metrics: Image resolution, lighting, pose quality
  • Training Annotations: Confidence scores, affected percentages, feature counts

📁 Generated Dataset Structure

output_images/
├── dataset_summary_TIMESTAMP.json
├── acne/
│   ├── mild/
│   │   ├── acne_mild_TIMESTAMP_0000.png
│   │   ├── acne_mild_TIMESTAMP_0000.json
│   │   └── ... (10 images total)
│   ├── moderate/
│   └── severe/
├── fine_lines_wrinkles/
├── aging/
├── hyperpigmentation/
├── textured_skin/
├── redness/
└── pore_size/
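
Because each PNG sits next to a JSON file with the same stem, the dataset can be walked as image/annotation pairs. A minimal sketch, assuming the `<condition>/<severity>/<name>.png` layout shown above; `annotation_path` and `iter_pairs` are hypothetical helper names, not part of the repository:

```python
from pathlib import Path


def annotation_path(image_path):
    """Return the JSON annotation path that sits next to an image."""
    return Path(image_path).with_suffix(".json")


def iter_pairs(root):
    """Yield (image, annotation) paths for every PNG that has a JSON sibling.

    Assumes the layout root/<condition>/<severity>/<name>.png shown above.
    Images without a matching annotation file are skipped.
    """
    for image in sorted(Path(root).glob("*/*/*.png")):
        annotation = annotation_path(image)
        if annotation.exists():
            yield image, annotation
```

Skipping unpaired images is one possible policy; a stricter loader could raise an error instead.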

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Google API key with Gemini access

Installation

  1. Clone the repository:
git clone https://github.com/sentilabs01/shine_training_data.git
cd shine_training_data
  2. Create a virtual environment:
python -m venv .venv
.venv\Scripts\activate  # Windows
# or
source .venv/bin/activate  # Linux/Mac
  3. Install dependencies:
pip install -U google-generativeai pillow requests
  4. Set environment variables (shown for Windows cmd; on Linux/Mac use export instead of set):
# Required
set GOOGLE_API_KEY=your_api_key_here

# Optional
set IMAGES_PER_CONDITION=10
set OUTPUT_DIR=./output_images
set GENERATION_DELAY_SECONDS=30

Usage

Generate Full Dataset (210 images)

python generate_skin_images_enhanced.py

Test Generation (42 images)

python test_generation.py

Custom Configuration

# Generate 5 images per condition/severity
export IMAGES_PER_CONDITION=5

# Use custom output directory
export OUTPUT_DIR="./my_dataset"

# Reduce delay between API calls
export GENERATION_DELAY_SECONDS=15

python generate_skin_images_enhanced.py

📊 Metadata Format

Each image is paired with a comprehensive JSON annotation file:

{
    "image_filename": "acne_mild_20250910_162703_0000.png",
    "skin_condition": "acne",
    "severity": "mild",
    "classification_targets": {
        "acne": true,
        "fine_lines_wrinkles": false,
        "aging": false,
        "hyperpigmentation": false,
        "textured_skin": false,
        "redness": false,
        "pore_size": false
    },
    "severity_targets": {
        "acne": "mild",
        "fine_lines_wrinkles": "none",
        "aging": "none",
        "hyperpigmentation": "none",
        "textured_skin": "none",
        "redness": "none",
        "pore_size": "none"
    },
    "demographics": {
        "age_group": "young_adult",
        "age_range": "18-25",
        "skin_tone": "very fair skin tone",
        "gender": "male"
    },
    "training_annotations": {
        "condition_detected": true,
        "severity_level": "mild",
        "confidence_score": 0.8,
        "affected_percentage": 0.05,
        "feature_count": 5
    }
}
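
In the example above, exactly one condition is flagged and every other condition has severity "none". A minimal sketch of how such target blocks could be derived from a single condition/severity label; `build_targets` and `CONDITION_KEYS` are hypothetical names, and the one-condition-per-image rule is inferred from the example rather than confirmed by the generator script:

```python
# Condition keys as they appear in the example annotation above.
CONDITION_KEYS = [
    "acne", "fine_lines_wrinkles", "aging", "hyperpigmentation",
    "textured_skin", "redness", "pore_size",
]


def build_targets(condition, severity):
    """Build the two target dicts for an image showing one condition."""
    if condition not in CONDITION_KEYS:
        raise ValueError(f"unknown condition: {condition!r}")
    # True only for the generated condition, False for all others.
    classification = {key: key == condition for key in CONDITION_KEYS}
    # The generated condition keeps its severity; the rest are "none".
    severities = {
        key: severity if key == condition else "none"
        for key in CONDITION_KEYS
    }
    return {
        "classification_targets": classification,
        "severity_targets": severities,
    }
```

Calling `build_targets("acne", "mild")` reproduces the `classification_targets` and `severity_targets` blocks of the example annotation.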

🔧 Environment Variables

| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | Required | Google API key for Gemini |
| IMAGES_PER_CONDITION | 10 | Number of images per condition/severity |
| OUTPUT_DIR | ./output_images | Output directory |
| GENERATION_DELAY_SECONDS | 30 | Delay between API calls |
| GEMINI_IMAGE_MODEL | models/gemini-2.0-flash-preview-image-generation | Model to use |
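
The variables and defaults listed above could be read as follows. This is a sketch only: `load_config` and the returned key names are hypothetical, and the actual parsing in `generate_skin_images_enhanced.py` may differ.

```python
import os

# Defaults copied from the environment variable list above.
DEFAULTS = {
    "IMAGES_PER_CONDITION": "10",
    "OUTPUT_DIR": "./output_images",
    "GENERATION_DELAY_SECONDS": "30",
    "GEMINI_IMAGE_MODEL": "models/gemini-2.0-flash-preview-image-generation",
}


def load_config(env=None):
    """Read settings from an environment mapping, applying the defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        # The API key has no default, so fail early with a clear message.
        raise RuntimeError("GOOGLE_API_KEY is required")
    merged = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "api_key": api_key,
        "images_per_condition": int(merged["IMAGES_PER_CONDITION"]),
        "output_dir": merged["OUTPUT_DIR"],
        "delay_seconds": float(merged["GENERATION_DELAY_SECONDS"]),
        "model": merged["GEMINI_IMAGE_MODEL"],
    }
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.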

💰 Cost Estimation

  • Images per full run: 210 (7 conditions × 3 severities × 10 images)
  • Cost per image: ~$0.03 (Google Gemini pricing)
  • Total cost per run: ~$6.30
  • Test run cost: ~$1.26 (42 images)
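
The figures above are simple multiplication. A short sketch that reproduces them, assuming the approximate $0.03 per-image price quoted above still holds (actual pricing may change); `estimate_cost` is a hypothetical helper:

```python
from decimal import Decimal


def estimate_cost(num_images, cost_per_image="0.03"):
    """Estimated cost in USD for a run, using exact decimal arithmetic."""
    return Decimal(num_images) * Decimal(cost_per_image)


full_run = estimate_cost(210)  # 7 conditions x 3 severities x 10 images
test_run = estimate_cost(42)   # the 42-image test run
```

`Decimal` is used instead of `float` so the totals come out as exact cent amounts.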

📈 Training Integration

The generated metadata is structured for easy integration with ML training pipelines:

  1. Multi-label Classification: Use classification_targets as the per-condition boolean labels
  2. Severity Prediction: Use severity_targets as the per-condition severity labels
  3. Demographic Analysis: Use demographics for bias analysis
  4. Quality Control: Use quality_metrics for filtering
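
The first three uses above can be sketched with plain Python. The ordinal `SEVERITY_RANK` scale is an assumption made for illustration (the README does not define how "slight", "early_signs", or "advanced" map to numbers), and `encode_annotation` and `demographic_counts` are hypothetical helpers:

```python
from collections import Counter

CONDITION_KEYS = [
    "acne", "fine_lines_wrinkles", "aging", "hyperpigmentation",
    "textured_skin", "redness", "pore_size",
]

# Assumed ordinal scale: the three levels of each condition map to 1..3.
SEVERITY_RANK = {
    "none": 0,
    "mild": 1, "slight": 1, "early_signs": 1,
    "moderate": 2,
    "severe": 3, "advanced": 3,
}


def encode_annotation(annotation):
    """Turn one annotation dict into (multi-hot labels, ordinal severities)."""
    labels = [int(annotation["classification_targets"][k]) for k in CONDITION_KEYS]
    severities = [
        SEVERITY_RANK[annotation["severity_targets"][k]] for k in CONDITION_KEYS
    ]
    return labels, severities


def demographic_counts(annotations, field="skin_tone"):
    """Count how often each value of a demographics field occurs."""
    return Counter(a["demographics"][field] for a in annotations)
```

The label lists can be converted to tensors by whichever framework the training pipeline uses, and the counts give a quick check for demographic imbalance before training.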

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

This project is open source. Please check the license file for details.

🆘 Support

For issues and questions, please open an issue on GitHub or contact the maintainers.


Developer Information

Developed by: @sentilabs01
Website: shineskincollective.com

This synthetic data generation tool is part of the Shine Skincare App project, designed to create diverse, training-ready datasets for AI-powered skin condition analysis.
