PyTorch Training Framework

A comprehensive, flexible PyTorch training framework that supports multiple neural network architectures, optimizers, loss functions, and data loading methods. Configure everything through JSON files without modifying code, or use the genetic algorithm extension to evolve neural network populations.

Overview

This framework provides a complete solution for training PyTorch neural networks with:

  • Multiple Architectures: SimpleNet/MLP, CNN, RNN, LSTM, GRU, Transformer
  • Comprehensive Optimizers: Adam, SGD, AdamW, RMSprop, and many more
  • Flexible Data Loading: NumPy files, CSV files, custom functions, or dummy data
  • JSON Configuration: Configure everything via JSON files - no code changes needed
  • Genetic Algorithm Extension: Evolve populations of neural networks
  • Built-in Logging & Metrics: Track training progress with comprehensive logging
  • Model Management: Save/load models with flexible checkpointing

Features

Core Training Framework

  • Support for all major PyTorch neural network architectures
  • All PyTorch optimizers and loss functions available
  • Comprehensive metrics tracking (MAE, R², RMSE, etc.)
  • Early stopping support
  • Automatic train/test splitting
  • Data normalization options
  • GPU/CPU support with automatic device detection

JSON Configuration System

  • Configure all training parameters via JSON files
  • No code modifications required
  • Run multiple configurations in batch
  • Custom model naming for organized outputs

Genetic Algorithm Extension

  • Evolve populations of neural networks
  • Multiple selection methods (tournament, roulette, rank-based)
  • Mutation and crossover operations
  • Fitness-based evolution with configurable metrics
  • Generation tracking and statistics

Data Loading

  • Dummy Data: Quick testing with generated data
  • NumPy Files: Load from .npy files
  • CSV Files: Load from CSV with column selection
  • Custom Functions: Use your own data loading functions

Quick Start

Installation

Option 1: Automated Setup (Recommended)

For Mac/Linux:

cd "PyTorch Training Framework"
source setup/setup.sh

For Windows:

cd "PyTorch Training Framework"
setup\setup.bat

The setup script will:

  • Create a virtual environment (venv)
  • Activate the virtual environment (for the current session)
  • Install all dependencies from requirements.txt
  • Create necessary directories (configs, generations, logs, models)

Note: After running the setup script, the virtual environment will be activated in that terminal session. For future sessions, you'll need to activate it manually:

source venv/bin/activate  # On Windows: venv\Scripts\activate

Option 2: Manual Setup

  1. Clone or download this repository

  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create necessary directories:

    mkdir -p configs generations logs models

    On Windows:

    mkdir configs generations logs models

Run Your First Training

The fastest way to get started is with an example configuration:

python run_train.py example_configs/config_example.json

This will:

  • Load a simple neural network configuration
  • Generate dummy data for testing
  • Train the model
  • Save checkpoints to models/
  • Create logs in logs/

Getting Started with Examples

The example_configs/ directory contains several example configurations to help you understand the framework:

1. Basic Example

python run_train.py example_configs/config_example.json
  • Complete configuration with all options
  • Uses dummy data for quick testing
  • Simple feedforward network (SimpleNet)

2. NumPy Data Example

python run_train.py example_configs/config_numpy_data.json
  • Loads data from NumPy files
  • Shows how to configure data paths

3. CSV Data Example

python run_train.py example_configs/config_csv_data.json
  • Loads data from CSV files
  • Demonstrates column selection

4. LSTM Example

python run_train.py example_configs/config_lstm.json
  • Configures an LSTM network
  • Shows RNN-specific settings

5. Genetic Algorithm Example

python genetic_algorithm.py example_configs/config_genetic.json
  • Evolves a population of neural networks
  • Demonstrates genetic algorithm features

Understanding the Examples

Each example configuration file demonstrates different aspects:

  • Model Configuration: Architecture, layer sizes, activations
  • Data Loading: Different data sources and formats
  • Training Parameters: Batch size, epochs, learning rate, optimizer
  • Advanced Features: Early stopping, logging, metrics

Tip: Open the example JSON files in a text editor to see the full set of configuration options used in context.

Creating Your Own Projects

Step 1: Create Your Configuration File

Start by copying an example configuration:

cp example_configs/config_example.json configs/my_project.json

Step 2: Configure Your Model

Edit configs/my_project.json and modify the model section:

{
  "model": {
    "name": "my_custom_model",
    "algorithm": "SimpleNet",
    "input_size": 10,
    "hidden_sizes": [64, 32, 16],
    "output_size": 1,
    "activation": "ReLU"
  }
}

Available Algorithms:

  • SimpleNet - Simple feedforward neural network (same as MLP)
  • MLP - Multi-Layer Perceptron (same as SimpleNet)
  • Linear - Simple linear model (single layer)
  • CNN or ConvNet - Convolutional Neural Network
  • RNN - Recurrent Neural Network
  • LSTM - Long Short-Term Memory network
  • GRU - Gated Recurrent Unit network
  • Transformer - Transformer model with encoder layers
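
For intuition, the SimpleNet configuration shown above corresponds roughly to the following PyTorch module. This is a sketch of the implied layer stack, not the framework's exact implementation:

import torch.nn as nn

# Rough equivalent of the config above:
# input_size=10, hidden_sizes=[64, 32, 16], output_size=1, activation="ReLU"
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)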

Step 3: Configure Your Data

Choose your data loading method:

Option A: NumPy Files

{
  "data_loading": {
    "type": "numpy",
    "X_path": "data/X_train.npy",
    "y_path": "data/y_train.npy"
  }
}
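
If you do not already have .npy files, here is a minimal sketch for producing some; the paths match the config above, and the data shapes are purely illustrative:

import os
import numpy as np

os.makedirs("data", exist_ok=True)
X = np.random.rand(1000, 10).astype(np.float32)  # (n_samples, n_features)
y = np.random.rand(1000).astype(np.float32)      # (n_samples,)
np.save("data/X_train.npy", X)
np.save("data/y_train.npy", y)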

Option B: CSV File

{
  "data_loading": {
    "type": "csv",
    "csv_path": "data/dataset.csv",
    "X_columns": ["feature1", "feature2", "feature3"],
    "y_column": "target"
  }
}
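
Likewise, a minimal sketch for producing a matching CSV file with pandas; the column names match the config above, and the values are illustrative:

import os
import pandas as pd

os.makedirs("data", exist_ok=True)
df = pd.DataFrame({
    "feature1": [0.1, 0.2, 0.3],
    "feature2": [1.0, 2.0, 3.0],
    "feature3": [10.0, 20.0, 30.0],
    "target":   [0.5, 1.5, 2.5],
})
df.to_csv("data/dataset.csv", index=False)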

Option C: Custom Function

{
  "data_loading": {
    "type": "custom",
    "custom_function": "my_module.load_my_data"
  }
}

Your custom function should return (X, y) where:

  • X is a numpy array or torch tensor of shape (n_samples, n_features)
  • y is a numpy array or torch tensor of shape (n_samples,) or (n_samples, n_outputs)
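
A minimal sketch of such a function; the random data is illustrative, and only the (X, y) return contract comes from the framework:

# my_module.py (the module referenced by "custom_function" above)
import numpy as np

def load_my_data():
    # Replace with your real data source; shapes follow the contract above
    X = np.random.rand(500, 10).astype(np.float32)  # (n_samples, n_features)
    y = np.random.rand(500).astype(np.float32)      # (n_samples,)
    return X, y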

Step 4: Configure Training Parameters

Modify the training section:

{
  "training": {
    "batch_size": 64,
    "epochs": 100,
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "loss_function": "MSELoss"
  }
}
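
These settings map onto standard PyTorch objects, roughly as follows (a sketch; the placeholder model is only for illustration):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model for illustration
optimizer = optim.Adam(model.parameters(), lr=0.001)  # "optimizer" and "learning_rate"
criterion = nn.MSELoss()                              # "loss_function"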

Available Optimizers:

  • Adam - Adaptive Moment Estimation (default)
  • AdamW - Adam with decoupled weight decay
  • SGD - Stochastic Gradient Descent
  • RMSprop - Root Mean Square Propagation
  • Adagrad - Adaptive Gradient Algorithm
  • Adadelta - Adaptive Learning Rate Method
  • Adamax - Adam based on infinity norm
  • ASGD - Averaged Stochastic Gradient Descent
  • LBFGS - Limited-memory BFGS
  • Rprop - Resilient backpropagation
  • RAdam - Rectified Adam
  • NAdam - Nesterov-accelerated Adam
  • SparseAdam - Sparse version of Adam

Available Loss Functions:

Regression Losses:

  • MSELoss - Mean Squared Error (default for regression)
  • L1Loss - Mean Absolute Error
  • SmoothL1Loss - Smooth L1 Loss (Huber Loss variant)
  • HuberLoss - Huber Loss
  • PoissonNLLLoss - Poisson Negative Log Likelihood
  • GaussianNLLLoss - Gaussian Negative Log Likelihood
  • KLDivLoss - Kullback-Leibler Divergence

Classification Losses:

  • CrossEntropyLoss - Cross Entropy Loss (default for classification)
  • BCELoss - Binary Cross Entropy
  • BCEWithLogitsLoss - BCE with logits (more numerically stable)
  • NLLLoss - Negative Log Likelihood
  • MultiLabelMarginLoss - Multi-label margin loss
  • MultiLabelSoftMarginLoss - Multi-label soft margin loss
  • MultiMarginLoss - Multi-class margin loss
  • SoftMarginLoss - Soft margin loss
  • MarginRankingLoss - Margin ranking loss
  • TripletMarginLoss - Triplet margin loss
  • TripletMarginWithDistanceLoss - Triplet margin with custom distance
  • HingeEmbeddingLoss - Hinge embedding loss
  • CTCLoss - Connectionist Temporal Classification loss

Other Losses:

  • CosineEmbeddingLoss - Cosine embedding loss
  • LabelSmoothingCrossEntropy - Cross entropy with label smoothing

Available Activation Functions:

  • ReLU - Rectified Linear Unit (default)
  • ReLU6 - ReLU with max value of 6
  • LeakyReLU - Leaky ReLU (configurable negative slope)
  • PReLU - Parametric ReLU
  • RReLU - Randomized ReLU
  • GELU - Gaussian Error Linear Unit
  • Sigmoid - Sigmoid activation
  • Tanh - Hyperbolic tangent
  • Hardtanh - Hard tanh (configurable min/max)
  • Hardswish - Hard swish activation
  • ELU - Exponential Linear Unit
  • CELU - Continuously Differentiable ELU
  • SELU - Scaled Exponential Linear Unit
  • GLU - Gated Linear Unit
  • SiLU - Sigmoid Linear Unit
  • Mish - Mish activation
  • Softplus - Softplus activation
  • Softshrink - Soft shrinkage
  • Hardshrink - Hard shrinkage
  • Softsign - Soft sign activation
  • Tanhshrink - Tanh shrinkage
  • Hardsigmoid - Hard sigmoid
  • LogSigmoid - Log sigmoid
  • Softmax - Softmax (for multi-class output)
  • LogSoftmax - Log softmax
  • Softmin - Softmin
  • Threshold - Threshold activation (configurable threshold/value)
  • Identity or None - No activation (linear)

Available Metrics:

  • MAE - Mean Absolute Error
  • MSE - Mean Squared Error
  • RMSE - Root Mean Squared Error
  • R2 - R-squared (coefficient of determination)

Step 5: Run Training

python run_train.py configs/my_project.json

Step 6: Monitor Results

  • Logs: Check logs/train_my_custom_model.log for detailed training progress
  • Models: Checkpoints saved in models/ directory
  • Console: Real-time progress updates

Advanced Configuration

Using CNN

{
  "model": {
    "algorithm": "CNN"
  },
  "cnn": {
    "input_channels": 3,
    "output_channels": [32, 64, 128],
    "kernel_sizes": [3, 3, 3],
    "strides": [1, 1, 1],
    "padding": [1, 1, 1],
    "pool_kernel": 2,
    "dropout": 0.2
  }
}
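
For intuition, the first convolutional block implied by this config corresponds roughly to the PyTorch layers below; the exact ordering of pooling and dropout inside the framework may differ:

import torch.nn as nn

# First block of the config above: 3 -> 32 channels, 3x3 kernel,
# stride 1, padding 1, then 2x2 pooling and dropout
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout(0.2),
)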

Using LSTM/RNN/GRU

{
  "model": {
    "algorithm": "LSTM"
  },
  "rnn": {
    "hidden_size": 128,
    "num_layers": 2,
    "bidirectional": true,
    "dropout": 0.2
  }
}
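
This corresponds roughly to the following PyTorch layer; input_size comes from the model section, and batch_first is an assumption about the data layout:

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=128, num_layers=2,
               bidirectional=True, dropout=0.2, batch_first=True)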

Using Transformer

{
  "model": {
    "algorithm": "Transformer"
  },
  "transformer": {
    "d_model": 512,
    "nhead": 8,
    "num_layers": 6,
    "dim_feedforward": 2048,
    "dropout": 0.1
  }
}
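
This corresponds roughly to a standard PyTorch encoder stack (a sketch, not the framework's exact code):

import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                   dim_feedforward=2048, dropout=0.1)
encoder = nn.TransformerEncoder(layer, num_layers=6)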

Early Stopping

{
  "early_stopping": {
    "enabled": true,
    "patience": 10,
    "metric": "loss",
    "min_delta": 0.001
  }
}
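
The semantics of patience and min_delta, sketched in plain Python (illustrative, not the framework's implementation):

def should_stop_at(losses, patience=10, min_delta=0.001):
    """Return the epoch at which early stopping would trigger, or None."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses):
        if best - loss > min_delta:  # improved by at least min_delta
            best = loss
            waited = 0
        else:
            waited += 1
        if waited >= patience:       # patience exhausted
            return epoch
    return None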

Project Structure

PyTorch Training Framework/
├── train.py                 # Core training module with all architectures
├── run_train.py             # JSON configuration runner
├── genetic_algorithm.py     # Genetic algorithm extension
├── requirements.txt         # Python dependencies
├── README.md                # This file (general overview)
├── README_CONFIG.md         # Detailed JSON configuration guide
├── README_GENETIC.md        # Genetic algorithm documentation
├── setup/                   # Setup scripts and requirements
│   ├── setup.sh             # Setup script for Mac/Linux
│   ├── setup.bat            # Setup script for Windows
│   └── requirements.txt     # Dependencies (copy of main requirements.txt)
├── example_configs/         # Example configuration files
│   ├── config_example.json
│   ├── config_numpy_data.json
│   ├── config_csv_data.json
│   ├── config_lstm.json
│   └── config_genetic.json
├── configs/                 # Your custom configuration files (auto-created by setup)
├── models/                  # Saved model checkpoints (auto-created by setup)
├── logs/                    # Training logs (auto-created by setup)
├── generations/             # Genetic algorithm generation data (auto-created by setup)
└── venv/                    # Virtual environment (created by setup script)

Documentation

This project includes detailed documentation in separate README files:

README_CONFIG.md

When to read: You want to understand JSON configuration in detail.

Covers:

  • Complete JSON configuration structure
  • All available configuration options
  • Data loading types (dummy, numpy, csv, custom)
  • Model naming conventions
  • Running multiple configurations
  • Minimal configuration examples

Location: README_CONFIG.md

README_GENETIC.md

When to read: You want to use the genetic algorithm extension.

Covers:

  • Genetic algorithm overview and features
  • Configuration parameters
  • Selection methods (tournament, roulette, rank-based)
  • Mutation and crossover operations
  • Fitness evaluation
  • Tips for tuning genetic algorithm parameters
  • Integration with the main framework

Location: README_GENETIC.md

train.py

When to read: You want to understand the core implementation or modify architectures.

Contains:

  • All model architectures (SimpleNet, CNN, RNN, LSTM, GRU, Transformer)
  • All optimizer implementations
  • All loss function implementations
  • Training loop implementation
  • Model saving/loading functions
  • Metrics calculation

Location: train.py (with comprehensive docstrings)

Complete Configuration Reference

This section lists all available configuration options for JSON configuration files. All options are optional and will use default values if not specified.

Model Configuration (model section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model_version | integer | 1 | Model version number |
| name | string | null | Model name (used in filenames); if null, uses model_v{version} |
| algorithm | string | "SimpleNet" | Model architecture (see Available Algorithms above) |
| input_size | integer | 1 | Input feature size (or input channels for CNN) |
| hidden_sizes | array of integers | [10] | List of hidden layer sizes |
| output_size | integer | 1 | Output size |
| activation | string | "ReLU" | Activation function (see Available Activation Functions above) |

CNN Configuration (cnn section)

Required when algorithm is "CNN" or "ConvNet"

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| input_channels | integer | 1 | Input channels (1 for grayscale, 3 for RGB) |
| output_channels | array of integers | [32, 64] | Output channels for each conv layer |
| kernel_sizes | array of integers | [3, 3] | Kernel sizes for each conv layer |
| strides | array of integers | [1, 1] | Strides for each conv layer |
| padding | array of integers | [1, 1] | Padding for each conv layer |
| pool_kernel | integer | 2 | Pooling kernel size |
| dropout | float | 0.0 | Dropout rate (0.0 to 1.0) |

RNN/LSTM/GRU Configuration (rnn section)

Required when algorithm is "RNN", "LSTM", or "GRU"

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| hidden_size | integer | 64 | Hidden size for RNN/LSTM/GRU |
| num_layers | integer | 1 | Number of RNN layers |
| bidirectional | boolean | false | Whether to use a bidirectional RNN |
| dropout | float | 0.0 | Dropout rate for RNN layers (0.0 to 1.0) |
| sequence_length | integer | 10 | Sequence length (if not provided in data) |

Transformer Configuration (transformer section)

Required when algorithm is "Transformer"

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| d_model | integer | 512 | Model dimension |
| nhead | integer | 8 | Number of attention heads |
| num_layers | integer | 6 | Number of transformer layers |
| dim_feedforward | integer | 2048 | Feedforward dimension |
| dropout | float | 0.1 | Dropout rate (0.0 to 1.0) |
| max_seq_len | integer | 100 | Maximum sequence length |

Training Configuration (training section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| batch_size | integer | 100 | Batch size for training |
| epochs | integer | 100 | Number of training epochs |
| learning_rate | float | 0.01 | Learning rate |
| seed | integer | 42 | Random seed for reproducibility |
| optimizer | string | "Adam" | Optimizer (see Available Optimizers above) |
| loss_function | string | "MSELoss" | Loss function (see Available Loss Functions above) |
| weight_decay | float | 0.0 | L2 regularization (weight decay) |
| momentum | float | 0.9 | Momentum (for SGD) |
| nesterov | boolean | false | Use Nesterov momentum (for SGD) |
| betas | array of floats | [0.9, 0.999] | Beta parameters (for Adam/AdamW) |
| eps | float | 1e-8 | Epsilon for optimizers |
| amsgrad | boolean | false | Use AMSGrad variant (for Adam) |
| alpha | float | 0.99 | Smoothing constant (for RMSprop) |
| centered | boolean | false | Centered parameter (for RMSprop) |
| rho | float | 0.9 | Decay rate (for Adadelta) |
| lr_decay | float | 0.0 | Learning rate decay (for Adagrad) |

Metrics Configuration (metrics section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| metrics | array of strings | ["MAE", "R2"] | Metrics to track (see Available Metrics above) |
| threshold_value | float | 0.9 | Threshold value for metrics |
| threshold_type | string | "max" | "max" or "min": whether higher or lower is better |

Data Configuration (data section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| train_test_split | float | 0.8 | Train/test split ratio (0.0 to 1.0) |
| shuffle_data | boolean | true | Whether to shuffle data before splitting |
| normalize_data | boolean | false | Whether to normalize input data |

Data Loading Configuration (data_loading section)

| Option | Type | Required | Description |
|--------|------|----------|-------------|
| type | string | Yes | Data type: "dummy", "numpy", "csv", or "custom" |
| n_samples | integer | No* | Number of samples (for "dummy" type) |
| X_path | string | No* | Path to input features NumPy file (for "numpy" type) |
| y_path | string | No* | Path to target values NumPy file (for "numpy" type) |
| csv_path | string | No* | Path to CSV file (for "csv" type) |
| X_columns | array of strings | No* | Column names for input features (for "csv" type) |
| y_column | string | No* | Column name for target values (for "csv" type) |
| custom_function | string | No* | Module path and function name, e.g., "my_module.load_data" (for "custom" type) |

*Required depending on the type value

Early Stopping Configuration (early_stopping section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | false | Whether to use early stopping |
| patience | integer | 10 | Number of epochs to wait before stopping |
| metric | string | "loss" | Metric to check for early stopping ("loss" or any metric name) |
| min_delta | float | 0.001 | Minimum change in the metric to count as improvement |

Logging Configuration (logging section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| log_interval | integer | 10 | Log every N epochs |
| save_interval | integer | 10 | Save the model every N epochs |
| save_path | string | "models" | Path to save models |
| log_path | string | "logs" | Path to save logs |
| log_file | string | null | Log filename (auto-generated from the model name/version if null) |
| log_level | string | "INFO" | Log level: "DEBUG", "INFO", "WARNING", "ERROR" |
| log_format | string | "%(asctime)s - %(name)s - %(levelname)s - %(message)s" | Log format string |
| log_date_format | string | "%Y-%m-%d %H:%M:%S" | Log date format string |
| log_file_max_bytes | integer | 10485760 | Maximum log file size in bytes (10 MB default) |
| log_file_backup_count | integer | 10 | Number of backup log files to keep |

Device Configuration (device section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| force_device | string | null | Force device: "cpu" or "cuda"; if null, auto-detects a GPU if available |

Model Saving/Loading Configuration (model_saving section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| save_format | string | "pth" | File format: "pth" or "pt" (both are PyTorch pickle format) |
| save_weights_only | boolean | false | If true, save only the state_dict (requires the model architecture to recreate) |
| load_weights_only | boolean | false | If true, use weights_only=True when loading (PyTorch 1.13.0+, safer for untrusted files) |
| pickle_protocol | integer or null | null | Pickle protocol version (null = default; 2-5 supported) |
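
The two saving modes correspond to standard PyTorch calls, roughly as follows; the placeholder model and file name are illustrative:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model

# save_weights_only = true: store just the parameters
torch.save(model.state_dict(), "example_weights.pth")

# load_weights_only = true: safer loading for untrusted files (PyTorch 1.13+)
state = torch.load("example_weights.pth", weights_only=True)
model.load_state_dict(state)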

Requirements

  • Python 3.7+
  • PyTorch (see requirements.txt)
  • NumPy
  • Matplotlib
  • Pandas (for CSV data loading)

Install all requirements:

pip install -r requirements.txt

Usage

Standard Training

# Single configuration
python run_train.py configs/my_config.json

# Multiple configurations
python run_train.py configs/*.json

# With glob patterns
python run_train.py configs/model_*.json

Genetic Algorithm

python genetic_algorithm.py example_configs/config_genetic.json

Direct Training (Advanced)

For advanced users who want to modify train.py directly:

import train

# Modify global variables in train.py
train.ALGORITHM = "SimpleNet"
train.INPUT_SIZE = 10
# ... etc

# Then run
train.main()

Key Concepts

Model Naming

  • Set "name" in the model configuration to use custom filenames
  • Without a name, files use model_v{version} format
  • Model checkpoints: {name}_epoch{N}.pth
  • Log files: train_{name}.log

Data Format

  • Input X: Shape (n_samples, n_features) or (n_samples, channels, height, width) for CNN
  • Output y: Shape (n_samples,) or (n_samples, n_outputs)
  • Automatically reshaped if needed

Device Selection

  • Automatically uses GPU if available
  • Force CPU: Set "force_device": "cpu" in device config
  • Force GPU: Set "force_device": "cuda" in device config
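
Auto-detection follows the usual PyTorch pattern, roughly equivalent to:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")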

Model Saving

  • Full checkpoints include model architecture and training metadata
  • Weights-only mode available for safer loading
  • Configurable save intervals
  • Automatic final model save

Tips for Success

  1. Start Simple: Begin with config_example.json and dummy data to understand the workflow
  2. Use Examples: Copy and modify example configs rather than starting from scratch
  3. Check Logs: Always review log files for detailed training information
  4. Monitor Metrics: Use early stopping and metrics to prevent overfitting
  5. Experiment: Try different optimizers, learning rates, and architectures
  6. Organize Configs: Keep your configuration files in the configs/ directory

Troubleshooting

Common Issues

Import Errors: Make sure you've installed all requirements and activated your virtual environment.

Data Loading Errors:

  • Check file paths are correct
  • Ensure data shapes match your model configuration
  • Verify CSV column names match your configuration

Out of Memory:

  • Reduce batch size
  • Use smaller models
  • Enable gradient checkpointing (requires code modification)

Slow Training:

  • Check if GPU is being used (device in logs)
  • Reduce model size or data size
  • Use fewer epochs for testing

Next Steps

  1. Explore Examples: Run all example configurations to see different features
  2. Read Detailed Docs: Check README_CONFIG.md for JSON configuration details
  3. Try Genetic Algorithm: See README_GENETIC.md for evolutionary training
  4. Create Your Project: Start with an example config and customize for your data
  5. Experiment: Try different architectures, optimizers, and hyperparameters

Contributing

This is a training framework template. Feel free to:

  • Add new architectures
  • Extend data loading methods
  • Improve genetic algorithm features
  • Add new metrics
  • Enhance logging

License

This project is provided as-is for training and experimentation purposes.


Happy Training!

For questions or issues, refer to the detailed documentation in README_CONFIG.md and README_GENETIC.md.
