PyTorch Training Framework

A comprehensive, flexible PyTorch training framework that supports multiple neural network architectures, optimizers, loss functions, and data loading methods. Configure everything through JSON files without modifying code, or use the genetic algorithm extension to evolve neural network populations.

Overview

This framework provides a complete solution for training PyTorch neural networks with:

  • Multiple Architectures: SimpleNet/MLP, CNN, RNN, LSTM, GRU, Transformer
  • Comprehensive Optimizers: Adam, SGD, AdamW, RMSprop, and many more
  • Flexible Data Loading: NumPy files, CSV files, custom functions, or dummy data
  • JSON Configuration: Configure everything via JSON files - no code changes needed
  • Genetic Algorithm Extension: Evolve populations of neural networks
  • Built-in Logging & Metrics: Track training progress with comprehensive logging
  • Model Management: Save/load models with flexible checkpointing

Features

Core Training Framework

  • Support for all major PyTorch neural network architectures
  • All PyTorch optimizers and loss functions available
  • Comprehensive metrics tracking (MAE, R², RMSE, etc.)
  • Early stopping support
  • Automatic train/test splitting
  • Data normalization options
  • GPU/CPU support with automatic device detection

JSON Configuration System

  • Configure all training parameters via JSON files
  • No code modifications required
  • Run multiple configurations in batch
  • Custom model naming for organized outputs

Genetic Algorithm Extension

  • Evolve populations of neural networks
  • Multiple selection methods (tournament, roulette, rank-based)
  • Mutation and crossover operations
  • Fitness-based evolution with configurable metrics
  • Generation tracking and statistics

Data Loading

  • Dummy Data: Quick testing with generated data
  • NumPy Files: Load from .npy files
  • CSV Files: Load from CSV with column selection
  • Custom Functions: Use your own data loading functions

Quick Start

Installation

Option 1: Automated Setup (Recommended)

For Mac/Linux:

cd "PyTorch Training Framework"
source setup/setup.sh

For Windows:

cd "PyTorch Training Framework"
setup\setup.bat

The setup script will:

  • Create a virtual environment (venv)
  • Activate the virtual environment (for the current session)
  • Install all dependencies from requirements.txt
  • Create necessary directories (configs, generations, logs, models)

Note: After running the setup script, the virtual environment will be activated in that terminal session. For future sessions, you'll need to activate it manually:

source venv/bin/activate  # On Windows: venv\Scripts\activate

Option 2: Manual Setup

  1. Clone or download this repository

  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Create necessary directories:

    mkdir -p configs generations logs models

    On Windows:

    mkdir configs generations logs models

Run Your First Training

The fastest way to get started is with an example configuration:

python run_train.py example_configs/config_example.json

This will:

  • Load a simple neural network configuration
  • Generate dummy data for testing
  • Train the model
  • Save checkpoints to models/
  • Create logs in logs/

Getting Started with Examples

The example_configs/ directory contains several example configurations to help you understand the framework:

1. Basic Example

python run_train.py example_configs/config_example.json
  • Complete configuration with all options
  • Uses dummy data for quick testing
  • Simple feedforward network (SimpleNet)

2. NumPy Data Example

python run_train.py example_configs/config_numpy_data.json
  • Loads data from NumPy files
  • Shows how to configure data paths

3. CSV Data Example

python run_train.py example_configs/config_csv_data.json
  • Loads data from CSV files
  • Demonstrates column selection

4. LSTM Example

python run_train.py example_configs/config_lstm.json
  • Configures an LSTM network
  • Shows RNN-specific settings

5. Genetic Algorithm Example

python genetic_algorithm.py example_configs/config_genetic.json
  • Evolves a population of neural networks
  • Demonstrates genetic algorithm features

Understanding the Examples

Each example configuration file demonstrates different aspects:

  • Model Configuration: Architecture, layer sizes, activations
  • Data Loading: Different data sources and formats
  • Training Parameters: Batch size, epochs, learning rate, optimizer
  • Advanced Features: Early stopping, logging, metrics

Tip: Open the example JSON files in a text editor to see the full set of configuration options used in context.

Creating Your Own Projects

Step 1: Create Your Configuration File

Start by copying an example configuration:

cp example_configs/config_example.json configs/my_project.json

Step 2: Configure Your Model

Edit configs/my_project.json and modify the model section:

{
  "model": {
    "name": "my_custom_model",
    "algorithm": "SimpleNet",
    "input_size": 10,
    "hidden_sizes": [64, 32, 16],
    "output_size": 1,
    "activation": "ReLU"
  }
}

Available Algorithms:

  • SimpleNet - Simple feedforward neural network (same as MLP)
  • MLP - Multi-Layer Perceptron (same as SimpleNet)
  • Linear - Simple linear model (single layer)
  • CNN or ConvNet - Convolutional Neural Network
  • RNN - Recurrent Neural Network
  • LSTM - Long Short-Term Memory network
  • GRU - Gated Recurrent Unit network
  • Transformer - Transformer model with encoder layers
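
For intuition, the SimpleNet configuration shown above corresponds roughly to the following PyTorch module. This is a sketch of the implied layer stack, not the framework's exact implementation:

import torch.nn as nn

# Rough equivalent of the config above:
# input_size=10, hidden_sizes=[64, 32, 16], output_size=1, activation="ReLU"
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)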

Step 3: Configure Your Data

Choose your data loading method:

Option A: NumPy Files

{
  "data_loading": {
    "type": "numpy",
    "X_path": "data/X_train.npy",
    "y_path": "data/y_train.npy"
  }
}
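
If you do not already have .npy files, here is a minimal sketch for producing some; the paths match the config above, and the data shapes are purely illustrative:

import os
import numpy as np

os.makedirs("data", exist_ok=True)
X = np.random.rand(1000, 10).astype(np.float32)  # (n_samples, n_features)
y = np.random.rand(1000).astype(np.float32)      # (n_samples,)
np.save("data/X_train.npy", X)
np.save("data/y_train.npy", y)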

Option B: CSV File

{
  "data_loading": {
    "type": "csv",
    "csv_path": "data/dataset.csv",
    "X_columns": ["feature1", "feature2", "feature3"],
    "y_column": "target"
  }
}
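
Likewise, a minimal sketch for producing a matching CSV file with pandas; the column names match the config above, and the values are illustrative:

import os
import pandas as pd

os.makedirs("data", exist_ok=True)
df = pd.DataFrame({
    "feature1": [0.1, 0.2, 0.3],
    "feature2": [1.0, 2.0, 3.0],
    "feature3": [10.0, 20.0, 30.0],
    "target":   [0.5, 1.5, 2.5],
})
df.to_csv("data/dataset.csv", index=False)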

Option C: Custom Function

{
  "data_loading": {
    "type": "custom",
    "custom_function": "my_module.load_my_data"
  }
}

Your custom function should return (X, y) where:

  • X is a numpy array or torch tensor of shape (n_samples, n_features)
  • y is a numpy array or torch tensor of shape (n_samples,) or (n_samples, n_outputs)
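
A minimal sketch of such a function; the random data is illustrative, and only the (X, y) return contract comes from the framework:

# my_module.py (the module referenced by "custom_function" above)
import numpy as np

def load_my_data():
    # Replace with your real data source; shapes follow the contract above
    X = np.random.rand(500, 10).astype(np.float32)  # (n_samples, n_features)
    y = np.random.rand(500).astype(np.float32)      # (n_samples,)
    return X, y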

Step 4: Configure Training Parameters

Modify the training section:

{
  "training": {
    "batch_size": 64,
    "epochs": 100,
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "loss_function": "MSELoss"
  }
}
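
These settings map onto standard PyTorch objects, roughly as follows (a sketch; the placeholder model is only for illustration):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model for illustration
optimizer = optim.Adam(model.parameters(), lr=0.001)  # "optimizer" and "learning_rate"
criterion = nn.MSELoss()                              # "loss_function"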

Available Optimizers:

  • Adam - Adaptive Moment Estimation (default)
  • AdamW - Adam with decoupled weight decay
  • SGD - Stochastic Gradient Descent
  • RMSprop - Root Mean Square Propagation
  • Adagrad - Adaptive Gradient Algorithm
  • Adadelta - Adaptive Learning Rate Method
  • Adamax - Adam based on infinity norm
  • ASGD - Averaged Stochastic Gradient Descent
  • LBFGS - Limited-memory BFGS
  • Rprop - Resilient backpropagation
  • RAdam - Rectified Adam
  • NAdam - Nesterov-accelerated Adam
  • SparseAdam - Sparse version of Adam

Available Loss Functions:

Regression Losses:

  • MSELoss - Mean Squared Error (default for regression)
  • L1Loss - Mean Absolute Error
  • SmoothL1Loss - Smooth L1 Loss (Huber Loss variant)
  • HuberLoss - Huber Loss
  • PoissonNLLLoss - Poisson Negative Log Likelihood
  • GaussianNLLLoss - Gaussian Negative Log Likelihood
  • KLDivLoss - Kullback-Leibler Divergence

Classification Losses:

  • CrossEntropyLoss - Cross Entropy Loss (default for classification)
  • BCELoss - Binary Cross Entropy
  • BCEWithLogitsLoss - BCE with logits (more numerically stable)
  • NLLLoss - Negative Log Likelihood
  • MultiLabelMarginLoss - Multi-label margin loss
  • MultiLabelSoftMarginLoss - Multi-label soft margin loss
  • MultiMarginLoss - Multi-class margin loss
  • SoftMarginLoss - Soft margin loss
  • MarginRankingLoss - Margin ranking loss
  • TripletMarginLoss - Triplet margin loss
  • TripletMarginWithDistanceLoss - Triplet margin with custom distance
  • HingeEmbeddingLoss - Hinge embedding loss
  • CTCLoss - Connectionist Temporal Classification loss

Other Losses:

  • CosineEmbeddingLoss - Cosine embedding loss
  • LabelSmoothingCrossEntropy - Cross entropy with label smoothing

Available Activation Functions:

  • ReLU - Rectified Linear Unit (default)
  • ReLU6 - ReLU with max value of 6
  • LeakyReLU - Leaky ReLU (configurable negative slope)
  • PReLU - Parametric ReLU
  • RReLU - Randomized ReLU
  • GELU - Gaussian Error Linear Unit
  • Sigmoid - Sigmoid activation
  • Tanh - Hyperbolic tangent
  • Hardtanh - Hard tanh (configurable min/max)
  • Hardswish - Hard swish activation
  • ELU - Exponential Linear Unit
  • CELU - Continuously Differentiable ELU
  • SELU - Scaled Exponential Linear Unit
  • GLU - Gated Linear Unit
  • SiLU - Sigmoid Linear Unit
  • Mish - Mish activation
  • Softplus - Softplus activation
  • Softshrink - Soft shrinkage
  • Hardshrink - Hard shrinkage
  • Softsign - Soft sign activation
  • Tanhshrink - Tanh shrinkage
  • Hardsigmoid - Hard sigmoid
  • LogSigmoid - Log sigmoid
  • Softmax - Softmax (for multi-class output)
  • LogSoftmax - Log softmax
  • Softmin - Softmin
  • Threshold - Threshold activation (configurable threshold/value)
  • Identity or None - No activation (linear)

Available Metrics:

  • MAE - Mean Absolute Error
  • MSE - Mean Squared Error
  • RMSE - Root Mean Squared Error
  • R2 - R-squared (coefficient of determination)

Step 5: Run Training

python run_train.py configs/my_project.json

Step 6: Monitor Results

  • Logs: Check logs/train_my_custom_model.log for detailed training progress
  • Models: Checkpoints saved in models/ directory
  • Console: Real-time progress updates

Advanced Configuration

Using CNN

{
  "model": {
    "algorithm": "CNN"
  },
  "cnn": {
    "input_channels": 3,
    "output_channels": [32, 64, 128],
    "kernel_sizes": [3, 3, 3],
    "strides": [1, 1, 1],
    "padding": [1, 1, 1],
    "pool_kernel": 2,
    "dropout": 0.2
  }
}
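
For intuition, the first convolutional block implied by this config corresponds roughly to the PyTorch layers below; the exact ordering of pooling and dropout inside the framework may differ:

import torch.nn as nn

# First block of the config above: 3 -> 32 channels, 3x3 kernel,
# stride 1, padding 1, then 2x2 pooling and dropout
block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout(0.2),
)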

Using LSTM/RNN/GRU

{
  "model": {
    "algorithm": "LSTM"
  },
  "rnn": {
    "hidden_size": 128,
    "num_layers": 2,
    "bidirectional": true,
    "dropout": 0.2
  }
}
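
This corresponds roughly to the following PyTorch layer; input_size comes from the model section, and batch_first is an assumption about the data layout:

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=128, num_layers=2,
               bidirectional=True, dropout=0.2, batch_first=True)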

Using Transformer

{
  "model": {
    "algorithm": "Transformer"
  },
  "transformer": {
    "d_model": 512,
    "nhead": 8,
    "num_layers": 6,
    "dim_feedforward": 2048,
    "dropout": 0.1
  }
}
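
This corresponds roughly to a standard PyTorch encoder stack (a sketch, not the framework's exact code):

import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                   dim_feedforward=2048, dropout=0.1)
encoder = nn.TransformerEncoder(layer, num_layers=6)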

Early Stopping

{
  "early_stopping": {
    "enabled": true,
    "patience": 10,
    "metric": "loss",
    "min_delta": 0.001
  }
}
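
The semantics of patience and min_delta, sketched in plain Python (illustrative, not the framework's implementation):

def should_stop_at(losses, patience=10, min_delta=0.001):
    """Return the epoch at which early stopping would trigger, or None."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses):
        if best - loss > min_delta:  # improved by at least min_delta
            best = loss
            waited = 0
        else:
            waited += 1
        if waited >= patience:       # patience exhausted
            return epoch
    return None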

Project Structure

PyTorch Training Framework/
├── train.py                 # Core training module with all architectures
├── run_train.py             # JSON configuration runner
├── genetic_algorithm.py     # Genetic algorithm extension
├── requirements.txt         # Python dependencies
├── README.md                # This file (general overview)
├── README_CONFIG.md         # Detailed JSON configuration guide
├── README_GENETIC.md        # Genetic algorithm documentation
├── setup/                   # Setup scripts and requirements
│   ├── setup.sh             # Setup script for Mac/Linux
│   ├── setup.bat            # Setup script for Windows
│   └── requirements.txt     # Dependencies (copy of main requirements.txt)
├── example_configs/         # Example configuration files
│   ├── config_example.json
│   ├── config_numpy_data.json
│   ├── config_csv_data.json
│   ├── config_lstm.json
│   └── config_genetic.json
├── configs/                 # Your custom configuration files (auto-created by setup)
├── models/                  # Saved model checkpoints (auto-created by setup)
├── logs/                    # Training logs (auto-created by setup)
├── generations/             # Genetic algorithm generation data (auto-created by setup)
└── venv/                    # Virtual environment (created by setup script)

Documentation

This project includes detailed documentation in separate README files:

README_CONFIG.md

When to read: You want to understand JSON configuration in detail.

Covers:

  • Complete JSON configuration structure
  • All available configuration options
  • Data loading types (dummy, numpy, csv, custom)
  • Model naming conventions
  • Running multiple configurations
  • Minimal configuration examples

Location: README_CONFIG.md

README_GENETIC.md

When to read: You want to use the genetic algorithm extension.

Covers:

  • Genetic algorithm overview and features
  • Configuration parameters
  • Selection methods (tournament, roulette, rank-based)
  • Mutation and crossover operations
  • Fitness evaluation
  • Tips for tuning genetic algorithm parameters
  • Integration with the main framework

Location: README_GENETIC.md

train.py

When to read: You want to understand the core implementation or modify architectures.

Contains:

  • All model architectures (SimpleNet, CNN, RNN, LSTM, GRU, Transformer)
  • All optimizer implementations
  • All loss function implementations
  • Training loop implementation
  • Model saving/loading functions
  • Metrics calculation

Location: train.py (with comprehensive docstrings)

Complete Configuration Reference

This section lists all available configuration options for JSON configuration files. All options are optional and will use default values if not specified.

Model Configuration (model section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model_version | integer | 1 | Model version number |
| name | string | null | Model name (used in filenames); if null, uses model_v{version} |
| algorithm | string | "SimpleNet" | Model architecture (see Available Algorithms above) |
| input_size | integer | 1 | Input feature size (or input channels for CNN) |
| hidden_sizes | array of integers | [10] | List of hidden layer sizes |
| output_size | integer | 1 | Output size |
| activation | string | "ReLU" | Activation function (see Available Activation Functions above) |

CNN Configuration (cnn section)

Required when algorithm is "CNN" or "ConvNet"

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| input_channels | integer | 1 | Input channels (1 for grayscale, 3 for RGB) |
| output_channels | array of integers | [32, 64] | Output channels for each conv layer |
| kernel_sizes | array of integers | [3, 3] | Kernel sizes for each conv layer |
| strides | array of integers | [1, 1] | Strides for each conv layer |
| padding | array of integers | [1, 1] | Padding for each conv layer |
| pool_kernel | integer | 2 | Pooling kernel size |
| dropout | float | 0.0 | Dropout rate (0.0 to 1.0) |

RNN/LSTM/GRU Configuration (rnn section)

Required when algorithm is "RNN", "LSTM", or "GRU"

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| hidden_size | integer | 64 | Hidden size for RNN/LSTM/GRU |
| num_layers | integer | 1 | Number of RNN layers |
| bidirectional | boolean | false | Whether to use a bidirectional RNN |
| dropout | float | 0.0 | Dropout rate for RNN layers (0.0 to 1.0) |
| sequence_length | integer | 10 | Sequence length (if not provided in data) |

Transformer Configuration (transformer section)

Required when algorithm is "Transformer"

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| d_model | integer | 512 | Model dimension |
| nhead | integer | 8 | Number of attention heads |
| num_layers | integer | 6 | Number of transformer layers |
| dim_feedforward | integer | 2048 | Feedforward dimension |
| dropout | float | 0.1 | Dropout rate (0.0 to 1.0) |
| max_seq_len | integer | 100 | Maximum sequence length |

Training Configuration (training section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| batch_size | integer | 100 | Batch size for training |
| epochs | integer | 100 | Number of training epochs |
| learning_rate | float | 0.01 | Learning rate |
| seed | integer | 42 | Random seed for reproducibility |
| optimizer | string | "Adam" | Optimizer (see Available Optimizers above) |
| loss_function | string | "MSELoss" | Loss function (see Available Loss Functions above) |
| weight_decay | float | 0.0 | L2 regularization (weight decay) |
| momentum | float | 0.9 | Momentum (for SGD) |
| nesterov | boolean | false | Use Nesterov momentum (for SGD) |
| betas | array of floats | [0.9, 0.999] | Beta parameters (for Adam/AdamW) |
| eps | float | 1e-8 | Epsilon for optimizers |
| amsgrad | boolean | false | Use AMSGrad variant (for Adam) |
| alpha | float | 0.99 | Smoothing constant (for RMSprop) |
| centered | boolean | false | Centered parameter (for RMSprop) |
| rho | float | 0.9 | Decay rate (for Adadelta) |
| lr_decay | float | 0.0 | Learning rate decay (for Adagrad) |

Metrics Configuration (metrics section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| metrics | array of strings | ["MAE", "R2"] | Metrics to track (see Available Metrics above) |
| threshold_value | float | 0.9 | Threshold value for metrics |
| threshold_type | string | "max" | "max" or "min": whether higher or lower is better |

Data Configuration (data section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| train_test_split | float | 0.8 | Train/test split ratio (0.0 to 1.0) |
| shuffle_data | boolean | true | Whether to shuffle data before splitting |
| normalize_data | boolean | false | Whether to normalize input data |

Data Loading Configuration (data_loading section)

| Option | Type | Required | Description |
|--------|------|----------|-------------|
| type | string | Yes | Data type: "dummy", "numpy", "csv", or "custom" |
| n_samples | integer | No* | Number of samples (for "dummy" type) |
| X_path | string | No* | Path to input features NumPy file (for "numpy" type) |
| y_path | string | No* | Path to target values NumPy file (for "numpy" type) |
| csv_path | string | No* | Path to CSV file (for "csv" type) |
| X_columns | array of strings | No* | Column names for input features (for "csv" type) |
| y_column | string | No* | Column name for target values (for "csv" type) |
| custom_function | string | No* | Module path and function name, e.g., "my_module.load_data" (for "custom" type) |

*Required depending on the type value

Early Stopping Configuration (early_stopping section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | false | Whether to use early stopping |
| patience | integer | 10 | Number of epochs to wait before stopping |
| metric | string | "loss" | Metric to check for early stopping ("loss" or any metric name) |
| min_delta | float | 0.001 | Minimum change in the metric to count as improvement |

Logging Configuration (logging section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| log_interval | integer | 10 | Log every N epochs |
| save_interval | integer | 10 | Save the model every N epochs |
| save_path | string | "models" | Path to save models |
| log_path | string | "logs" | Path to save logs |
| log_file | string | null | Log filename (auto-generated from the model name/version if null) |
| log_level | string | "INFO" | Log level: "DEBUG", "INFO", "WARNING", "ERROR" |
| log_format | string | "%(asctime)s - %(name)s - %(levelname)s - %(message)s" | Log format string |
| log_date_format | string | "%Y-%m-%d %H:%M:%S" | Log date format string |
| log_file_max_bytes | integer | 10485760 | Maximum log file size in bytes (10 MB default) |
| log_file_backup_count | integer | 10 | Number of backup log files to keep |

Device Configuration (device section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| force_device | string | null | Force device: "cpu" or "cuda"; if null, auto-detects a GPU if available |

Model Saving/Loading Configuration (model_saving section)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| save_format | string | "pth" | File format: "pth" or "pt" (both are PyTorch pickle format) |
| save_weights_only | boolean | false | If true, save only the state_dict (requires the model architecture to recreate) |
| load_weights_only | boolean | false | If true, use weights_only=True when loading (PyTorch 1.13.0+, safer for untrusted files) |
| pickle_protocol | integer or null | null | Pickle protocol version (null = default; 2-5 supported) |
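
The two saving modes correspond to standard PyTorch calls, roughly as follows; the placeholder model and file name are illustrative:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model

# save_weights_only = true: store just the parameters
torch.save(model.state_dict(), "example_weights.pth")

# load_weights_only = true: safer loading for untrusted files (PyTorch 1.13+)
state = torch.load("example_weights.pth", weights_only=True)
model.load_state_dict(state)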

Requirements

  • Python 3.7+
  • PyTorch (see requirements.txt)
  • NumPy
  • Matplotlib
  • Pandas (for CSV data loading)

Install all requirements:

pip install -r requirements.txt

Usage

Standard Training

# Single configuration
python run_train.py configs/my_config.json

# Multiple configurations
python run_train.py configs/*.json

# With glob patterns
python run_train.py configs/model_*.json

Genetic Algorithm

python genetic_algorithm.py example_configs/config_genetic.json

Direct Training (Advanced)

For advanced users who want to modify train.py directly:

import train

# Modify global variables in train.py
train.ALGORITHM = "SimpleNet"
train.INPUT_SIZE = 10
# ... etc

# Then run
train.main()

Key Concepts

Model Naming

  • Set "name" in the model configuration to use custom filenames
  • Without a name, files use model_v{version} format
  • Model checkpoints: {name}_epoch{N}.pth
  • Log files: train_{name}.log

Data Format

  • Input X: Shape (n_samples, n_features) or (n_samples, channels, height, width) for CNN
  • Output y: Shape (n_samples,) or (n_samples, n_outputs)
  • Automatically reshaped if needed

Device Selection

  • Automatically uses GPU if available
  • Force CPU: Set "force_device": "cpu" in device config
  • Force GPU: Set "force_device": "cuda" in device config
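
Auto-detection follows the usual PyTorch pattern, roughly equivalent to:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")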

Model Saving

  • Full checkpoints include model architecture and training metadata
  • Weights-only mode available for safer loading
  • Configurable save intervals
  • Automatic final model save

Tips for Success

  1. Start Simple: Begin with config_example.json and dummy data to understand the workflow
  2. Use Examples: Copy and modify example configs rather than starting from scratch
  3. Check Logs: Always review log files for detailed training information
  4. Monitor Metrics: Use early stopping and metrics to prevent overfitting
  5. Experiment: Try different optimizers, learning rates, and architectures
  6. Organize Configs: Keep your configuration files in the configs/ directory

Troubleshooting

Common Issues

Import Errors: Make sure you've installed all requirements and activated your virtual environment.

Data Loading Errors:

  • Check file paths are correct
  • Ensure data shapes match your model configuration
  • Verify CSV column names match your configuration

Out of Memory:

  • Reduce batch size
  • Use smaller models
  • Enable gradient checkpointing (requires code modification)

Slow Training:

  • Check if GPU is being used (device in logs)
  • Reduce model size or data size
  • Use fewer epochs for testing

Next Steps

  1. Explore Examples: Run all example configurations to see different features
  2. Read Detailed Docs: Check README_CONFIG.md for JSON configuration details
  3. Try Genetic Algorithm: See README_GENETIC.md for evolutionary training
  4. Create Your Project: Start with an example config and customize for your data
  5. Experiment: Try different architectures, optimizers, and hyperparameters

Contributing

This is a training framework template. Feel free to:

  • Add new architectures
  • Extend data loading methods
  • Improve genetic algorithm features
  • Add new metrics
  • Enhance logging

License

This project is provided as-is for training and experimentation purposes.


Happy Training!

For questions or issues, refer to the detailed documentation in README_CONFIG.md and README_GENETIC.md.
