A comprehensive, flexible PyTorch training framework that supports multiple neural network architectures, optimizers, loss functions, and data loading methods. Configure everything through JSON files without modifying code, or use the genetic algorithm extension to evolve neural network populations.
- Overview
- Features
- Quick Start
- Getting Started with Examples
- Creating Your Own Projects
- Project Structure
- Documentation
- Requirements
- Usage
This framework provides a complete solution for training PyTorch neural networks with:
- Multiple Architectures: SimpleNet/MLP, CNN, RNN, LSTM, GRU, Transformer
- Comprehensive Optimizers: Adam, SGD, AdamW, RMSprop, and many more
- Flexible Data Loading: NumPy files, CSV files, custom functions, or dummy data
- JSON Configuration: Configure everything via JSON files - no code changes needed
- Genetic Algorithm Extension: Evolve populations of neural networks
- Built-in Logging & Metrics: Track training progress with comprehensive logging
- Model Management: Save/load models with flexible checkpointing
- Support for all major PyTorch neural network architectures
- All PyTorch optimizers and loss functions available
- Comprehensive metrics tracking (MAE, R², RMSE, etc.)
- Early stopping support
- Automatic train/test splitting
- Data normalization options
- GPU/CPU support with automatic device detection
- Configure all training parameters via JSON files
- No code modifications required
- Run multiple configurations in batch
- Custom model naming for organized outputs
- Evolve populations of neural networks
- Multiple selection methods (tournament, roulette, rank-based)
- Mutation and crossover operations
- Fitness-based evolution with configurable metrics
- Generation tracking and statistics
- Dummy Data: Quick testing with generated data
- NumPy Files: Load from `.npy` files
- CSV Files: Load from CSV with column selection
- Custom Functions: Use your own data loading functions
For Mac and Linux:

```bash
cd "PyTorch Training Framework"
source setup/setup.sh
```

For Windows:

```bat
cd "PyTorch Training Framework"
setup\setup.bat
```

The setup script will:
- Create a virtual environment (`venv`)
- Activate the virtual environment (for the current session)
- Install all dependencies from `requirements.txt`
- Create the necessary directories (`configs`, `generations`, `logs`, `models`)

Note: After running the setup script, the virtual environment will be activated in that terminal session. For future sessions, you'll need to activate it manually:

```bash
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
Alternatively, set up manually:

1. Clone or download this repository

2. Create a virtual environment (recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Create the necessary directories:
   ```bash
   mkdir -p configs generations logs models
   ```
   On Windows:
   ```bat
   mkdir configs generations logs models
   ```
The fastest way to get started is with an example configuration:
```bash
python run_train.py example_configs/config_example.json
```

This will:
- Load a simple neural network configuration
- Generate dummy data for testing
- Train the model
- Save checkpoints to `models/`
- Create logs in `logs/`
The example_configs/ directory contains several example configurations to help you understand the framework:
```bash
python run_train.py example_configs/config_example.json
```
- Complete configuration with all options
- Uses dummy data for quick testing
- Simple feedforward network (SimpleNet)

```bash
python run_train.py example_configs/config_numpy_data.json
```
- Loads data from NumPy files
- Shows how to configure data paths

```bash
python run_train.py example_configs/config_csv_data.json
```
- Loads data from CSV files
- Demonstrates column selection

```bash
python run_train.py example_configs/config_lstm.json
```
- Configures an LSTM network
- Shows RNN-specific settings

```bash
python genetic_algorithm.py example_configs/config_genetic.json
```
- Evolves a population of neural networks
- Demonstrates genetic algorithm features
Each example configuration file demonstrates different aspects:
- Model Configuration: Architecture, layer sizes, activations
- Data Loading: Different data sources and formats
- Training Parameters: Batch size, epochs, learning rate, optimizer
- Advanced Features: Early stopping, logging, metrics
Tip: Open the example JSON files in a text editor to see all available configuration options with comments and examples.
Start by copying an example configuration:
```bash
cp example_configs/config_example.json configs/my_project.json
```

Edit `configs/my_project.json` and modify the model section:

```json
{
"model": {
"name": "my_custom_model",
"algorithm": "SimpleNet",
"input_size": 10,
"hidden_sizes": [64, 32, 16],
"output_size": 1,
"activation": "ReLU"
}
}
```

Available Algorithms:
- `SimpleNet` - Simple feedforward neural network (same as MLP)
- `MLP` - Multi-Layer Perceptron (same as SimpleNet)
- `Linear` - Simple linear model (single layer)
- `CNN` or `ConvNet` - Convolutional Neural Network
- `RNN` - Recurrent Neural Network
- `LSTM` - Long Short-Term Memory network
- `GRU` - Gated Recurrent Unit network
- `Transformer` - Transformer model with encoder layers
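For intuition, the `SimpleNet`/`MLP` configuration above corresponds to a stack of `Linear` layers with the chosen activation in between. A minimal sketch (illustrative only; the actual implementation lives in `train.py` and may differ):

```python
import torch.nn as nn

# Illustrative MLP matching input_size=10, hidden_sizes=[64, 32, 16], output_size=1
class SimpleNetSketch(nn.Module):
    def __init__(self, input_size=10, hidden_sizes=(64, 32, 16), output_size=1):
        super().__init__()
        layers, prev = [], input_size
        for h in hidden_sizes:
            layers += [nn.Linear(prev, h), nn.ReLU()]  # "activation": "ReLU"
            prev = h
        layers.append(nn.Linear(prev, output_size))    # no activation on the output
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```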
Choose your data loading method:
```json
{
"data_loading": {
"type": "numpy",
"X_path": "data/X_train.npy",
"y_path": "data/y_train.npy"
}
}
```

```json
{
"data_loading": {
"type": "csv",
"csv_path": "data/dataset.csv",
"X_columns": ["feature1", "feature2", "feature3"],
"y_column": "target"
}
}
```

```json
{
"data_loading": {
"type": "custom",
"custom_function": "my_module.load_my_data"
}
}
```

Your custom function should return `(X, y)` where:
- `X` is a NumPy array or torch tensor of shape `(n_samples, n_features)`
- `y` is a NumPy array or torch tensor of shape `(n_samples,)` or `(n_samples, n_outputs)`
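A minimal sketch of such a custom loader; the module and function names match the `my_module.load_my_data` string in the config above, while the data itself is synthetic:

```python
# my_module.py
import numpy as np

def load_my_data():
    """Return (X, y) in the shapes the framework expects."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10)).astype(np.float32)    # (n_samples, n_features)
    y = X.sum(axis=1, keepdims=True).astype(np.float32)  # (n_samples, n_outputs)
    return X, y
```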
Modify the training section:
```json
{
"training": {
"batch_size": 64,
"epochs": 100,
"learning_rate": 0.001,
"optimizer": "Adam",
"loss_function": "MSELoss"
}
}
```

Available Optimizers:
- `Adam` - Adaptive Moment Estimation (default)
- `AdamW` - Adam with decoupled weight decay
- `SGD` - Stochastic Gradient Descent
- `RMSprop` - Root Mean Square Propagation
- `Adagrad` - Adaptive Gradient Algorithm
- `Adadelta` - Adaptive Learning Rate Method
- `Adamax` - Adam based on infinity norm
- `ASGD` - Averaged Stochastic Gradient Descent
- `LBFGS` - Limited-memory BFGS
- `Rprop` - Resilient backpropagation
- `RAdam` - Rectified Adam
- `NAdam` - Nesterov-accelerated Adam
- `SparseAdam` - Sparse version of Adam
Available Loss Functions:
Regression Losses:
- `MSELoss` - Mean Squared Error (default for regression)
- `L1Loss` - Mean Absolute Error
- `SmoothL1Loss` - Smooth L1 Loss (Huber Loss variant)
- `HuberLoss` - Huber Loss
- `PoissonNLLLoss` - Poisson Negative Log Likelihood
- `GaussianNLLLoss` - Gaussian Negative Log Likelihood
- `KLDivLoss` - Kullback-Leibler Divergence
Classification Losses:
- `CrossEntropyLoss` - Cross Entropy Loss (default for classification)
- `BCELoss` - Binary Cross Entropy
- `BCEWithLogitsLoss` - BCE with logits (more numerically stable)
- `NLLLoss` - Negative Log Likelihood
- `MultiLabelMarginLoss` - Multi-label margin loss
- `MultiLabelSoftMarginLoss` - Multi-label soft margin loss
- `MultiMarginLoss` - Multi-class margin loss
- `SoftMarginLoss` - Soft margin loss
- `MarginRankingLoss` - Margin ranking loss
- `TripletMarginLoss` - Triplet margin loss
- `TripletMarginWithDistanceLoss` - Triplet margin with custom distance
- `HingeEmbeddingLoss` - Hinge embedding loss
- `CTCLoss` - Connectionist Temporal Classification loss
Other Losses:
- `CosineEmbeddingLoss` - Cosine embedding loss
- `LabelSmoothingCrossEntropy` - Cross entropy with label smoothing
Available Activation Functions:
- `ReLU` - Rectified Linear Unit (default)
- `ReLU6` - ReLU with max value of 6
- `LeakyReLU` - Leaky ReLU (configurable negative slope)
- `PReLU` - Parametric ReLU
- `RReLU` - Randomized ReLU
- `GELU` - Gaussian Error Linear Unit
- `Sigmoid` - Sigmoid activation
- `Tanh` - Hyperbolic tangent
- `Hardtanh` - Hard tanh (configurable min/max)
- `Hardswish` - Hard swish activation
- `ELU` - Exponential Linear Unit
- `CELU` - Continuously Differentiable ELU
- `SELU` - Scaled Exponential Linear Unit
- `GLU` - Gated Linear Unit
- `SiLU` - Sigmoid Linear Unit
- `Mish` - Mish activation
- `Softplus` - Softplus activation
- `Softshrink` - Soft shrinkage
- `Hardshrink` - Hard shrinkage
- `Softsign` - Soft sign activation
- `Tanhshrink` - Tanh shrinkage
- `Hardsigmoid` - Hard sigmoid
- `LogSigmoid` - Log sigmoid
- `Softmax` - Softmax (for multi-class output)
- `LogSoftmax` - Log softmax
- `Softmin` - Softmin
- `Threshold` - Threshold activation (configurable threshold/value)
- `Identity` or `None` - No activation (linear)
Available Metrics:
- `MAE` - Mean Absolute Error
- `MSE` - Mean Squared Error
- `RMSE` - Root Mean Squared Error
- `R2` - R-squared (coefficient of determination)
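These follow the standard definitions; as a reference, a sketch in NumPy (not necessarily the framework's exact implementation):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "R2": 1.0 - ss_res / ss_tot,  # coefficient of determination
    }
```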
Run training:

```bash
python run_train.py configs/my_project.json
```
- Logs: Check `logs/train_my_custom_model.log` for detailed training progress
- Models: Checkpoints saved in the `models/` directory
- Console: Real-time progress updates
CNN configuration:

```json
{
"model": {
"algorithm": "CNN"
},
"cnn": {
"input_channels": 3,
"output_channels": [32, 64, 128],
"kernel_sizes": [3, 3, 3],
"strides": [1, 1, 1],
"padding": [1, 1, 1],
"pool_kernel": 2,
"dropout": 0.2
}
}
```

LSTM configuration:

```json
{
"model": {
"algorithm": "LSTM"
},
"rnn": {
"hidden_size": 128,
"num_layers": 2,
"bidirectional": true,
"dropout": 0.2
}
}
```

Transformer configuration:

```json
{
"model": {
"algorithm": "Transformer"
},
"transformer": {
"d_model": 512,
"nhead": 8,
"num_layers": 6,
"dim_feedforward": 2048,
"dropout": 0.1
}
}
```

Early stopping configuration:

```json
{
"early_stopping": {
"enabled": true,
"patience": 10,
"metric": "loss",
"min_delta": 0.001
}
}
```

```
Training Framework/
├── train.py # Core training module with all architectures
├── run_train.py # JSON configuration runner
├── genetic_algorithm.py # Genetic algorithm extension
├── requirements.txt # Python dependencies
├── README.md # This file (general overview)
├── README_CONFIG.md # Detailed JSON configuration guide
├── README_GENETIC.md # Genetic algorithm documentation
├── setup/ # Setup scripts and requirements
│ ├── setup.sh # Setup script for Mac/Linux
│ ├── setup.bat # Setup script for Windows
│ └── requirements.txt # Dependencies (copy of main requirements.txt)
├── example_configs/ # Example configuration files
│ ├── config_example.json
│ ├── config_numpy_data.json
│ ├── config_csv_data.json
│ ├── config_lstm.json
│ └── config_genetic.json
├── configs/ # Your custom configuration files (auto-created by setup)
├── models/ # Saved model checkpoints (auto-created by setup)
├── logs/ # Training logs (auto-created by setup)
├── generations/ # Genetic algorithm generation data (auto-created by setup)
└── venv/                 # Virtual environment (created by setup script)
```
This project includes detailed documentation in separate README files:
When to read: You want to understand JSON configuration in detail.
Covers:
- Complete JSON configuration structure
- All available configuration options
- Data loading types (dummy, numpy, csv, custom)
- Model naming conventions
- Running multiple configurations
- Minimal configuration examples
Location: README_CONFIG.md
When to read: You want to use the genetic algorithm extension.
Covers:
- Genetic algorithm overview and features
- Configuration parameters
- Selection methods (tournament, roulette, rank-based)
- Mutation and crossover operations
- Fitness evaluation
- Tips for tuning genetic algorithm parameters
- Integration with the main framework
Location: README_GENETIC.md
When to read: You want to understand the core implementation or modify architectures.
Contains:
- All model architectures (SimpleNet, CNN, RNN, LSTM, GRU, Transformer)
- All optimizer implementations
- All loss function implementations
- Training loop implementation
- Model saving/loading functions
- Metrics calculation
Location: train.py (with comprehensive docstrings)
This section lists all available configuration options for JSON configuration files. All options are optional and will use default values if not specified.
Model options:

| Option | Type | Default | Description |
|---|---|---|---|
| `model_version` | integer | `1` | Model version number |
| `name` | string | `null` | Model name (used in filenames). If null, uses `model_v{version}` |
| `algorithm` | string | `"SimpleNet"` | Model architecture (see Available Algorithms above) |
| `input_size` | integer | `1` | Input feature size (or input channels for CNN) |
| `hidden_sizes` | array of integers | `[10]` | List of hidden layer sizes |
| `output_size` | integer | `1` | Output size |
| `activation` | string | `"ReLU"` | Activation function (see Available Activation Functions above) |
Required when algorithm is "CNN" or "ConvNet"
| Option | Type | Default | Description |
|---|---|---|---|
| `input_channels` | integer | `1` | Input channels (1 for grayscale, 3 for RGB) |
| `output_channels` | array of integers | `[32, 64]` | List of output channels for each conv layer |
| `kernel_sizes` | array of integers | `[3, 3]` | Kernel sizes for each conv layer |
| `strides` | array of integers | `[1, 1]` | Strides for each conv layer |
| `padding` | array of integers | `[1, 1]` | Padding for each conv layer |
| `pool_kernel` | integer | `2` | Pooling kernel size |
| `dropout` | float | `0.0` | Dropout rate (0.0 to 1.0) |
Required when algorithm is "RNN", "LSTM", or "GRU"
| Option | Type | Default | Description |
|---|---|---|---|
| `hidden_size` | integer | `64` | Hidden size for RNN/LSTM/GRU |
| `num_layers` | integer | `1` | Number of RNN layers |
| `bidirectional` | boolean | `false` | Whether to use bidirectional RNN |
| `dropout` | float | `0.0` | Dropout rate for RNN layers (0.0 to 1.0) |
| `sequence_length` | integer | `10` | Sequence length (if not provided in data) |
Required when algorithm is "Transformer"
| Option | Type | Default | Description |
|---|---|---|---|
| `d_model` | integer | `512` | Model dimension |
| `nhead` | integer | `8` | Number of attention heads |
| `num_layers` | integer | `6` | Number of transformer layers |
| `dim_feedforward` | integer | `2048` | Feedforward dimension |
| `dropout` | float | `0.1` | Dropout rate (0.0 to 1.0) |
| `max_seq_len` | integer | `100` | Maximum sequence length |
Training options:

| Option | Type | Default | Description |
|---|---|---|---|
| `batch_size` | integer | `100` | Batch size for training |
| `epochs` | integer | `100` | Number of training epochs |
| `learning_rate` | float | `0.01` | Learning rate |
| `seed` | integer | `42` | Random seed for reproducibility |
| `optimizer` | string | `"Adam"` | Optimizer (see Available Optimizers above) |
| `loss_function` | string | `"MSELoss"` | Loss function (see Available Loss Functions above) |
| `weight_decay` | float | `0.0` | L2 regularization (weight decay) |
| `momentum` | float | `0.9` | Momentum (for SGD optimizer) |
| `nesterov` | boolean | `false` | Use Nesterov momentum (for SGD) |
| `betas` | array of floats | `[0.9, 0.999]` | Beta parameters (for Adam/AdamW optimizers) |
| `eps` | float | `1e-8` | Epsilon for optimizers |
| `amsgrad` | boolean | `false` | Use AMSGrad variant (for Adam optimizer) |
| `alpha` | float | `0.99` | Smoothing constant (for RMSprop optimizer) |
| `centered` | boolean | `false` | Centered parameter (for RMSprop optimizer) |
| `rho` | float | `0.9` | Decay rate (for Adadelta optimizer) |
| `lr_decay` | float | `0.0` | Learning rate decay (for Adagrad optimizer) |
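To make the optimizer options concrete, here is how the Adam-related entries would typically map onto `torch.optim.Adam`; the `build_adam` helper is illustrative, not part of the framework:

```python
import torch

def build_adam(model, training_cfg):
    """training_cfg is the parsed "training" section of the JSON config."""
    return torch.optim.Adam(
        model.parameters(),
        lr=training_cfg.get("learning_rate", 0.01),
        betas=tuple(training_cfg.get("betas", [0.9, 0.999])),
        eps=training_cfg.get("eps", 1e-8),
        weight_decay=training_cfg.get("weight_decay", 0.0),
        amsgrad=training_cfg.get("amsgrad", False),
    )
```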
Metrics options:

| Option | Type | Default | Description |
|---|---|---|---|
| `metrics` | array of strings | `["MAE", "R2"]` | List of metrics to track (see Available Metrics above) |
| `threshold_value` | float | `0.9` | Threshold value for metrics |
| `threshold_type` | string | `"max"` | `"max"` or `"min"` - whether higher/lower is better |
Data handling options:

| Option | Type | Default | Description |
|---|---|---|---|
| `train_test_split` | float | `0.8` | Train/test split ratio (0.0 to 1.0) |
| `shuffle_data` | boolean | `true` | Whether to shuffle data before splitting |
| `normalize_data` | boolean | `false` | Whether to normalize input data |
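A sketch of what these options typically amount to (shuffle, split, then normalize using training-set statistics); the helper below is illustrative, not the framework's exact code:

```python
import numpy as np

def split_and_normalize(X, y, split=0.8, shuffle=True, normalize=False, seed=42):
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    cut = int(len(X) * split)
    tr, te = idx[:cut], idx[cut:]
    X_train, X_test = X[tr], X[te]
    if normalize:
        mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
        X_train = (X_train - mean) / std
        X_test = (X_test - mean) / std  # reuse training statistics on the test set
    return X_train, y[tr], X_test, y[te]
```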
Data loading options:

| Option | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Data type: `"dummy"`, `"numpy"`, `"csv"`, or `"custom"` |
| `n_samples` | integer | No* | Number of samples (for `"dummy"` type) |
| `X_path` | string | No* | Path to input features NumPy file (for `"numpy"` type) |
| `y_path` | string | No* | Path to target values NumPy file (for `"numpy"` type) |
| `csv_path` | string | No* | Path to CSV file (for `"csv"` type) |
| `X_columns` | array of strings | No* | Column names for input features (for `"csv"` type) |
| `y_column` | string | No* | Column name for target values (for `"csv"` type) |
| `custom_function` | string | No* | Module path and function name, e.g., `"my_module.load_data"` (for `"custom"` type) |
*Required based on type value
Early stopping options:

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Whether to use early stopping |
| `patience` | integer | `10` | Number of epochs to wait before stopping |
| `metric` | string | `"loss"` | Metric to check for early stopping (`"loss"` or any metric name) |
| `min_delta` | float | `0.001` | Minimum change in metric to consider improvement |
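Conceptually this is the usual patience counter; a minimal sketch assuming a metric where lower is better, such as loss (illustrative only):

```python
class EarlyStopping:
    """Stop when the monitored value fails to improve by min_delta for `patience` epochs."""
    def __init__(self, patience=10, min_delta=0.001):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, value):
        if value < self.best - self.min_delta:  # counts as an improvement
            self.best, self.bad_epochs = value, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```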
Logging options:

| Option | Type | Default | Description |
|---|---|---|---|
| `log_interval` | integer | `10` | Log every N epochs |
| `save_interval` | integer | `10` | Save model every N epochs |
| `save_path` | string | `"models"` | Path to save models |
| `log_path` | string | `"logs"` | Path to save logs |
| `log_file` | string | `null` | Log filename (auto-generated from model name/version if null) |
| `log_level` | string | `"INFO"` | Log level: `"DEBUG"`, `"INFO"`, `"WARNING"`, `"ERROR"` |
| `log_format` | string | `"%(asctime)s - %(name)s - %(levelname)s - %(message)s"` | Log format string |
| `log_date_format` | string | `"%Y-%m-%d %H:%M:%S"` | Log date format string |
| `log_file_max_bytes` | integer | `10485760` | Maximum log file size in bytes (10 MB default) |
| `log_file_backup_count` | integer | `10` | Number of backup log files to keep |
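The rotation options correspond to Python's standard `logging.handlers.RotatingFileHandler`; a sketch of the equivalent setup (illustrative; `train.py` may wire this up differently):

```python
import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    "logs/train_my_custom_model.log",
    maxBytes=10_485_760,  # log_file_max_bytes (10 MB)
    backupCount=10,       # log_file_backup_count
)
handler.setFormatter(logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(message)s",  # log_format
    datefmt="%Y-%m-%d %H:%M:%S",                             # log_date_format
))
logger = logging.getLogger("train")
logger.setLevel(logging.INFO)  # log_level
logger.addHandler(handler)
```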
Device options:

| Option | Type | Default | Description |
|---|---|---|---|
| `force_device` | string | `null` | Force device: `"cpu"` or `"cuda"`. If null, auto-detects GPU if available |
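Auto-detection follows the standard PyTorch idiom; a sketch:

```python
import torch

def resolve_device(force_device=None):
    if force_device is not None:  # "cpu" or "cuda" from the config
        return torch.device(force_device)
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```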
Model saving options:

| Option | Type | Default | Description |
|---|---|---|---|
| `save_format` | string | `"pth"` | File format: `"pth"` or `"pt"` (both are PyTorch pickle format) |
| `save_weights_only` | boolean | `false` | If true, save only the `state_dict` (requires model architecture to recreate) |
| `load_weights_only` | boolean | `false` | If true, use `weights_only=True` when loading (PyTorch 1.13.0+, safer for untrusted files) |
| `pickle_protocol` | integer or null | `null` | Pickle protocol version (null = default, 2-5 supported) |
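These options map onto the standard `torch.save`/`torch.load` calls; a sketch using the `{name}_epoch{N}.pth` naming convention described under Model Naming below (the specific filename is illustrative):

```python
import torch

# save_weights_only: true  ->  store only the state_dict
torch.save(model.state_dict(), "models/my_custom_model_epoch100.pth")

# load_weights_only: true  ->  safer loading of untrusted files (PyTorch 1.13+)
state = torch.load("models/my_custom_model_epoch100.pth", weights_only=True)
model.load_state_dict(state)
```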
- Python 3.7+
- PyTorch (see `requirements.txt`)
- NumPy
- Matplotlib
- Pandas (for CSV data loading)
Install all requirements:
```bash
pip install -r requirements.txt
```

```bash
# Single configuration
python run_train.py configs/my_config.json

# Multiple configurations
python run_train.py configs/*.json

# With glob patterns
python run_train.py configs/model_*.json
```

Run the genetic algorithm:

```bash
python genetic_algorithm.py example_configs/config_genetic.json
```

For advanced users who want to modify `train.py` directly:
```python
import train
# Modify global variables in train.py
train.ALGORITHM = "SimpleNet"
train.INPUT_SIZE = 10
# ... etc
# Then run
train.main()
```

- Set `"name"` in the model configuration to use custom filenames
- Without a name, files use the `model_v{version}` format
- Model checkpoints: `{name}_epoch{N}.pth`
- Log files: `train_{name}.log`
- Input `X`: shape `(n_samples, n_features)`, or `(n_samples, channels, height, width)` for CNN
- Output `y`: shape `(n_samples,)` or `(n_samples, n_outputs)`
- Automatically reshaped if needed
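For example, preparing NumPy arrays in the expected shapes (file paths follow the earlier `config_numpy_data.json` example; the reshape is roughly what the automatic handling amounts to):

```python
import numpy as np

X = np.load("data/X_train.npy").astype(np.float32)  # (n_samples, n_features)
y = np.load("data/y_train.npy").astype(np.float32)
if y.ndim == 1:
    y = y.reshape(-1, 1)  # (n_samples,) -> (n_samples, 1)
```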
- Automatically uses GPU if available
- Force CPU: set `"force_device": "cpu"` in the device config
- Force GPU: set `"force_device": "cuda"` in the device config
- Full checkpoints include model architecture and training metadata
- Weights-only mode available for safer loading
- Configurable save intervals
- Automatic final model save
- Start Simple: Begin with `config_example.json` and dummy data to understand the workflow
- Use Examples: Copy and modify example configs rather than starting from scratch
- Check Logs: Always review log files for detailed training information
- Monitor Metrics: Use early stopping and metrics to prevent overfitting
- Experiment: Try different optimizers, learning rates, and architectures
- Organize Configs: Keep your configuration files in the `configs/` directory
Import Errors: Make sure you've installed all requirements and activated your virtual environment.
Data Loading Errors:
- Check file paths are correct
- Ensure data shapes match your model configuration
- Verify CSV column names match your configuration
Out of Memory:
- Reduce batch size
- Use smaller models
- Enable gradient checkpointing (requires code modification)
Slow Training:
- Check if the GPU is being used (see `device` in the logs)
- Reduce model size or data size
- Use fewer epochs for testing
- Explore Examples: Run all example configurations to see different features
- Read Detailed Docs: Check `README_CONFIG.md` for JSON configuration details
- Try Genetic Algorithm: See `README_GENETIC.md` for evolutionary training
- Create Your Project: Start with an example config and customize for your data
- Experiment: Try different architectures, optimizers, and hyperparameters
This is a training framework template. Feel free to:
- Add new architectures
- Extend data loading methods
- Improve genetic algorithm features
- Add new metrics
- Enhance logging
This project is provided as-is for training and experimentation purposes.
Happy Training!
For questions or issues, refer to the detailed documentation in README_CONFIG.md and README_GENETIC.md.