A lightweight template for PyTorch-based deep learning projects whose main features are configuration management (Hydra), logging (Loguru + TensorBoard), and hardware-agnostic training (Lightning Fabric). Designed for rapid experimentation while enforcing best practices.
- Why This Template?
- How to use
- Configuration Management
- Logging and Visualization
- Toy Example
- Extending the Template
- License
Note: This section provides detailed background and motivation. If you're just looking to get started quickly, you can skip to the Quick Start section. If you're interested in why this template exists and what problems it solves, expand below.
🤔Click to expand the motivation and background🤔
There are plenty of deep learning templates out there—so why this one?
As a research engineer in computer vision for over four years, my workflow has consistently involved:
- Reviewing SOTA papers
- Implementing papers or adapting their existing codebases to specific datasets and problems
However, existing implementations often come with excessive complexity. Each codebase has a different structure, making it time-consuming to adapt. In reality, I only need the essentials:
- Dataset processing
- Model architecture
- Loss functions
- Training logic
- ...
Everything else should be familiar and easy to modify for experiments. I often end up stripping down implementations to the bare minimum and rewriting them for:
- Configurable experiment management
- Effective logging (preferably free)
- Seamless multi-GPU support
Yes, there are other well-structured templates, but:
- They are over-engineered → Hard to modify, too much boilerplate
- They impose strict frameworks → Require learning Lightning or other abstractions
- They add unnecessary complexity → I just need a simple, adaptable structure
What I need is simple:
✅ Run multiple experiments with different models & settings
✅ Quickly switch configurations
✅ Track experiments efficiently
✅ Easy inference with saved settings
✅ Monitor training behavior with intuitive logging
✅ No CUDA/CPU/hardware headaches
- Configuration Management → Hierarchical configs with Hydra
- Experiment Tracking → Auto-save & load experiment settings
- Logging → Console & file logging with Loguru
- Visualization → TensorBoard support for metrics, images, and models
- Hardware Agnostic → Lightning Fabric (better flexibility than PyTorch Lightning)
- Lean & Adaptable → No unnecessary overhead, quick to modify
This template is designed to keep things simple, flexible, and experiment-focused—without unnecessary complexity.
Use this mainly for the config management and logging features. A toy example, a reconstruction autoencoder trained on a random image, shows that everything works and where the dataset, model, loss, optimizers, training/validation/inference logic, visualization, I/O, and other utilities could go. You're in control.
├── configs/ # Configuration files
│ ├── config.yaml # Main configuration
│ ├── data/ # Dataset configurations
│ ├── experiment/ # Experiment configurations
│ ├── model/ # Model configurations
│ └── training/ # Training configurations
├── model_save/ # Saved models and experiment data
├── src/ # Source code
│ ├── data/ # Dataset implementations
│ ├── model/ # Model implementations
│ ├── utils/ # Utility functions
│ │ ├── logging/ # Logging utilities
│ │ │ ├── msg_logger.py # Message logging with Loguru
│ │ │ └── tb_logger.py # TensorBoard logging
│ │ ├── io.py # I/O utilities
│ │ ├── utils.py # General utilities
│ │ └── visualization.py # Visualization utilities
│ ├── infer.py # Inference script
│ └── train.py # Training script
└── requirements.txt # Dependencies
Set up your own environment, but you need at least the packages listed in the provided requirements.txt to use the features here and run the toy example (adapt the CUDA version, or use your own installation procedure):

```
conda create -n minimal python=3.10
conda activate minimal
pip install --force-reinstall -r requirements.txt
```

To train a model with the default configuration:

```
python src/train.py
```

This will:
- Create an experiment directory in `model_save/` with the experiment name `<exp_name>`
- Save the configuration used for training in `model_save/<exp_name>/.hydra` (for reference, and used for resuming the experiment or for inference)
- Log messages to both the console and a log file `model_save/<exp_name>/train.log` in the experiment directory
- Log metrics and visualizations to TensorBoard in `model_save/<exp_name>/tb`
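This layout is driven by Hydra's run directory. A hypothetical snippet of how the main config might pin it (the exact keys used in this repo may differ; the interpolation and directory names are illustrative):

```yaml
# configs/config.yaml (illustrative fragment, not necessarily verbatim)
hydra:
  run:
    dir: model_save/${experiment.exp_name}  # becomes hydra_cfg.run.dir
  job:
    name: train                             # becomes hydra_cfg.job.name
```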
Key CLI Overrides: You can override any configuration parameter from the command line:
```
# Change run params
python src/train.py experiment.exp_name=my_experiment training.epochs=100 training.batch_size=64

# Change model architecture
python src/train.py model=complex

# Multi-GPU training
python src/train.py training.devices=2 training.accelerator="gpu"

# Mixed precision
python src/train.py training.precision="16-mixed"
```

Run inference with:

```
python src/infer.py --experiment <exp_name>
```

This:
- Loads the config from the original training run
- Saves predictions to `model_save/<exp_name>/preds`

Optionally, you can also override params from the CLI during inference (needed sometimes):

```
python src/infer.py --experiment <exp_name> data.image_type=tif
```
**Hierarchical Configs**: compose configurations from multiple files:

```yaml
# configs/config.yaml
defaults:
  - experiment: default
  - training: default
  - model: base  # or complex (see configs/model)
  - data: default
```
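Under the hood, Hydra builds the final config by merging the chosen option from each config group into one tree. A rough, stdlib-only sketch of that composition idea (the dicts and values below are illustrative stand-ins for the YAML files, not Hydra's actual implementation):

```python
def compose(defaults, groups):
    """Toy illustration of Hydra-style composition: for each entry in
    `defaults` (e.g. {"model": "base"}), pull the chosen option out of
    its config group and nest it under that group's key."""
    cfg = {}
    for group, choice in defaults.items():
        cfg[group] = dict(groups[group][choice])  # copy the chosen option
    return cfg

# Illustrative config groups, mirroring configs/{model,training}/*.yaml
groups = {
    "model": {
        "base": {"_target_": "src.model.base.UNet", "initial_features": 64},
        "complex": {"_target_": "src.model.complex.UNet", "initial_features": 128},
    },
    "training": {
        "default": {"epochs": 500, "lr": 0.0001},
    },
}

cfg = compose({"model": "base", "training": "default"}, groups)
# A CLI override like `model=complex` simply changes the choice:
cfg_complex = compose({"model": "complex", "training": "default"}, groups)
```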
**Experiment-Specific Settings**:

```yaml
# configs/experiment/default.yaml
exp_name: "unet_baseline"
resume: false
```
**Model Zoo**: switch architectures via config:

```yaml
# configs/model/base.yaml
_target_: src.model.base.UNet
in_channels: 1
out_channels: 1
initial_features: 64
```
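The `_target_` key is what lets Hydra construct the model object straight from config (via `hydra.utils.instantiate`). A simplified, stdlib-only sketch of the mechanism, using a standard-library class as a stand-in target so it runs anywhere:

```python
import importlib

def instantiate(cfg):
    """Simplified version of hydra.utils.instantiate: import the class
    named by `_target_` and call it with the remaining keys as kwargs."""
    cfg = dict(cfg)  # don't mutate the caller's config
    module_path, _, class_name = cfg.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg)

# Stand-in for {"_target_": "src.model.base.UNet", "in_channels": 1, ...}
obj = instantiate({"_target_": "collections.OrderedDict", "a": 1, "b": 2})
# obj == OrderedDict(a=1, b=2)
```

Hydra's real `instantiate` adds recursion, partials, and validation on top, but this is the core idea.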
**Training Settings**:

```yaml
# configs/training/default.yaml
epochs: 500
lr: 0.0001
ckpt_frequency: 50       # Save a checkpoint every 50 epochs
resume_from_last: false  # Resume from the last checkpoint if true
accelerator: "auto"      # Lightning Fabric: "cpu", "gpu", "tpu", or "auto"
devices: "auto"          # Number of devices (e.g., 2 for 2 GPUs)
precision: "32-true"     # Mixed precision: "16-mixed", "bf16-mixed", or "32-true"
logging:
  epoch_frequency: 1     # Log metrics every epoch
  image_frequency: 50    # Log images every 50 epochs
```
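The frequency fields typically translate into simple modulo checks inside the training loop. A hypothetical sketch of how they might be consumed (the helper name is illustrative, not from this repo):

```python
def due(epoch, every):
    """True when a periodic action (checkpoint, image log, ...) is due.

    Epochs are assumed 0-indexed, so with every=50 the action fires at
    epochs 49, 99, 149, ... (i.e. on every 50th completed epoch).
    """
    return (epoch + 1) % every == 0

# With ckpt_frequency: 50 -> checkpoint at epochs 49, 99, ...
assert due(49, 50) and not due(50, 50)
# With logging.epoch_frequency: 1 -> metrics logged every epoch
assert all(due(e, 1) for e in range(5))
```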
To create a new configuration, add a YAML file to the appropriate directory:
- For a new model: `configs/model/my_model.yaml`
- For a new dataset: `configs/data/my_dataset.yaml`
- For a new training setup: `configs/training/my_training.yaml`
Then update the defaults in the main config:

```yaml
# configs/config.yaml
defaults:
  - experiment: my_experiment
  - training: my_training
  - model: my_model
  - data: my_dataset
```

Or select them from the CLI with:

```
python src/train.py model=my_model data=my_dataset training=my_training experiment=my_experiment
```

The template uses Loguru for message logging. The flow is simple; look at src/train.py. Here is a snippet for your entry point:
```python
import hydra
from src.utils.logging.msg_logger import setup_logging

# Set up msg logger
hydra_cfg = hydra.core.hydra_config.HydraConfig.get()
setup_logging(exp_dir=hydra_cfg.run.dir, log_filename=hydra_cfg.job.name)
```

Then in any file/module/function/class, do:
```python
from loguru import logger

epoch = 0
loss = 1
logger.info(f"Epoch {epoch} loss: {loss:.4f}")
logger.info("Starting training process")
logger.warning("GPU memory is running low")
logger.error("Failed to load dataset")

something = "something"
logger.opt(colors=True).info("<blue>Using color to highlight(s):</blue> <green>{}</green>", something)
```

Logs are saved to both the console and a log file in the experiment directory.
The template includes a TensorBoard logger for visualizing metrics and images:
```python
from src.utils.logging.tb_logger import TensorBoardLogger

tb_logger = TensorBoardLogger(tb_dir)

# Log a scalar value
tb_logger.log_scalar("training/loss", loss_value, step)

# Log images
tb_logger.log_images("training/generated_images", sample_images, step)

# Log model graph
tb_logger.log_model_graph(model, dummy_input)
```

To view TensorBoard logs:

```
tensorboard --logdir model_save/my_experiment/tb
```

Implemented Components:
- UNet architecture with configurable depth/features
- Random image dataset (template for easy replacement)
- MSE loss autoencoder training
- Reconstruction visualization utilities
This template is designed to be extended for your specific needs:
- Implement your model in `src/model/`
- Create a config in `configs/model/`
- Update the main config:

  ```yaml
  # configs/config.yaml
  defaults:
    - model: your_model
  ```

- Implement your dataset class in `src/data/`
- Update the data config:

  ```yaml
  # configs/data/default.yaml
  data:
    name: "your_dataset"
    image_size: 256
  ```
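For the dataset, anything that implements `__len__` and `__getitem__` works with a PyTorch `DataLoader`. A framework-free sketch of the shape such a class might take (names and fields are illustrative, loosely mirroring the toy random-image dataset; a real implementation would return tensors or arrays loaded from disk):

```python
import random

class RandomImageDataset:
    """Hypothetical stand-in for a class in src/data/: serves random
    'images' as nested lists of shape (channels, image_size, image_size)."""

    def __init__(self, num_samples=100, image_size=256, channels=1, seed=0):
        self.num_samples = num_samples
        self.image_size = image_size
        self.channels = channels
        self.seed = seed

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        if not 0 <= idx < self.num_samples:
            raise IndexError(idx)
        # Deterministic per-sample RNG so idx always yields the same image
        rng = random.Random(self.seed * 100003 + idx)
        return [[[rng.random() for _ in range(self.image_size)]
                 for _ in range(self.image_size)]
                for _ in range(self.channels)]
```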
- Edit the core loop in `src/train.py`
- Add new metrics/visualizations
- Extend logging as needed
- Make use of the core features of config management and logging for deep learning experiments
- Look at the lightweight toy example for what the workflow could be
- It should be relatively easy to open it up and adapt for your use case (hopefully)
MIT License
Copyright (c) 2021 ashleve
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
