ml-training-base
is a Python package providing base classes and utilities for machine learning projects. Currently, the package only supports supervised learning.
It includes:
- A configurable logging setup for both console and file outputs.
- Base classes for data loaders (
BaseSupervisedDataLoader
). - An environment setup class for deterministic training (
TrainingEnvironment
), ensuring reproducible runs. - A base trainer class (
BaseSupervisedTrainer
) that outlines a typical training workflow in supervised learning.
By using these abstractions, you can quickly spin up a new ML pipeline with consistent structure and easily extend or override specific components to suit your needs.
- Reusable Base Classes: Standard building blocks for data loading, training, callbacks, and environment management.
- Logging Utilities: Automatically configure logging to both console and file, with customizable logging paths.
- Deterministic Environment Setup: Control Python, NumPy, and TensorFlow seeds for reproducible ML experiments.
- Clear Project Structure: Easily extend or override abstract methods in your own data loaders, trainers, or environment logic.
You can install this package locally via:
pip install ml-training-base
- Install the package and its dependencies.
- Create a YAML configuration file (e.g.
config.yaml
) with your environment, logging, and data settings. - Import the classes in your script or Jupyter notebook:
import logging
from ml_training_base.data.utils.logging_utils import cconfigure_single_level_logger
from ml_training_base.supervised.environments.base_training_environments import KerasTrainingEnvironment
from ml_training_base.supervised.trainers.base_supervised_trainers import BaseSupervisedTrainer
- Set up your environment and trainer:
# For example, a custom trainer that inherits from BaseSupervisedTrainer
class MyCustomTrainer(BaseSupervisedTrainer):
def _setup_model(self):
# Initialize your model here, e.g., a TensorFlow/Keras or PyTorch model
pass
def _build_model(self):
# Compile or build your model
pass
def _setup_callbacks(self):
# Setup your training callbacks, checkpointing, etc.
pass
def _train(self):
# Implement your training loop or model.fit(...) call
pass
def _save_model(self):
# Save trained model to disk
pass
def _evaluate(self):
# Evaluate your model on the test set
pass
# Usage:
trainer = MyCustomTrainer(
config_path="path/to/config.yaml",
training_env=KerasTrainingEnvironment(logger=logging.getLogger(__name__))
)
trainer.run()
ml-training-base/
├── pyproject.toml
├── src/
│ └── ml_training_base/
│ ├── __init__.py
│ ├── data/
│ │ └── preprocessing/
│ │ ├── __init__.py
│ │ ├── configure_utils.py
│ │ ├── files_utils.py
│ │ └── logging_utils.py
│ ├── supervised/
│ │ ├── __init__.py
│ │ ├── data/
│ │ │ ├── __init__.py
│ │ │ └── base_supervised_data_loader.py
│ │ ├── environments/
│ │ │ ├── __init__.py
│ │ │ └── base_training_environments.py
│ │ ├── trainers/
│ │ │ ├── __init__.py
│ │ │ └── base_supervised_trainers.py
│ │ └── utils/
│ │ └── data/
│ │ ├── __init__.py
│ │ └── base_supervised_data_loader.py
│ └── utils/
│ ├── __init__.py
│ ├── configure_utils.py
│ ├── files_utils.py
│ └── logging_utils.py
├── tests/
│ ├── data/
│ │ └── preprocessing/
│ │ └── test_base_data_preprocessor.py
│ ├── supervised/
│ │ ├── data/
│ │ │ └── test_base_supervised_data_loader.py
│ │ ├── environments/
│ │ │ └── test_base_training_environments.py
│ │ └── trainers/
│ │ └── test_base_supervised_trainers.py
│ └── utils
│ ├── test_configure_utils.py
│ ├── test_files_utils.py
│ └── test_logging_utils.py
├── README.md
├── LICENSE
└── pyproject.toml
data/utils/logging_utils.py
:- Contains logger utilities, which sets up a standardized console and file logger for use throughout the package. File logger writes to a single file for all log levels.
configure_single_level_logger()
: File logger that writes to a single file for all log levels.configure_multi_level_logger
: File logger that writes to separate files for each log level.
supervised/environments/base_training_environments.py
:- Defines the
BaseEnvironment
abstract class for handling environment setup. - Provides concrete, framework-specific implementations like
KerasTrainingEnvironment
andPyTorchTrainingEnvironment
that manage deterministic setup (setting seeds, configuring hardware options, etc.).
- Defines the
supervised/trainers/base_supervised_trainers.py
:- Contains the core training framework hierarchy.
BaseSupervisedTrainer
: The framework-agnostic abstract class that defines the training pipeline (run()
,_setup_model()
,_train()
, etc.).BaseKerasSupervisedTrainer
&BasePyTorchSupervisedTrainer
: Framework-specific abstract classes that implement common boilerplate for Keras (model.fit()
) and PyTorch (manual training loop).
supervised/utils/data/base_supervised_data_loader.py
:- Contains the
BaseSupervisedDataLoader
abstract class. This defines the contract for creating data preparation pipelines (setup_datasets()
,get_train_dataset()
, etc.) that are used by the trainers.
- Contains the
You can define your runtime settings (e.g., logger paths, environment determinism seeds, model hyperparameters) in a YAML file.
For example:
# Data Configuration and Hyperparameters
data:
x_data_path: 'data/processed/x_data'
y_data_path: 'data/processed/y_data'
logger_path: 'var/log/training.log'
batch_size: 32
test_split: 0.1
validation_split: 0.1
# Model Configuration and Hyperparameters
model:
attention_dim: 512
encoder_embedding_dim: 512
decoder_embedding_dim: 512
units: 512
encoder_num_layers: 2
decoder_num_layers: 4
# Training Configuration and Hyperparameters
training:
epochs: 100
early_stop_patience: 5
weight_decay: null
dropout_rate: 0.2
learning_rate: 1e-4
# Environment Configuration
env:
determinism:
python_seed: "44478977"
random_seed: 440651
numpy_seed: 110789
tf_seed: 61592
This project is licensed under the terms of the MIT License. Feel free to copy, modify, and distribute per its terms.