Bachelor Thesis Project by Malte Matthey
A modular research framework developed to systematically investigate the practical feasibility of Deep Reinforcement Learning (DRL) for Home Energy Management Systems (HEMS). This project evaluates multiple DRL algorithms (DQN, SAC, TD3, PPO) against classical rule-based controllers using synthesized photovoltaic (PV) data, imperfect weather forecasts, and dynamic day-ahead electricity prices.
The full text includes the mathematical MDP formulation, the custom environment architecture, and a critical evaluation of the agents.
📥 Download the Bachelor Thesis (PDF)
To cleanly separate the data lifecycle from the training logic, the project architecture is divided into three distinct modular repositories. As illustrated in the deployment architecture below, the data flows sequentially from local preprocessing through a central data management service to the RL framework executed on an HPC cluster. The source code for each module can be found in the list below.
1. Data Preprocessing (Data Engineering)
The entry point of the pipeline, responsible for parsing and cleaning heterogeneous, file-based raw data on household consumption and PV generation. It features a standardized, object-oriented workflow for parsing varying data formats (e.g., HDF5, CSV), interpolating missing values, correcting daylight saving time artifacts, and enforcing strict JSON schema-based validation prior to downstream ingestion.
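The standardized workflow described above could be sketched roughly as follows. This is an illustrative assumption, not the thesis code: the class names (`RawParser`, `CsvParser`), the interpolation gap limit, and the column layout are all hypothetical, but the structure shows how a shared cleaning step can be reused across format-specific parsers.

```python
# Hypothetical sketch of the object-oriented parsing workflow.
# Class names and parameters are illustrative, not from the thesis.
from abc import ABC, abstractmethod

import pandas as pd


class RawParser(ABC):
    """Common interface so CSV/HDF5 sources share one cleaning pipeline."""

    @abstractmethod
    def read(self, path: str) -> pd.DataFrame:
        """Format-specific loading, implemented per subclass."""

    def clean(self, df: pd.DataFrame) -> pd.DataFrame:
        """Shared cleaning: normalize timestamps, then fill short gaps."""
        df = df.copy()
        # Converting to UTC removes duplicated/missing local hours around
        # daylight saving time transitions.
        df.index = pd.to_datetime(df.index, utc=True)
        df = df.sort_index()
        # Time-weighted interpolation for short gaps only (here: <= 4 steps,
        # an assumed limit) so long outages are not silently invented.
        return df.interpolate(method="time", limit=4)


class CsvParser(RawParser):
    def read(self, path: str) -> pd.DataFrame:
        return pd.read_csv(path, index_col=0)
```

An HDF5 parser would subclass `RawParser` the same way and override only `read`, keeping the interpolation and timezone handling identical across sources.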
2. Data Management Service (Backend Engineering)
A containerized backend service that orchestrates the entire data unification and synthesis pipeline, acting as the single source of truth for the experimental datasets. It is built with FastAPI on top of PostgreSQL, extended with TimescaleDB for high-performance time-series management. The service handles automated API fetching with Backblaze B2 caching, multi-stage PV generation simulation (using pvlib), synthetic imperfect weather forecast generation via quantile regression (lightgbm), and dynamic LOCF (Last-Observation-Carried-Forward) SQL query building.
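To make the LOCF query building concrete, here is a minimal sketch of how such a query could be assembled for TimescaleDB. `time_bucket_gapfill()` and `locf()` are real TimescaleDB hyperfunctions; the table name, column names, and the exact aggregation used (`avg`) are assumptions for illustration and not taken from the thesis code.

```python
def build_locf_query(table: str, columns: list[str],
                     bucket: str = "1 hour") -> str:
    """Assemble a TimescaleDB gap-filling query in which each requested
    column carries its last observation forward across empty buckets.

    All identifiers here are illustrative placeholders; :start and :end
    are bound parameters supplied at execution time.
    """
    # locf() fills buckets that have no rows with the previous value,
    # which matches the LOCF semantics described above.
    select_cols = ",\n       ".join(
        f"locf(avg({col})) AS {col}" for col in columns
    )
    return (
        f"SELECT time_bucket_gapfill('{bucket}', ts) AS bucket,\n"
        f"       {select_cols}\n"
        f"FROM {table}\n"
        f"WHERE ts BETWEEN :start AND :end\n"
        f"GROUP BY bucket\n"
        f"ORDER BY bucket"
    )
```

Building the statement dynamically lets one endpoint serve arbitrary column subsets and bucket widths while the gap-filling semantics stay in a single place.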
3. RL Agent Framework (Machine Learning & MLOps)
The core research environment used for training, hyperparameter tuning, and evaluating the agents. It includes the implementation of custom Gymnasium environments with efficient in-memory data sourcing and vectorization. The framework provides a declarative configuration system, time-aware k-fold cross-validation, automated HPO sweeps with a hybrid early-stopping strategy, and seamless integration with a High-Performance Computing (HPC) cluster via Slurm.
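The time-aware k-fold cross-validation mentioned above can be sketched as forward chaining, similar in spirit to scikit-learn's `TimeSeriesSplit`: each fold validates on a window that lies strictly after its training data, so the agent is never evaluated on time steps it has already seen. The exact splitting scheme used in the thesis is not reproduced here; this is a minimal illustrative version.

```python
def time_aware_kfold(n_samples: int, k: int):
    """Yield (train, validation) index lists via forward chaining.

    Illustrative sketch, not the thesis implementation: the data is cut
    into k+1 equal windows, and fold i trains on the first i windows and
    validates on window i+1, so validation always lies in the future.
    """
    fold = n_samples // (k + 1)
    for i in range(1, k + 1):
        train = list(range(0, i * fold))
        val = list(range(i * fold, min((i + 1) * fold, n_samples)))
        yield train, val
```

Unlike shuffled k-fold, this scheme respects temporal ordering, which matters for HEMS data where consumption, PV output, and prices are strongly autocorrelated.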
Technology Stack
- Machine Learning: Stable Baselines3, sb3-contrib, Gymnasium, scikit-learn
- Data Processing & Simulation: Pandas, NumPy, pvlib, lightgbm, Apache Parquet
- Backend & Database: FastAPI, PostgreSQL, TimescaleDB, SQLAlchemy Core, asyncpg
- MLOps & Infrastructure: Weights & Biases (wandb), Docker, Docker Compose, GitLab CI/CD, Slurm Workload Manager, Backblaze B2

