Time Series Research Lab

Applied time series forecasting and anomaly detection using ML, statistical baselines, and residual analysis.

⭐ If this repository helps you, please consider starring it and crediting the source.

Overview

This repository demonstrates an applied research workflow for time series forecasting and anomaly detection on operational / consumer-style data.

The focus is not on academic perfection, but on:

data-centric modeling
method comparison
residual-based anomaly detection
production-oriented execution

The project mirrors how forecasting and anomaly detection systems are typically built and evaluated in real enterprise analytics environments.

Key Capabilities

Feature-based time series forecasting (ML + baselines)
Residual-driven anomaly detection
Statistical and unsupervised ML methods
Time-aware train/test splitting
Reproducible, notebook-free pipeline execution
Clear separation between research and reusable logic
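
As a minimal sketch of what time-aware splitting means here: rows are ordered by date and the most recent slice is held out, so no future observation leaks into training. The function name `time_aware_split`, the column names, and the 80/20 ratio are illustrative assumptions, not the repository's actual API.

```python
import pandas as pd


def time_aware_split(df: pd.DataFrame, date_col: str, test_fraction: float = 0.2):
    """Split chronologically: the oldest rows train, the newest rows test."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = df.sort_values(date_col).reset_index(drop=True)
    n_test = int(round(len(ordered) * test_fraction))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Hypothetical daily series, used only to illustrate the split.
frame = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=100, freq="D"),
    "value": range(100),
})
train, test = time_aware_split(frame, "date", test_fraction=0.2)
```

Unlike a shuffled split, every training date here precedes every test date, which is the property that matters when lagged features are involved.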

Repository Structure

timeseries-research-lab/
├── data/
│   ├── synthetic_kpi.csv          # Synthetic time series dataset
│   ├── feature_table.parquet      # Generated feature table
│   └── README.md                  # Dataset documentation
│
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_feature_engineering.ipynb
│   ├── 03_forecasting_models.ipynb
│   ├── 04_anomaly_detection.ipynb
│   └── 05_model_comparison.ipynb
│
├── src/
│   ├── features.py                # Feature engineering utilities
│   ├── forecasting.py             # Forecasting models and helpers
│   ├── anomaly.py                 # Anomaly detection logic
│   └── metrics.py                 # Evaluation metrics
│
├── scripts/
│   └── run_pipeline.py             # End-to-end executable pipeline
│
├── experiments/                    # Generated metrics
│   ├── forecast_metrics.csv
│   └── anomaly_metrics.csv
│
├── pyproject.toml                  # Environment & dependency definition
├── uv.lock                         # Locked, reproducible environment
└── README.md

Dataset

The dataset (synthetic_kpi.csv) simulates aggregated operational / consumer KPIs, similar to:

sales or demand indices
market activity signals
engagement or performance metrics

Characteristics

Daily frequency
Long-term trend + weekly seasonality
External effects (promotion-like signal)
Noise and drift
Injected anomalies (for benchmarking only)
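
A series with the characteristics listed above could be generated roughly as follows. This is a sketch under stated assumptions: the function name `make_synthetic_kpi`, the column names, and every coefficient are hypothetical and are not taken from synthetic_kpi.csv or the script that produced it.

```python
import numpy as np
import pandas as pd


def make_synthetic_kpi(n_days: int = 365, n_anomalies: int = 5, seed: int = 0) -> pd.DataFrame:
    """Build a daily KPI with trend, weekly seasonality, promos, drift, and spikes."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)
    dates = pd.date_range("2023-01-01", periods=n_days, freq="D")

    trend = 100.0 + 0.05 * t                          # slow long-term trend
    weekly = 5.0 * np.sin(2.0 * np.pi * t / 7.0)      # 7-day seasonality
    promo = (rng.random(n_days) < 0.05).astype(int)   # promotion-like external flag
    drift = np.cumsum(rng.normal(0.0, 0.1, n_days))   # random-walk drift
    noise = rng.normal(0.0, 1.0, n_days)              # observation noise
    value = trend + weekly + 8.0 * promo + drift + noise

    # Inject labelled spikes at random positions (for benchmarking only).
    is_anomaly = np.zeros(n_days, dtype=int)
    idx = rng.choice(n_days, size=n_anomalies, replace=False)
    value[idx] += rng.choice([-1.0, 1.0], size=n_anomalies) * 25.0
    is_anomaly[idx] = 1

    return pd.DataFrame(
        {"date": dates, "value": value, "promo": promo, "is_anomaly": is_anomaly}
    )


kpi = make_synthetic_kpi()
```

Keeping the injected positions in an `is_anomaly` column is what makes detector evaluation possible later; on real data such labels are usually unavailable.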

⚠️ All data is synthetic and anonymized.
Anomaly labels are artificial and used only for evaluation.

See data/README.md for details.

Modeling Approach

Forecasting

Naive lag-based baseline
Random Forest regression
Histogram-based Gradient Boosting
Feature-driven modeling using lags, rolling statistics, and calendar features

Anomaly Detection

Residual-based statistical detection (rolling z-score)
Unsupervised ML (Isolation Forest)
Emphasis on stability and false-positive control
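
As a rough illustration of the unsupervised option, the sketch below fits scikit-learn's `IsolationForest` to forecast residuals. Feeding it a single residual column, the injected shock positions, and the `contamination` value are all assumptions made for the example, not the repository's configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical residuals (actual - forecast): mostly noise plus three large shocks.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 300)
shock_idx = np.array([50, 150, 250])
residuals[shock_idx] += 12.0

# contamination sets the share of training points that will be flagged,
# so lowering it is a direct lever on the alert rate.
detector = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
labels = detector.fit_predict(residuals.reshape(-1, 1))  # -1 = anomaly, 1 = normal
flagged = np.flatnonzero(labels == -1)
```

Because `contamination` fixes the flagged share in advance, the detector will raise alerts even on clean data, which is one reason to compare it against a threshold-based statistical rule.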

Forecasting and anomaly detection are treated as coupled problems, where anomalies are identified as deviations from expected behavior.
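
One way to express this coupling is a rolling z-score on forecast residuals: a point is flagged when its residual lies far from the recent residual distribution. The function name `rolling_zscore_flags`, the 30-day window, and the threshold of 3 are illustrative assumptions rather than the values used in src/anomaly.py.

```python
import numpy as np
import pandas as pd


def rolling_zscore_flags(actual: pd.Series, forecast: pd.Series,
                         window: int = 30, threshold: float = 3.0) -> pd.DataFrame:
    """Flag points whose residual deviates strongly from recent residuals."""
    residual = actual - forecast
    # Shift by one so a spike cannot inflate the statistics it is scored against.
    history = residual.shift(1).rolling(window, min_periods=window)
    zscore = (residual - history.mean()) / history.std()
    return pd.DataFrame({
        "residual": residual,
        "zscore": zscore,
        # NaN z-scores (warm-up period) compare as False and are never flagged.
        "is_anomaly": zscore.abs() > threshold,
    })


# Hypothetical example: a flat forecast and one injected spike.
rng = np.random.default_rng(1)
index = pd.date_range("2024-01-01", periods=120, freq="D")
forecast = pd.Series(100.0, index=index)
actual = forecast + rng.normal(0.0, 1.0, len(index))
actual.iloc[90] += 15.0
result = rolling_zscore_flags(actual, forecast)
```

Raising `threshold` or lengthening `window` trades sensitivity for fewer false positives; the first `window` rows are never flagged because there is not yet enough history.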

Notebooks (Research & Analysis)

The notebooks document the applied research process:

1. Exploration – trends, seasonality, external effects
2. Feature Engineering – lags, rolling windows, calendar signals
3. Forecasting Models – baselines vs ML models
4. Anomaly Detection – statistical and ML approaches
5. Comparison – accuracy, stability, and trade-offs

Notebooks are intended for analysis and explanation, not production execution.
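
For the comparison step, the kind of metrics involved can be sketched with plain NumPy. Which metrics src/metrics.py actually reports is not stated here, so MAE, RMSE, precision, and recall are assumptions, as are the function names and the toy inputs.

```python
import numpy as np


def forecast_errors(y_true, y_pred) -> dict:
    """Mean absolute error and root mean squared error of a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }


def detection_scores(labels, flags) -> dict:
    """Precision, recall, and false-positive count for anomaly flags."""
    labels = np.asarray(labels, dtype=bool)
    flags = np.asarray(flags, dtype=bool)
    tp = int(np.sum(labels & flags))
    fp = int(np.sum(~labels & flags))
    fn = int(np.sum(labels & ~flags))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "false_positives": fp}


# Toy inputs, for illustration only.
errors = forecast_errors([10.0, 12.0, 14.0], [11.0, 12.0, 12.0])
scores = detection_scores([0, 1, 0, 1], [0, 1, 1, 0])
```

Reporting the false-positive count next to precision and recall makes the alert burden of each detector explicit, which matters when anomalies are rare.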

Pipeline Execution (Production-Style)

Alongside the notebooks, the repository includes a runnable pipeline that executes the complete workflow end to end from the command line.

Environment setup

uv sync --extra dev

Run the pipeline

From the repository root:

python -m scripts.run_pipeline

Generated outputs

Running the pipeline produces:

data/feature_table.parquet
data/forecast_predictions.parquet
data/anomaly_outputs.parquet
experiments/forecast_metrics.csv
experiments/anomaly_metrics.csv

All outputs are reproducible and excluded from version control.

Attribution & Credit

If you use this repository — including its ideas, code, structure, or pipeline — in research, blogs, demos, internal tools, or production systems, please credit:

Sangam Kumar Singh
GitHub: https://github.com/SangiSI
Repository: https://github.com/SangiSI/timeseries-research-lab

This project is released under the MIT License, which requires that the copyright and license notice be retained in all copies or substantial portions of the software.
