Project Context: Implementation of a Two-Stage Recommendation Architecture (Two-Tower Retrieval + CatBoost Ranking), engineered as an installable Python package. The workflow is managed by a custom orchestrator (`run_pipeline.py`) that enforces strict time-based data splitting, reproducibility, and experiment tracking via MLflow.
"Act in the entire ML lifecycle: from mathematical conception and model architecture, feature engineering and experimentation, to the implementation of a robust 'production-grade' MLOps pipeline."
This project implements a Two-Stage Recommendation Architecture for the H&M Personalized Fashion Recommendations challenge, combining a Neural Retrieval stage (Two-Tower) with a Gradient Boosting Ranking stage (CatBoost).
The solution is structured as a production-ready Python package, utilizing a modular pipeline that automates data processing, training, evaluation, and artifact management.
Key Technical Philosophy:
- Architecture-First: Implementation of a standard RecSys pattern (Retrieval + Ranking) rather than ad-hoc scripts.
- MLOps Orchestration: Centralized control via `run_pipeline.py` with full MLflow integration for experiment tracking.
- Local Reproducibility: Dependency management via `uv` and removal of cloud-specific dependencies so that results can be reproduced across different environments.
- Two-Stage Recommendation System:
- Stage 1 (Retrieval): Neural Dual Encoder (Two-Tower) built with TensorFlow Recommenders (TFRS) to map users and items into a shared 32D embedding space. Generates top-K candidates via efficient BruteForce similarity search (a TFRS sketch follows this list).
- Stage 2 (Ranking): CatBoost Classifier trained to re-rank the retrieved candidates using dense behavioral features and item metadata.
- Feature Engineering: Strict separation of static and dynamic features, including calculated metrics such as `purchase_cycle` and `price_sensitivity` (sketched after this list).
- Deep Feature Engineering: Embeddings combined with behavioral features (Category Affinity, Price Sensitivity, Tenure).
- Hyperparameter Tuning: Optuna integration for CatBoost with MLflow tracking.
- Pipeline Orchestrator: A custom Python script (`scripts/run_pipeline.py`) manages the execution DAG, ensuring the correct dependency order (Preprocess → Train → Rank → Evaluate).
- Experiment Tracking (MLflow):
- Nested Runs: Hierarchical tracking of pipeline steps.
- Artifact Management: Storage of serialized models, scalers, and metric plots.
- Metric Logging: Tracking of MAP@12 at both Retrieval and Ranking stages.
- Reproducibility: Strictly pinned dependencies via `uv.lock` and config-driven parameterization.
- Time-Based Split: Strict temporal separation into Training (365 days), Fine-Tuning (30 days), and Validation (7 days) windows to mimic production forecasting and prevent data leakage (see the sketch after this list).
- Incremental Benchmarking: Evaluation of each stage independently (Baseline vs. Retrieval vs. Final Ranking).
- Interpretability: SHAP analysis applied to the Ranker to explain feature importance.
- Smoke Tests (`tests/fast_test.py`): Fast execution checks for model compilation and pipeline integrity.
- Logic Validation (`tests/verify_features.py`): Verification of feature engineering logic, guaranteeing that temporal constraints are respected.
- Model Inspection (`tests/inspect_model.py`): Utilities to validate input signatures and saved model artifacts.
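The retrieval stage (Stage 1) can be pictured with a minimal TensorFlow Recommenders sketch. This is an illustrative reconstruction, not the code in `src/model.py`: the feature names `customer_id` / `article_id`, and the `user_ids_vocab`, `item_ids_vocab`, and `candidate_ds` (a `tf.data.Dataset` of article IDs) inputs are assumptions.

```python
# A minimal Two-Tower sketch, assuming string customer/article IDs.
import tensorflow as tf
import tensorflow_recommenders as tfrs

EMBEDDING_DIM = 32  # the shared embedding space described above


def build_tower(vocabulary):
    # StringLookup + Embedding: maps raw IDs into the 32-dim space
    return tf.keras.Sequential([
        tf.keras.layers.StringLookup(vocabulary=vocabulary, mask_token=None),
        tf.keras.layers.Embedding(len(vocabulary) + 1, EMBEDDING_DIM),
    ])


class TwoTowerModel(tfrs.Model):
    def __init__(self, user_ids_vocab, item_ids_vocab, candidate_ds):
        super().__init__()
        self.user_tower = build_tower(user_ids_vocab)
        self.item_tower = build_tower(item_ids_vocab)
        # Retrieval task: in-batch softmax loss + FactorizedTopK metrics
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=candidate_ds.batch(4096).map(self.item_tower)
            )
        )

    def compute_loss(self, features, training=False):
        user_emb = self.user_tower(features["customer_id"])
        item_emb = self.item_tower(features["article_id"])
        return self.task(user_emb, item_emb)


# Illustrative training and candidate generation (Stage 1 output):
# model = TwoTowerModel(user_ids_vocab, item_ids_vocab, candidate_ds)
# model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
# model.fit(train_ds.batch(8192), epochs=5)
# index = tfrs.layers.factorized_top_k.BruteForce(model.user_tower, k=100)
# index.index_from_dataset(
#     candidate_ds.batch(4096).map(lambda ids: (ids, model.item_tower(ids))))
# scores, article_ids = index(tf.constant(["customer_123"]))
```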
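Similarly, the leakage-safe split and the behavioral features can be sketched with pandas. The window sizes follow the 365/30/7-day scheme above, but the feature definitions (`purchase_cycle`, `price_sensitivity`, `tenure`) are illustrative assumptions, not the project's exact formulas; a transactions DataFrame with `t_dat`, `customer_id`, and `price` columns is assumed.

```python
# A minimal sketch of the temporal split and behavioral features.
import pandas as pd


def temporal_split(tx: pd.DataFrame):
    tx = tx.assign(t_dat=pd.to_datetime(tx["t_dat"]))
    last_date = tx["t_dat"].max()
    valid_start = last_date - pd.Timedelta(days=7)
    finetune_start = valid_start - pd.Timedelta(days=30)
    train_start = finetune_start - pd.Timedelta(days=365)

    train = tx[(tx["t_dat"] >= train_start) & (tx["t_dat"] < finetune_start)]
    finetune = tx[(tx["t_dat"] >= finetune_start) & (tx["t_dat"] < valid_start)]
    valid = tx[tx["t_dat"] >= valid_start]
    return train, finetune, valid


def behavioural_features(train: pd.DataFrame) -> pd.DataFrame:
    # Features are computed on the training window only, so the
    # validation week never leaks into them.
    grp = train.sort_values("t_dat").groupby("customer_id")
    feats = pd.DataFrame({
        # purchase_cycle: mean days between consecutive purchases
        "purchase_cycle": grp["t_dat"].apply(lambda s: s.diff().dt.days.mean()),
        # price_sensitivity: customer's mean price vs. the global mean price
        "price_sensitivity": grp["price"].mean() / train["price"].mean(),
        # tenure: days since the customer's first observed purchase
        "tenure": (train["t_dat"].max() - grp["t_dat"].min()).dt.days,
    })
    return feats.reset_index()


# Hypothetical usage:
# train_tx, finetune_tx, valid_tx = temporal_split(
#     pd.read_parquet("data/transactions_train.parquet"))
# customer_feats = behavioural_features(train_tx)
```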
The repository follows a modular "src-layout" pattern:
```
.
├── data/                  # Data lake (Raw CSVs & Processed Parquet)
├── scripts/               # Controller Layer (Imperative Shell)
│   ├── run_pipeline.py    # MAIN ENTRY POINT (Orchestrator)
│   ├── train.py           # Training logic
│   └── ...
├── src/                   # Service Layer (Functional Core)
│   ├── model.py           # TFRS Two-Tower Model Architecture
│   ├── data_utils.py      # tf.data pipelines & Preprocessing
│   └── config.py          # Single Source of Truth for Configs
├── mlruns/                # Local MLflow Tracking Store
├── docs/                  # Additional Documentation
│   ├── KAGGLE_LEARNINGS.md  # Benchmarking & Strategy
│   └── FINAL_RESULTS.md     # Methodologies & Results
├── pyproject.toml         # Project Dependencies (uv managed)
└── uv.lock                # Exact Dependency Lockfile
```
Note: This project has been fully refactored for Local Execution. All Cloud/GCP dependencies were removed to ensure cost-effective, high-performance local training.
- Python 3.8+
- uv (Fast Python package installer)
- Clone and Setup Environment:

  ```bash
  # Install uv if not present
  pip install uv

  # Create virtual environment
  uv venv

  # Activate (Windows)
  .venv\Scripts\Activate.ps1

  # Install dependencies
  uv pip install -e .
  ```
- Prepare Data: Place the H&M competition CSV files in `data/` and run:

  ```bash
  python scripts/convert_csv_to_parquet.py
  ```
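For reference, the conversion amounts to something like the sketch below; the actual `scripts/convert_csv_to_parquet.py` may cast dtypes and handle chunking differently, and the file names assume the standard competition downloads.

```python
# A minimal sketch of the CSV-to-Parquet conversion step.
from pathlib import Path

import pandas as pd

DATA_DIR = Path("data")

for name in ["transactions_train", "articles", "customers"]:
    df = pd.read_csv(DATA_DIR / f"{name}.csv")
    # Parquet is columnar and compressed, so downstream steps load far faster
    df.to_parquet(DATA_DIR / f"{name}.parquet", index=False)
```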
Execute the full end-to-end pipeline with a single command:
```bash
python scripts/run_pipeline.py
```

| Step | Command | Description |
|---|---|---|
| 1. Preprocess | `preprocess` | Partitions data into Training and Validation sets using strict temporal splitting logic to prevent data leakage. |
| 2. TFRecord | `tfrecord` | Transforms processed Parquet files into the optimized TFRecord format to maximize GPU throughput. |
| 3. Baseline | `baseline` | Establishes a performance benchmark (MAP@12) using a simple "Most Popular" heuristic strategy. |
| 4. Train | `train` | Executes a two-phase training strategy: base training on 365 days of history followed by fine-tuning on recent data to adapt to shifting trends. |
| 5. Evaluate-TT | `evaluate-tt` | Measures the retrieval quality (MAP@12) of the Two-Tower model in isolation against the validation set. |
| 6. Candidates | `candidates` | Performs efficient similarity search to generate the top-K candidate items for each user. |
| 7. Tune | `tune` | Executes Optuna Bayesian optimization to find the best hyperparameters for the CatBoost ranker (see the sketch below the table). |
| 8. Ranking | `ranking` | Trains a CatBoost classifier to re-rank the candidate list based on fine-grained interaction probabilities. |
| 9. Evaluate | `evaluate` | Computes the final MAP@12 of the integrated system (Retrieval + Ranking) on the validation set. |
| 10. SHAP | `shap` | Runs SHAP (SHapley Additive exPlanations) to analyze feature contributions to the Ranker's predictions. |
| 11. Submission | `submission` | Generates the submission file formatted for the Kaggle competition leaderboard. |
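The tuning step (7) can be illustrated with a minimal Optuna + CatBoost sketch. The search space, trial count, the `data/rank_train.parquet` / `data/rank_val.parquet` file names, and the per-trial nested-run logging are illustrative assumptions, not the project's exact code.

```python
# A minimal sketch of Bayesian hyperparameter tuning for the ranker.
import mlflow
import optuna
import pandas as pd
from catboost import CatBoostClassifier

train = pd.read_parquet("data/rank_train.parquet")  # hypothetical path
val = pd.read_parquet("data/rank_val.parquet")      # hypothetical path
X_train, y_train = train.drop(columns=["label"]), train["label"]
X_val, y_val = val.drop(columns=["label"]), val["label"]


def objective(trial: optuna.Trial) -> float:
    params = {
        "depth": trial.suggest_int("depth", 4, 10),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
        "iterations": 500,
        "eval_metric": "AUC",
        "verbose": 0,
    }
    model = CatBoostClassifier(**params)
    model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50)
    auc = model.get_best_score()["validation"]["AUC"]
    with mlflow.start_run(nested=True):  # one child run per trial
        mlflow.log_params(params)
        mlflow.log_metric("val_auc", auc)
    return auc


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best params:", study.best_params)
```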
```bash
# Run specific steps only
python scripts/run_pipeline.py --steps train candidates ranking evaluate

# Run all steps including tuning
python scripts/run_pipeline.py --steps all

# Skip tuning (use saved or default params)
python scripts/run_pipeline.py --skip-tuning
```

Why use the pipeline?
- Reproducibility: Ensures steps run in the correct order.
- Tracking: Automatically logs all params, metrics, and artifacts to MLflow using Nested Runs (a minimal sketch of this pattern follows).
- Incremental Evaluation: Compares MAP@12 across stages (Baseline → Two-Tower → 2-Stage); the metric itself is also sketched below.
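A minimal sketch of the nested-run pattern, assuming each step is a plain function that returns a dict of metrics; the real `run_pipeline.py` also handles argument parsing, step ordering, and artifact logging.

```python
# A minimal sketch of nested MLflow runs wrapped around pipeline steps.
import mlflow


def run_pipeline(steps):
    with mlflow.start_run(run_name="pipeline"):  # parent run
        for name, step_fn in steps:
            with mlflow.start_run(run_name=name, nested=True):  # one child per step
                metrics = step_fn() or {}
                for key, value in metrics.items():
                    mlflow.log_metric(key, value)


# Hypothetical usage, assuming preprocess/train/evaluate functions exist:
# run_pipeline([("preprocess", preprocess), ("train", train), ("evaluate", evaluate)])
```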
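For reference, MAP@12 (the metric compared across stages) can be computed as below, where `actual` is a customer's purchased articles in the validation week and `predicted` a ranked list of up to 12 recommended article IDs.

```python
# A minimal sketch of MAP@12 as used on the Kaggle leaderboard.
import numpy as np


def apk(actual: set, predicted: list, k: int = 12) -> float:
    if not actual:
        return 0.0
    hits, score = 0, 0.0
    for i, p in enumerate(predicted[:k]):
        if p in actual:
            hits += 1
            score += hits / (i + 1)  # precision at each hit position
    return score / min(len(actual), k)


def map_at_k(actuals, predictions, k: int = 12) -> float:
    return float(np.mean([apk(a, p, k) for a, p in zip(actuals, predictions)]))


# Example: the single relevant article is recommended at rank 2 -> AP@12 = 0.5
print(map_at_k([{"0706016001"}], [["0108775015", "0706016001"]]))
```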
To visualize experiments, launch the MLflow UI:
```bash
mlflow ui
```

If training completes but the model fails to save (or you need to re-save it with a new signature), use the recovery script to avoid re-training:

```bash
python scripts/recover_model.py
```

This script loads the last best checkpoint, re-indexes the candidates, and saves the final model artifact.
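A hedged sketch of that recovery idea is shown below. It reuses the illustrative `TwoTowerModel`, vocabularies, and `candidate_ds` from the retrieval sketch earlier, and the checkpoint and output paths are placeholders, not the project's actual artifacts.

```python
# A minimal sketch of model recovery: restore the best checkpoint weights,
# rebuild the BruteForce index, and export it without re-training.
import tensorflow as tf
import tensorflow_recommenders as tfrs

model = TwoTowerModel(user_ids_vocab, item_ids_vocab, candidate_ds)
model.load_weights("checkpoints/best.ckpt")  # hypothetical checkpoint path

index = tfrs.layers.factorized_top_k.BruteForce(model.user_tower, k=100)
index.index_from_dataset(
    candidate_ds.batch(4096).map(lambda ids: (ids, model.item_tower(ids))))

# Call the index once so its input signature is traced before export
_ = index(tf.constant(["customer_123"]))
tf.saved_model.save(index, "artifacts/retrieval_index")  # hypothetical output path
```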
Jordão Fernandes de Andrade
Data Scientist & Economist (MSc)
[email protected]
This project is licensed under the MIT License.