Two-Tower Recommendation System

License: MIT | Python | TensorFlow | MLflow

Project Context

Implementation of a Two-Stage Recommendation Architecture (Two-Tower Retrieval + CatBoost Ranking), engineered as an installable Python package. The workflow is managed by a custom orchestrator (run_pipeline.py) that enforces strict time-based data splitting, reproducibility, and experiment tracking via MLflow.

Project Overview

"Act in the entire ML lifecycle: from mathematical conception and model architecture, feature engineering and experimentation, to the implementation of a robust 'production-grade' MLOps pipeline."

This project implements a Two-Stage Recommendation Architecture for the H&M Personalized Fashion Recommendations challenge, combining a Neural Retrieval stage (Two-Tower) with a Gradient Boosting Ranking stage (CatBoost).

The solution is structured as a production-ready Python package, utilizing a modular pipeline that automates data processing, training, evaluation, and artifact management.

Key Technical Philosophy:

  1. Architecture-First: Implementation of a standard RecSys pattern (Retrieval + Ranking) rather than ad-hoc scripts.
  2. MLOps Orchestration: Centralized control via run_pipeline.py with full MLflow integration for experiment tracking.
  3. Local Reproducibility: Dependency management via uv and removal of cloud-specific dependencies so that results can be reproduced across different environments.

Key Technical Features

1. Model Architecture

  • Two-Stage Recommendation System:
    • Stage 1 (Retrieval): Neural Dual Encoder (Two-Tower) built with TensorFlow Recommenders (TFRS) to map users and items into a shared 32D embedding space. Generates top-K candidates via efficient BruteForce similarity search (see the sketch after this list).
    • Stage 2 (Ranking): CatBoost Classifier trained to re-rank the retrieved candidates using dense behavioral features and item metadata.
  • Feature Engineering: Strict separation of static and dynamic features, combining learned embeddings with behavioral metrics such as purchase_cycle, price_sensitivity, category affinity, and customer tenure.
  • Hyperparameter Tuning: Optuna integration for CatBoost with MLflow tracking.
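
For orientation, here is a minimal sketch of what the Stage 1 retrieval model can look like in TensorFlow Recommenders. The layer composition, vocabulary handling, and feature names (customer_id, article_id) are illustrative assumptions; the actual architecture lives in src/model.py.

    import tensorflow as tf
    import tensorflow_recommenders as tfrs

    class TwoTowerModel(tfrs.Model):
        """Dual encoder: maps users and items into a shared 32-D embedding space."""

        def __init__(self, user_ids, item_ids, item_dataset, embedding_dim=32):
            super().__init__()
            # User tower: ID lookup followed by a trainable embedding.
            self.user_tower = tf.keras.Sequential([
                tf.keras.layers.StringLookup(vocabulary=user_ids),
                tf.keras.layers.Embedding(len(user_ids) + 1, embedding_dim),
            ])
            # Item tower: same pattern over article IDs.
            self.item_tower = tf.keras.Sequential([
                tf.keras.layers.StringLookup(vocabulary=item_ids),
                tf.keras.layers.Embedding(len(item_ids) + 1, embedding_dim),
            ])
            # Retrieval task with FactorizedTopK metrics over the item catalogue.
            self.task = tfrs.tasks.Retrieval(
                metrics=tfrs.metrics.FactorizedTopK(
                    candidates=item_dataset.batch(4096).map(self.item_tower)))

        def compute_loss(self, features, training=False):
            user_embeddings = self.user_tower(features["customer_id"])
            item_embeddings = self.item_tower(features["article_id"])
            return self.task(user_embeddings, item_embeddings)

At serving time, top-K candidates are generated with tfrs.layers.factorized_top_k.BruteForce wrapped around the trained user tower (see the Recovery Mode sketch further down for the indexing pattern).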

2. MLOps Engineering

  • Pipeline Orchestrator: A custom Python script (scripts/run_pipeline.py) manages the execution DAG, ensuring correct dependency order (Preprocess -> Train -> Rank -> Evaluate); see the sketch after this list.
  • Experiment Tracking (MLflow):
    • Nested Runs: Hierarchical tracking of pipeline steps.
    • Artifact Management: Storage of serialized models, scalers, and metric plots.
    • Metric Logging: Tracking of MAP@12 at both Retrieval and Ranking stages.
  • Reproducibility: Strictly pinned dependencies via uv.lock and config-driven parameterization.
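
As a rough illustration of the orchestration pattern (ordered steps, each wrapped in a nested MLflow run), consider the sketch below. Step names and bodies are placeholders; the real DAG and step implementations live in scripts/run_pipeline.py.

    import mlflow

    # Placeholder step registry; the real pipeline wires these names to scripts/ modules.
    STEPS = {
        "preprocess": lambda: print("partition data by date"),
        "train": lambda: print("fit the Two-Tower model"),
        "ranking": lambda: print("fit the CatBoost ranker"),
        "evaluate": lambda: print("compute MAP@12"),
    }

    def run_pipeline(selected=None):
        """Run the requested steps in declaration order, one nested MLflow run per step."""
        selected = selected or list(STEPS)
        with mlflow.start_run(run_name="pipeline"):          # parent run
            for name, step in STEPS.items():                 # dict order == execution order
                if name not in selected:
                    continue
                with mlflow.start_run(run_name=name, nested=True):
                    mlflow.log_param("step", name)
                    step()

    if __name__ == "__main__":
        run_pipeline(["train", "evaluate"])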

3. Evaluation Strategy

  • Time-Based Split: Strict temporal separation for Training (365 days), Fine-Tuning (30 days), and Validation (7 days) to mimic production forecasting and prevent data leakage (sketched after this list).
  • Incremental Benchmarking: Evaluation of each stage independently (Baseline vs. Retrieval vs. Final Ranking).
  • Interpretability: SHAP analysis applied to the Ranker to explain feature importance.
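
The temporal split can be summarized by the sketch below, assuming the H&M transactions table with its t_dat date column; the exact implementation belongs to the preprocess step.

    import pandas as pd

    def time_based_split(transactions: pd.DataFrame, date_col: str = "t_dat"):
        """Split transactions into train / fine-tune / validation windows by date.

        Window lengths (365 / 30 / 7 days) mirror the strategy described above.
        """
        dates = pd.to_datetime(transactions[date_col])
        last_day = dates.max()

        val_start = last_day - pd.Timedelta(days=7)
        finetune_start = val_start - pd.Timedelta(days=30)
        train_start = finetune_start - pd.Timedelta(days=365)

        validation = transactions[dates > val_start]
        fine_tune = transactions[(dates > finetune_start) & (dates <= val_start)]
        train = transactions[(dates > train_start) & (dates <= finetune_start)]
        return train, fine_tune, validation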

4. Testing & Verification

  • Smoke Tests (tests/fast_test.py): Fast execution checks for model compilation and pipeline integrity.
  • Logic Validation (tests/verify_features.py): Verification of feature engineering logic to guarantee handling of temporal constraints.
  • Model Inspection (tests/inspect_model.py): Utilities to validate input signatures and saved model artifacts.
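
A smoke test in this spirit might look like the following, reusing the hypothetical TwoTowerModel from the architecture sketch above; the import path and fixture values are illustrative.

    import tensorflow as tf

    def test_user_tower_output_shape():
        # Hypothetical import; the real module layout is src/model.py.
        from src.model import TwoTowerModel

        items = tf.data.Dataset.from_tensor_slices(["a1", "a2"])
        model = TwoTowerModel(user_ids=["u1", "u2"], item_ids=["a1", "a2"],
                              item_dataset=items)
        embeddings = model.user_tower(tf.constant(["u1"]))
        # The retrieval towers should emit 32-D vectors, one per queried user.
        assert embeddings.shape == (1, 32)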

Project Structure

The repository follows a modular "src-layout" pattern:

.
├── data/                     # Data lake (Raw CSVs & Processed Parquet)
├── scripts/                  # Controller Layer (Imperative Shell)
│   ├── run_pipeline.py       # MAIN ENTRY POINT (Orchestrator)
│   ├── train.py              # Training logic
│   └── ...
├── src/                      # Service Layer (Functional Core)
│   ├── model.py              # TFRS Two-Tower Model Architecture
│   ├── data_utils.py         # tf.data pipelines & Preprocessing
│   └── config.py             # Single Source of Truth for Configs
├── mlruns/                   # Local MLflow Tracking Store
├── docs/                     # Additional Documentation
│   ├── KAGGLE_LEARNINGS.md   # Benchmarking & Strategy
│   └── FINAL_RESULTS.md      # Methodologies & Results
├── pyproject.toml            # Project Dependencies (uv managed)
└── uv.lock                   # Exact Dependency Lockfile

Note: This project has been fully refactored for Local Execution. All Cloud/GCP dependencies were removed to ensure cost-effective, high-performance local training.

Getting Started

Prerequisites

  • Python 3.8+
  • uv (Fast Python package installer)

Installation

  1. Clone and Setup Environment:

    # Install uv if not present
    pip install uv
    
    # Create virtual environment
    uv venv
    
    # Activate (Windows)
    .venv\Scripts\Activate.ps1
    
    # Install dependencies
    uv pip install -e .
  2. Prepare Data: Place the H&M competition CSV files in data/ and run:

    python scripts/convert_csv_to_parquet.py
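
Conceptually, the conversion step rewrites each competition CSV as Parquet so that later steps load faster. A minimal sketch follows (the real logic is in scripts/convert_csv_to_parquet.py; file names follow the Kaggle dataset):

    from pathlib import Path
    import pandas as pd

    DATA_DIR = Path("data")

    # Rewrite each raw CSV as a columnar Parquet file (requires pyarrow or fastparquet).
    for name in ["transactions_train", "articles", "customers"]:
        df = pd.read_csv(DATA_DIR / f"{name}.csv")
        df.to_parquet(DATA_DIR / f"{name}.parquet", index=False)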

Running the Pipeline

Execute the full end-to-end pipeline with a single command:

python scripts/run_pipeline.py

Pipeline Steps

  1. Preprocess (preprocess): Partitions data into Training and Validation sets using strict temporal splitting logic to prevent data leakage.
  2. TFRecord (tfrecord): Transforms processed Parquet files into optimized TFRecord format to maximize GPU throughput.
  3. Baseline (baseline): Establishes a performance benchmark (MAP@12) using a simple "Most Popular" heuristic strategy.
  4. Train (train): Executes a two-phase training strategy: Base Training on 365 days of history followed by Fine-Tuning on recent data to adapt to shifting trends.
  5. Evaluate-TT (evaluate-tt): Measures the retrieval quality (MAP@12) of the Two-Tower model in isolation against the validation set.
  6. Candidates (candidates): Performs efficient similarity search to generate the top-K candidate items for each user.
  7. Tune (tune): Executes Optuna Bayesian optimization to find the best hyperparameters for the CatBoost ranker (a minimal sketch of this pattern follows the list).
  8. Ranking (ranking): Trains a CatBoost classifier to re-rank the candidate list based on fine-grained interaction probabilities.
  9. Evaluate (evaluate): Computes the final MAP@12 of the integrated system (Retrieval + Ranking) on the validation set.
  10. SHAP (shap): Runs SHAP (SHapley Additive exPlanations) to analyze feature contributions to the Ranker's predictions.
  11. Submission (submission): Generates the submission file formatted for the Kaggle competition leaderboard.
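
Steps 7 and 8 pair Optuna with CatBoost. A minimal sketch of that pattern is shown below, assuming a labelled candidate table already split into train/validation feature matrices; the parameter ranges, the AUC objective, and the variable names are illustrative, while the project's actual objective and MLflow logging live in the tune step.

    import optuna
    from catboost import CatBoostClassifier
    from sklearn.metrics import roc_auc_score

    def objective(trial, X_train, y_train, X_val, y_val):
        """One Optuna trial: fit CatBoost on candidate features and score on validation."""
        params = {
            "depth": trial.suggest_int("depth", 4, 10),
            "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
            "iterations": 500,
            "verbose": 0,
        }
        model = CatBoostClassifier(**params)
        model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50)
        return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

    # study = optuna.create_study(direction="maximize")
    # study.optimize(lambda t: objective(t, X_train, y_train, X_val, y_val), n_trials=50)
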
# Run specific steps only
python scripts/run_pipeline.py --steps train candidates ranking evaluate

# Run all steps including tuning
python scripts/run_pipeline.py --steps all

# Skip tuning (use saved or default params)
python scripts/run_pipeline.py --skip-tuning

Why use the pipeline?

  • Reproducibility: Ensures steps run in the correct order.
  • Tracking: Automatically logs all params, metrics, and artifacts to MLflow (Nested Runs).
  • Incremental Evaluation: Compares MAP@12 across stages (Baseline → Two-Tower → 2-Stage).
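
MAP@12 is the H&M competition metric used at every stage. For reference, a standard implementation looks roughly like this (a sketch, not the project's evaluation code):

    def average_precision_at_k(actual, predicted, k=12):
        """AP@k for one user: precision accumulated at each hit in the top-k list."""
        predicted = predicted[:k]
        hits, score = 0, 0.0
        for i, item in enumerate(predicted):
            if item in actual and item not in predicted[:i]:
                hits += 1
                score += hits / (i + 1)
        return score / min(len(actual), k) if actual else 0.0

    def map_at_k(actual_lists, predicted_lists, k=12):
        """Mean AP@k across users, comparable across Baseline, Two-Tower and 2-Stage."""
        return sum(average_precision_at_k(a, p, k)
                   for a, p in zip(actual_lists, predicted_lists)) / len(actual_lists)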

To visualize experiments, launch the MLflow UI:

mlflow ui

Recovery Mode

If training completes but the model fails to save (or you need to re-save with a new signature), use the recovery script to avoid re-training:

python scripts/recover_model.py

This script loads the last best checkpoint, re-indexes the candidates, and saves the final model artifact.
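
Conceptually, recovery rebuilds the model, restores the best checkpoint, re-indexes the catalogue for BruteForce retrieval, and exports the serving artifact. A sketch under those assumptions (the paths, k, and the TwoTowerModel import are illustrative):

    import tensorflow as tf
    import tensorflow_recommenders as tfrs
    from src.model import TwoTowerModel   # hypothetical import; real layout is src/model.py

    items = tf.data.Dataset.from_tensor_slices(["a1", "a2"])   # placeholder catalogue

    # 1. Rebuild the architecture and restore the last best checkpoint.
    model = TwoTowerModel(user_ids=["u1", "u2"], item_ids=["a1", "a2"], item_dataset=items)
    model.load_weights("checkpoints/best.ckpt")

    # 2. Re-index the candidates for brute-force similarity search.
    index = tfrs.layers.factorized_top_k.BruteForce(model.user_tower, k=12)
    index.index_from_dataset(items.batch(4096).map(lambda ids: (ids, model.item_tower(ids))))

    # 3. Trace once with a concrete input, then save the serving artifact.
    _ = index(tf.constant(["u1"]))
    tf.saved_model.save(index, "artifacts/retrieval_index")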

Author

Jordão Fernandes de Andrade
Data Scientist & Economist (MSc)
[email protected]


This project is licensed under the MIT License.
