YSDA RecSys 2025 Lavka Solution

This repository contains a solution for the Kaggle competition YSDA RecSys 2025 Lavka. The goal is to predict user-item interactions for a grocery recommendation system.

Getting started

# Download competition data
kaggle competitions download -c ysda-recsys-2025-lavka
unzip ysda-recsys-2025-lavka.zip

# Install dependencies
pip install -r requirements.txt

# Prepare daily features
python precalc_features.py

# Run training
python trainer.py

# Optimizes model params automatically
python optimize.py

Key Components

1. `calculate_stats.py`

Generates collaborative and statistical features:
- Collaborative filtering: NPMI, Jaccard similarity, SVD embeddings
- Log1p-counts for actions (views, purchases) grouped by user/product/store/city
- Click-through rate (CTR) features
- Temporal features (hour, weekday, time_of_day, etc.)

2. `Featurizer`

Computes features in causal mode (no future data leakage)
Stores daily feature snapshots for efficient joins
Uses sliding windows (3/7/30/all days) for time-decay modeling

3. `Preprocessor`

Handles competition-specific rules:
- Makes train/val split
- Filters small request_ids

4. `Trainer`

Supports three training modes:
- Random lag: Random historical offset (1-30 days) for the each sample
- Constant lag: Fixed historical offset (e.g., 7 days)
- Ensemble: Combines random + all constant lags
Uses CatBoost:
- CatBoostClassifier for binary classification (view/purchase)
- CatBoostRanker with pairwise NDCG optimization
Implements smart caching for fast iteration

5. Hyperparameter Optimization

optimize.py: Uses Optuna for GPU-accelerated parameter tuning

6. Metrics & Submission

Calculates NDCG@10 for ranking evaluation
Handles ensemble result merging
Produces Kaggle-ready CSV submissions

Key Implementation Details

1. Causal Feature Engineering

All features are computed per day with no future data leakage
Windows handle time series changes

2. Ensemble Strategy

Combines 31 models (random lag + 30 constant lags)
Uses day-specific weighting for constant lags

3. Efficiency Features

Parquet caching for fast feature reuse
GPU acceleration for CatBoost and collaborative features

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
calculate_stats.py		calculate_stats.py
metrics.py		metrics.py
optimize.py		optimize.py
precalc_features.py		precalc_features.py
preprocess.py		preprocess.py
requirements.txt		requirements.txt
trainer.py		trainer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

YSDA RecSys 2025 Lavka Solution

Getting started

Key Components

1. `calculate_stats.py`

2. `Featurizer`

3. `Preprocessor`

4. `Trainer`

5. Hyperparameter Optimization

6. Metrics & Submission

Key Implementation Details

1. Causal Feature Engineering

2. Ensemble Strategy

3. Efficiency Features

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

YSDA RecSys 2025 Lavka Solution

Getting started

Key Components

1. calculate_stats.py

2. Featurizer

3. Preprocessor

4. Trainer

5. Hyperparameter Optimization

6. Metrics & Submission

Key Implementation Details

1. Causal Feature Engineering

2. Ensemble Strategy

3. Efficiency Features

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

1. `calculate_stats.py`

2. `Featurizer`

3. `Preprocessor`

4. `Trainer`

Packages