This repository contains a solution for the Kaggle competition YSDA RecSys 2025 Lavka. The goal is to predict user-item interactions for a grocery recommendation system.
# Download competition data
kaggle competitions download -c ysda-recsys-2025-lavka
unzip ysda-recsys-2025-lavka.zip
# Install dependencies
pip install -r requirements.txt
# Prepare daily features
python precalc_features.py
# Run training
python trainer.py
# Tune model hyperparameters automatically
python optimize.py

- Generates collaborative and statistical features:
  - Collaborative filtering: NPMI, Jaccard similarity, SVD embeddings
  - Log1p counts of actions (views, purchases) grouped by user/product/store/city
  - Click-through rate (CTR) features
  - Temporal features (hour, weekday, time_of_day, etc.)
- Computes features in causal mode (no future data leakage)
- Stores daily feature snapshots for efficient joins
- Uses sliding windows (3/7/30/all days) for time-decay modeling
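
The causal windowed counts described above can be sketched as follows. This is a minimal illustration, not the code in precalc_features.py: the function name `causal_window_counts`, the integer `day` column, and the output column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def causal_window_counts(events: pd.DataFrame, key: str, windows=(3, 7, 30)) -> pd.DataFrame:
    """Log1p event counts per (key, day), using only days strictly before `day`."""
    # Events per key per day, laid out as a dense day x key grid.
    daily = events.groupby([key, "day"]).size().rename("n").reset_index()
    days = pd.Index(np.arange(events["day"].min(), events["day"].max() + 2), name="day")
    grid = daily.pivot(index="day", columns=key, values="n").reindex(days).fillna(0.0)

    # Shifting by one day hides the current day, so each row only sees the past.
    past = grid.shift(1, fill_value=0.0)

    feats = {f"log1p_cnt_{w}d": np.log1p(past.rolling(w, min_periods=1).sum()) for w in windows}
    feats["log1p_cnt_all"] = np.log1p(past.cumsum())

    # unstack() turns each day x key grid into a Series indexed by (key, day).
    table = pd.concat({name: df.unstack() for name, df in feats.items()}, axis=1)
    return table.rename_axis([key, "day"]).reset_index()
```

Each output row is a daily snapshot, so it can be joined to training samples on (key, day) without looking at events from that day or later.
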
- Handles competition-specific rules:
  - Makes the train/val split
  - Filters out small request_ids
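
A rough sketch of the two data-handling steps above. The threshold `min_items`, the time-based split with `val_days`, and the column names `request_id` and `day` are assumptions for illustration; the actual rules used in this repository may differ.

```python
import pandas as pd


def filter_and_split(df: pd.DataFrame, min_items: int = 2, val_days: int = 7):
    """Drop requests with fewer than `min_items` rows, then split by time."""
    # Number of rows in each request, broadcast back to every row.
    sizes = df.groupby("request_id")["request_id"].transform("size")
    kept = df[sizes >= min_items]

    # Assumed split: the last `val_days` days go to validation.
    cutoff = kept["day"].max() - val_days
    train = kept[kept["day"] <= cutoff]
    val = kept[kept["day"] > cutoff]
    return train, val
```

Splitting by day rather than at random keeps validation strictly after training in time, which matches the causal feature setup.
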
- Supports three training modes:
  - Random lag: Random historical offset (1-30 days) for each sample
  - Constant lag: Fixed historical offset (e.g., 7 days)
  - Ensemble: Combines random + all constant lags
- Uses CatBoost:
  - CatBoostClassifier for binary classification (view/purchase)
  - CatBoostRanker with pairwise NDCG optimization
- Implements smart caching for fast iteration
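
One way to read the random and constant lag modes is sketched below: each sample is paired with a feature snapshot taken `lag` days before it. The helper names `assign_lag` and `join_snapshot` and the columns `day`, `lag`, and `feature_day` are hypothetical; this illustrates the idea and is not the logic in trainer.py.

```python
import numpy as np
import pandas as pd


def assign_lag(samples: pd.DataFrame, mode: str = "random", constant: int = 7,
               max_lag: int = 30, seed: int = 0) -> pd.DataFrame:
    """Attach a per-sample lag and the day whose snapshot should be joined."""
    out = samples.copy()
    if mode == "random":
        rng = np.random.default_rng(seed)
        # Upper bound is exclusive, so this draws integers in 1..max_lag.
        out["lag"] = rng.integers(1, max_lag + 1, size=len(out))
    elif mode == "constant":
        out["lag"] = constant
    else:
        raise ValueError(f"unknown mode: {mode}")
    out["feature_day"] = out["day"] - out["lag"]
    return out


def join_snapshot(samples: pd.DataFrame, snapshots: pd.DataFrame,
                  key: str = "user_id") -> pd.DataFrame:
    """Left-join the daily snapshot taken on `feature_day` for each sample."""
    snap = snapshots.rename(columns={"day": "feature_day"})
    return samples.merge(snap, on=[key, "feature_day"], how="left")
```

Under this reading, the ensemble mode would train one model with `mode="random"` and one model per constant lag, then merge their predictions.
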
- optimize.py: Uses Optuna for GPU-accelerated parameter tuning
- Calculates NDCG@10 for ranking evaluation
- Handles ensemble result merging
- Produces Kaggle-ready CSV submissions
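
For reference, NDCG@10 averaged over requests can be computed roughly as below. The exponential gain `2^rel - 1` and the column names `request_id`, `target`, and `score` are assumptions; the competition metric may use a different gain or tie handling.

```python
import numpy as np
import pandas as pd


def dcg_at_k(rels, k: int = 10) -> float:
    """Discounted cumulative gain of the first k relevances, in the given order."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    return float(np.sum((2.0 ** rels - 1.0) * discounts))


def ndcg_at_k(y_true, y_score, k: int = 10) -> float:
    """NDCG@k for one request; 0.0 when the request has no relevant items."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    idcg = dcg_at_k(np.sort(y_true)[::-1], k)
    return dcg_at_k(y_true[order], k) / idcg if idcg > 0 else 0.0


def mean_ndcg(df: pd.DataFrame, k: int = 10) -> float:
    """Average NDCG@k over request_id groups."""
    scores = [
        ndcg_at_k(g["target"].to_numpy(), g["score"].to_numpy(), k)
        for _, g in df.groupby("request_id")
    ]
    return float(np.mean(scores))
```
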
- All features are computed per day with no future data leakage
- Sliding windows (3/7/30/all days) capture how behavior changes over time
- Combines 31 models (random lag + 30 constant lags)
- Uses day-specific weighting for constant lags
- Parquet caching for fast feature reuse
- GPU acceleration for CatBoost and collaborative features
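
The README does not spell out how the day-specific weights are chosen, so the following is only one plausible scheme: a constant-lag model gets more weight the closer its lag is to the number of days between the end of the training data and the day being predicted. The function `blend` and the parameters `tau` and `random_weight` are hypothetical.

```python
import numpy as np


def blend(random_pred: np.ndarray, const_preds: np.ndarray, days_ahead: np.ndarray,
          tau: float = 3.0, random_weight: float = 1.0) -> np.ndarray:
    """Weighted average of one random-lag model and constant-lag models.

    random_pred: shape (n,), predictions of the random-lag model
    const_preds: shape (n_lags, n), row i holds predictions of the lag-(i+1) model
    days_ahead:  shape (n,), days between the last training day and each sample
    """
    lags = np.arange(1, const_preds.shape[0] + 1)[:, None]
    # Assumed weighting: exponential decay in |lag - days_ahead|.
    w = np.exp(-np.abs(lags - days_ahead[None, :]) / tau)
    num = random_weight * random_pred + (w * const_preds).sum(axis=0)
    den = random_weight + w.sum(axis=0)
    return num / den
```

With 30 rows in `const_preds` plus `random_pred`, this merges the 31 models into a single score per sample that can then be ranked within each request.
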