PARS (Partial-Label-Learning-Inspired Recommender Systems) combines partial label learning with transformer-based sequence modeling to learn conversion probabilities from browsing to transaction (that is, item purchase probabilities) without direct supervision of item acquisition. It addresses the challenge of implicit feedback in recommender systems by treating user interactions as partial labels and progressively refining them during training.
- Partial Label Learning: Implements the partial label learning method for handling ambiguous user feedback
- Transformer-based Architecture: Utilizes BERT-style transformers for sequence modeling
- Masked Language Modeling (MLM): Self-supervised learning for better item representations
- Global Representation Learning: Generates user/session representations for efficient recommendation
- Multi-task Learning: Combines MLM and partial label learning objectives
Input Sequences → Item Embeddings + Position Embeddings
                        ↓
               Transformer Encoder
                        ↓
             ┌──────────┴──────────┐
             ↓                     ↓
         MLM Head          Global Projection
             ↓                     ↓
         MLM Loss        Item Scores → PLL Loss
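The data flow in the diagram can be sketched in PyTorch as below. This is a minimal illustration, not the repository's model: the class name `ParsSketch`, the padding and mask indices, the mean pooling used to build the session vector, the separate output embedding table, and the layer and head counts are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class ParsSketch(nn.Module):
    """Illustrative two-headed encoder; not the repository's actual model."""

    def __init__(self, num_items, max_seq_len=50, hidden_size=256,
                 embedding_dim=128, num_layers=2, num_heads=4):
        super().__init__()
        # Assumption: index 0 is padding and index num_items + 1 is [MASK].
        vocab_size = num_items + 2
        self.item_emb = nn.Embedding(vocab_size, hidden_size, padding_idx=0)
        self.pos_emb = nn.Embedding(max_seq_len, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Branch 1: per-position logits over items for the MLM loss.
        self.mlm_head = nn.Linear(hidden_size, vocab_size)
        # Branch 2: one vector per session, projected to embedding_dim.
        self.global_proj = nn.Linear(hidden_size, embedding_dim)
        # Assumption: items are scored by a dot product with a separate
        # output embedding table.
        self.out_item_emb = nn.Embedding(vocab_size, embedding_dim)

    def forward(self, item_ids):
        # item_ids: (batch, seq_len) tensor of item indices, 0 = padding.
        pad_mask = item_ids.eq(0)
        positions = torch.arange(item_ids.size(1), device=item_ids.device)
        x = self.item_emb(item_ids) + self.pos_emb(positions)
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        mlm_logits = self.mlm_head(h)
        # Assumption: mean-pool the non-padded positions.
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        session_emb = self.global_proj(pooled)
        item_scores = session_emb @ self.out_item_emb.weight.t()
        return mlm_logits, item_scores


model = ParsSketch(num_items=100)
batch = torch.tensor([[1, 5, 3, 2], [7, 9, 0, 0]])  # second row is padded
mlm_logits, item_scores = model(batch)
```

Here `mlm_logits` would feed the MLM loss and `item_scores` the PLL loss, matching the two branches of the diagram.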
- Python 3.8+
- PyTorch 1.10+
- CUDA 11.0+ (for GPU support)
- Clone the repository:
git clone https://github.com/bjtu-lucas-nlp/PARS.git
cd PARS
- Install dependencies:
pip install -r requirements.txt

The input data should be in CSV format with the following columns:
- session_id: Session/user identifier
- item_id_sequence: List of item IDs in the sequence (as string)
- label_sequence: Binary labels for each item (0 or 1)
- unique_item_sequence: Mask for unique items in sequence
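Because the three sequence columns are stored as strings, they need to be decoded before use. The sketch below shows one way to do that with the standard library. The helper name `parse_rows` and the in-memory sample are hypothetical, and the repository's own loader may work differently.

```python
import ast
import csv
import io

# Hypothetical two-line file in the documented format, tab-separated.
sample = (
    "session_id\titem_id_sequence\tlabel_sequence\tunique_item_sequence\n"
    '0\t"[1, 5, 3, 2]"\t"[0, 0, 1, 0]"\t"[1, 1, 1, 1]"\n'
)

LIST_COLUMNS = ("item_id_sequence", "label_sequence", "unique_item_sequence")


def parse_rows(handle, sep="\t"):
    """Yield one dict per session with list columns decoded to Python lists."""
    for row in csv.DictReader(handle, delimiter=sep):
        parsed = {"session_id": row["session_id"]}
        for col in LIST_COLUMNS:
            # literal_eval parses "[1, 5, 3]" without executing arbitrary code.
            parsed[col] = ast.literal_eval(row[col])
        # The three sequences describe the same positions, so lengths must match.
        if len({len(parsed[col]) for col in LIST_COLUMNS}) != 1:
            raise ValueError(f"length mismatch in session {parsed['session_id']}")
        yield parsed


rows = list(parse_rows(io.StringIO(sample)))
```

For a real file, pass an open file handle and the same separator given to `--sep_sym`.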
Example:
session_id    item_id_sequence    label_sequence    unique_item_sequence
0             "[1, 5, 3, 2]"      "[0, 0, 1, 0]"    "[1, 1, 1, 1]"

Basic training command:
python pars_model.py \
--train_data_file datasets/Yoochoose/train.csv \
--val_data_file datasets/Yoochoose/val.csv \
--test_data_file datasets/Yoochoose/test.csv \
--num_items 39300 \
--num_sessions 417370 \
--epochs 300 \
--batch_size 256 \
--data_name Yoochoose \
--max_seq_len 50 \
--lr 1e-4 \
--mlm_weight 1.0 \
--hidden_size 256 \
--embedding_dim 128 \
--save_dir trainedmodel/Yoochoose

Available arguments:
python pars_model.py \
--train_data_file PATH # Path to training data
--val_data_file PATH # Path to validation data
--test_data_file PATH # Path to test data
--sep_sym SEPARATOR # CSV separator (default: '\t')
--num_items NUM # Total number of items
--num_sessions NUM # Total number of sessions
--batch_size SIZE # Batch size (default: 256)
--data_name NAME # Dataset name for logging
--max_seq_len LENGTH # Maximum sequence length (default: 50)
--epochs NUM # Number of epochs (default: 10)
--lr RATE # Learning rate (default: 1e-4)
--mlm_weight WEIGHT # Weight for MLM loss (default: 1.0)
--hidden_size SIZE # Transformer hidden size (default: 256)
--embedding_dim DIM # Final embedding dimension (default: 128)
--save_dir DIR # Checkpoint directory (default: 'checkpoints')
--seed NUM # Random seed (default: 42)

The model automatically evaluates on the test set during training and saves metrics including:
- AUC: Area Under the ROC Curve
- HR@K: Hit Rate at K
- NDCG@K: Normalized Discounted Cumulative Gain at K
- Precision@K: Precision at K
- Recall@K: Recall at K
- F1@K: F1 Score at K
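The top-K metrics above can be computed per session from a ranked item list, as in the sketch below. It uses the common textbook definitions with binary relevance. The function name `metrics_at_k` is hypothetical, and the repository may normalize or average differently. AUC is omitted because it is computed over scores rather than a top-K cut.

```python
import math


def metrics_at_k(ranked_items, relevant_items, k):
    """Top-K metrics for one session.

    ranked_items: item IDs sorted by descending predicted score.
    relevant_items: set of item IDs with a positive label.
    """
    hits = [1 if item in relevant_items else 0 for item in ranked_items[:k]]
    num_hits = sum(hits)

    precision = num_hits / k
    recall = num_hits / len(relevant_items) if relevant_items else 0.0
    f1 = 2 * precision * recall / (precision + recall) if num_hits else 0.0
    hit_rate = 1.0 if num_hits else 0.0

    # A hit at 0-based rank i is discounted by log2(i + 2).
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    ndcg = dcg / idcg if idcg else 0.0

    return {"hr": hit_rate, "ndcg": ndcg, "precision": precision,
            "recall": recall, "f1": f1}


# Item 3 is ranked first and is relevant; item 2 is relevant but ranked fourth.
result = metrics_at_k([3, 1, 5, 2], {3, 2}, k=2)
```

Dataset-level values are typically obtained by averaging these per-session results.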
- Item Embeddings: Learnable embeddings for each item
- Position Embeddings: Position-aware representations
- Transformer Encoder: Multi-layer self-attention mechanism
- Global Projection: Projects sequence representations to user/session embeddings
- MLM Head: Predicts masked items for self-supervised learning
- Partial Label Learning (PLL):
  - Uses the PRODEN method to iteratively refine ambiguous labels
  - Updates pseudo-labels based on model predictions
- Masked Language Modeling (MLM):
  - Randomly masks items in sequences
  - Predicts masked items to learn item relationships
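Both strategies can be illustrated in plain Python. The sketch below follows the standard PRODEN formulation: pseudo-label weights start uniform over the candidate set and are then renormalized from the model's predicted probabilities. It also shows a simple random masking step. All function names, the choice of browsed items as the candidate set, the mask token `-1`, and the 0.15 masking rate are assumptions made for the example, not values taken from the repository.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def proden_loss(probs, weights):
    """Weighted cross-entropy over candidates: -sum_j w_j * log p_j."""
    return -sum(w * math.log(p) for p, w in zip(probs, weights) if w > 0)


def proden_update(probs, candidate_mask):
    """Renormalize predicted probabilities over the candidate set only."""
    masked = [p * c for p, c in zip(probs, candidate_mask)]
    total = sum(masked)
    return [m / total for m in masked]


def mask_items(sequence, mask_token, mask_prob, rng):
    """Randomly replace items with mask_token; return inputs and MLM targets."""
    inputs, targets = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            targets.append(item)   # the model must recover this item
        else:
            inputs.append(item)
            targets.append(None)   # position ignored by the MLM loss
    return inputs, targets


# Hypothetical scores over 5 items; items 1 and 3 are the candidates.
candidate_mask = [0, 1, 0, 1, 0]
weights = [c / sum(candidate_mask) for c in candidate_mask]  # uniform start
probs = softmax([0.1, 2.0, 0.3, 1.0, -0.5])
loss = proden_loss(probs, weights)
weights = proden_update(probs, candidate_mask)  # refined pseudo-labels

sequence = [1, 5, 3, 2]
inputs, targets = mask_items(sequence, mask_token=-1, mask_prob=0.15,
                             rng=random.Random(42))
```

After the update, the weight shifts toward the candidate the model currently scores higher, while non-candidates stay at zero. Given the `--mlm_weight` option, the total objective is presumably the PLL loss plus `mlm_weight` times the MLM loss.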
The training process generates:
- checkpoints/best_model.pt: Best model based on validation NDCG@10
- checkpoints/PARS_*_test_metrics.csv: Test metrics for each epoch
- checkpoints/PARS_*_full_log.json: Complete training log
- checkpoints/final_results.json: Final evaluation results
If you use PARS in your research, please cite:
@article{
title={PARS: Partial-Label-Learning-Inspired Recommender Systems},
}

This project is licensed under the MIT License.
- PRODEN method for partial label learning
- Hugging Face Transformers library
- PyTorch community