This project builds a personalized news recommendation system using Graph Neural Networks (GNNs) to deliver tailored article suggestions by modeling the relationships between users, articles, and content metadata. By leveraging graph-based learning, the system captures complex interactions and semantic patterns beyond traditional collaborative or content-based methods.
- Goal: Recommend news articles aligned with individual user preferences.
- Challenge: Address dynamic user interests, content freshness, and the cold-start problem.
- Solution: Use GNNs to learn from a heterogeneous graph of users, articles, categories, and interactions.
- News Article Data: Includes ID, title, category, sub-category, and abstract.
- User Interaction Data: Tracks which articles were shown and clicked by users.
- User IDs: Active and cold-start users.
- News IDs: Individual articles.
- Categories/Sub-Categories: Classification metadata.
- News Articles: Title + abstract → Sentence Transformer embeddings (`all-MiniLM-L6-v2`).
- Categories/Sub-Categories: Embeddings derived from the label names.
- Users:
- Active: Average embeddings of clicked articles.
- Cold-start: Average embeddings of popular articles.
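The user-embedding rule above can be sketched in a few lines of NumPy. The article vectors and IDs below are illustrative toy values, not the real 384-dimensional `all-MiniLM-L6-v2` embeddings:

```python
import numpy as np

# Toy article embeddings (stand-ins for real Sentence Transformer vectors).
article_embeddings = {
    "N1": np.array([0.2, 0.4, 0.1]),
    "N2": np.array([0.6, 0.0, 0.3]),
    "N3": np.array([0.1, 0.5, 0.5]),
}

def user_embedding(clicked_ids, popular_ids, embeddings):
    """Active users: mean of clicked-article embeddings.
    Cold-start users (no clicks): mean of popular-article embeddings."""
    ids = clicked_ids if clicked_ids else popular_ids
    return np.mean([embeddings[i] for i in ids], axis=0)

active = user_embedding(["N1", "N2"], ["N3"], article_embeddings)  # active user
cold = user_embedding([], ["N2", "N3"], article_embeddings)        # cold-start
```

Averaging keeps user and article vectors in the same embedding space, so user-article similarity is directly comparable.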
Each graph node stores four attributes: Node ID, Node Type, Node Label, and Node Feature Embedding.
- Category ↔ Sub-category
- Sub-category ↔ Article
- User ↔ Article (click interaction)
The resulting graph represents a multi-level, heterogeneous structure of user-content relationships for GNN-based learning.
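A minimal sketch of how these edge types might be assembled with NetworkX (part of the project's stack); the node IDs here are illustrative:

```python
import networkx as nx

# Heterogeneous graph: each node carries a node_type attribute.
G = nx.Graph()
G.add_node("sports", node_type="category")
G.add_node("soccer", node_type="subcategory")
G.add_node("N1", node_type="article")
G.add_node("U1", node_type="user")

G.add_edge("sports", "soccer")        # category ↔ sub-category
G.add_edge("soccer", "N1")            # sub-category ↔ article
G.add_edge("U1", "N1", kind="click")  # user ↔ article (click interaction)
```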
- Graph Convolutional Network (GCN): Captures global graph structures via neighborhood aggregation.
- Graph Attention Network (GAT): Learns importance of neighbor nodes using attention weights.
- GraphSAGE: Scalable sampling-based aggregation, ideal for inductive learning and cold-start scenarios.
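The neighborhood aggregation that GCN performs can be sketched in plain NumPy. The project itself uses PyTorch Geometric layers; this toy single layer only illustrates the symmetric-normalized update H' = ReLU(D̂^(-1/2) Â D̂^(-1/2) H W):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: add self-loops, symmetrically normalize the
    adjacency, aggregate neighbor features, then linear transform + ReLU."""
    A_hat = A + np.eye(A.shape[0])                      # self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)

# Toy graph: 3 nodes in a path (0-1-2), 2-dim features, identity weights.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.eye(2)
H1 = gcn_layer(A, H, W)
```

Stacking four such layers (as in the training setup below) lets information propagate four hops, e.g. user → article → sub-category → article → user.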
- Architecture: 4-layer GNNs for each model variant.
- Link prediction task with positive (real) and negative (random) edges.
- Loss Function: Binary cross-entropy
- Epochs: 100
- Evaluation: On separate validation sets using both classification and ranking metrics.
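The link-prediction objective above can be sketched as follows, assuming dot-product edge scoring over toy embeddings (in the real pipeline the embeddings come from the GNN and negative edges are resampled per epoch):

```python
import numpy as np

rng = np.random.default_rng(0)

def bce_loss(scores, labels):
    """Binary cross-entropy on sigmoid-activated edge scores."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# Toy node embeddings standing in for GNN output.
emb = rng.normal(size=(5, 4))

pos_edges = [(0, 1), (1, 2)]   # observed click edges (label 1)
neg_edges = [(0, 3), (2, 4)]   # randomly sampled non-edges (label 0)

edges = pos_edges + neg_edges
labels = np.array([1, 1, 0, 0], dtype=float)
scores = np.array([emb[u] @ emb[v] for u, v in edges])
loss = bce_loss(scores, labels)
```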
- Average Precision (AP)
- Average Recall (AR)
- Average F1 Score
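Per-user versions of these metrics can be computed as in this sketch (article IDs are illustrative); the reported averages are their means over all evaluated users:

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one user's recommendation set."""
    rec, rel = set(recommended), set(relevant)
    tp = len(rec & rel)                                  # true positives
    precision = tp / len(rec) if rec else 0.0
    recall = tp / len(rel) if rel else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

p, r, f = precision_recall_f1(["N1", "N2", "N3", "N4"], ["N2", "N5"])
```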
| Model Type | Avg Precision | Avg Recall | Avg F1 Score |
|---|---|---|---|
| Popularity-Based | 0.00008 | 0.00109 | 0.00016 |
| Content-Based (Article) | 0.00018 | 0.00855 | 0.00034 |
| Content-Based (User) | 0.00014 | 0.00098 | 0.00023 |
| GCN | 0.048 | 0.79 | 0.091 |
| GAT | 0.048 | 0.651 | 0.090 |
| GraphSAGE | 0.041 | 0.485 | 0.075 |
📌 GCN achieves the best classification performance, with the highest recall (0.79) and the best F1 score (0.091).
- NDCG@k (Normalized Discounted Cumulative Gain)
- MRR (Mean Reciprocal Rank)
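Both ranking metrics can be sketched for a single user's ranked list of binary relevance labels (a simplified form; NDCG also generalizes to graded relevance):

```python
import numpy as np

def ndcg_at_k(ranked_relevance, k):
    """NDCG@k: discounted gain of the ranking vs. the ideal ordering."""
    rel = np.asarray(ranked_relevance[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = (rel * discounts).sum()
    ideal = np.sort(np.asarray(ranked_relevance, dtype=float))[::-1][:k]
    idcg = (ideal * discounts[: ideal.size]).sum()
    return dcg / idcg if idcg > 0 else 0.0

def mrr(ranked_relevance):
    """Reciprocal rank of the first relevant item (0 if none)."""
    for i, r in enumerate(ranked_relevance, start=1):
        if r:
            return 1.0 / i
    return 0.0

score = ndcg_at_k([1, 0, 1, 0, 0], k=5)  # relevant items at ranks 1 and 3
```

An MRR of 1 in the table below means every user's top-ranked recommendation was relevant.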
| Model | NDCG@5 | NDCG@10 | NDCG@15 | MRR |
|---|---|---|---|---|
| GCN | 0.627 | 0.674 | 0.707 | 1.0 |
| GAT | 0.630 | 0.677 | 0.709 | 1.0 |
| GraphSAGE | 0.624 | 0.672 | 0.705 | 1.0 |
All models rank highly relevant articles effectively, with GAT slightly outperforming others on NDCG.
- Graph-based modeling significantly outperforms popularity and traditional content-based methods.
- GCN is the most effective model for this task, with high recall and ranking accuracy.
- GAT performs comparably and may offer further advantages with more complex or weighted relationships.
- GraphSAGE, while slightly behind in metrics, brings scalability and support for inductive inference, especially useful in real-time applications.
- Python, PyTorch Geometric, NetworkX, scikit-learn, HuggingFace Transformers (SentenceTransformers)
- Jupyter Notebooks for experiments and evaluation
Abde Manaaf Ghadiali, Final Project – Machine Learning (CSCI 6364)