A scalable e-commerce recommendation system built with Apache Spark, Delta Lake, and Neural Collaborative Filtering (NCF) for processing 10M+ user interactions.
- Scalable data processing with Apache Spark
- ACID transactions and time travel with Delta Lake
- Neural Collaborative Filtering for personalized recommendations
- Real-time and batch processing capabilities
- Model training and serving pipeline
- Monitoring and evaluation metrics
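To show how Neural Collaborative Filtering turns a (user, item) pair into a score, here is a dependency-free sketch of a NeuMF-style forward pass, the GMF + MLP fusion described in the original Neural Collaborative Filtering paper. This is a hypothetical illustration and not the repository's `src/models/` implementation. The class name `TinyNeuMF`, the embedding sizes, and the single hidden layer are all assumptions. The weights are random and untrained, so the scores are meaningless except as a demonstration of the data flow.

```python
import math
import random


def _init_matrix(rng, rows, cols, scale=0.1):
    """Return a rows x cols matrix (list of lists) of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def _relu(x):
    return x if x > 0.0 else 0.0


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNeuMF:
    """Hypothetical NeuMF-style scorer: GMF branch + one-layer MLP branch.

    Illustrative only: weights are randomly initialised and never trained.
    """

    def __init__(self, num_users, num_items, gmf_dim=4, mlp_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        # NeuMF keeps separate embedding tables for the GMF and MLP branches.
        self.user_gmf = _init_matrix(rng, num_users, gmf_dim)
        self.item_gmf = _init_matrix(rng, num_items, gmf_dim)
        self.user_mlp = _init_matrix(rng, num_users, mlp_dim)
        self.item_mlp = _init_matrix(rng, num_items, mlp_dim)
        # One dense layer mapping the concatenated MLP embeddings to `hidden` units.
        self.w_hidden = _init_matrix(rng, hidden, 2 * mlp_dim)
        self.b_hidden = [0.0] * hidden
        # Output unit over the concatenation [GMF vector ; MLP activations].
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(gmf_dim + hidden)]
        self.b_out = 0.0

    def score(self, user, item):
        """Return the predicted interaction probability, strictly in (0, 1)."""
        # GMF branch: element-wise product of user and item embeddings.
        gmf = [u * v for u, v in zip(self.user_gmf[user], self.item_gmf[item])]
        # MLP branch: concatenate the two embeddings, then dense layer + ReLU.
        x = self.user_mlp[user] + self.item_mlp[item]
        h = [
            _relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w_hidden, self.b_hidden)
        ]
        # Fusion layer: concatenate both branches and apply a sigmoid output.
        z = gmf + h
        logit = sum(w * zi for w, zi in zip(self.w_out, z)) + self.b_out
        return _sigmoid(logit)


model = TinyNeuMF(num_users=3, num_items=5)
print(model.score(user=0, item=2))
```

In a PyTorch implementation the lists would become `torch.nn.Embedding` and `torch.nn.Linear` modules trained with a binary cross-entropy loss on observed versus sampled negative interactions; the branch structure stays the same.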
├── data/ # Data storage directory
├── notebooks/ # Jupyter notebooks for analysis
├── src/
│ ├── data/ # Data processing modules
│ ├── models/ # NCF model implementation
│ ├── training/ # Model training pipeline
│ └── serving/ # Model serving and inference
├── tests/ # Unit tests
├── config/ # Configuration files
└── requirements.txt # Python dependencies
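The `src/serving/` step and the evaluation metrics are not spelled out here, so the following is a minimal sketch under stated assumptions: recommendations are the top-K scored items a user has not already interacted with, and quality is measured with hit rate (HR@K) and NDCG@K for a single held-out item, the leave-one-out protocol commonly used to evaluate NCF models. The function names `top_k_items`, `hit_rate_at_k`, and `ndcg_at_k` are hypothetical and not part of this repository.

```python
import math


def top_k_items(scores, seen, k):
    """Return the k highest-scoring item ids, excluding items the user has seen.

    scores: dict mapping item id -> predicted score for one user.
    seen: set of item ids to exclude (e.g. past purchases).
    Ties are broken by item id so the ranking is deterministic.
    """
    candidates = [(item, s) for item, s in scores.items() if item not in seen]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:k]]


def hit_rate_at_k(ranked, held_out, k):
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if held_out in ranked[:k] else 0.0


def ndcg_at_k(ranked, held_out, k):
    """NDCG@k with one relevant item: 1 / log2(rank + 2), rank being 0-based."""
    top = ranked[:k]
    if held_out not in top:
        return 0.0
    rank = top.index(held_out)
    return 1.0 / math.log2(rank + 2)


scores = {"a": 0.9, "b": 0.7, "c": 0.8, "d": 0.1}
recs = top_k_items(scores, seen={"a"}, k=2)
print(recs)                                  # ['c', 'b']
print(hit_rate_at_k(recs, "b", k=2))         # 1.0
print(round(ndcg_at_k(recs, "b", k=2), 4))   # 0.6309
```

Averaging these per-user values over all test users gives the aggregate HR@K and NDCG@K; NDCG rewards placing the held-out item higher in the list, while hit rate only checks membership.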
- Python 3.8+
- Apache Spark 3.2+
- Delta Lake 2.0+
- PyTorch 1.9+
- Jupyter Notebook
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Start Jupyter Notebook:
jupyter notebook
- Data Processing:
python src/data/process_data.py
- Model Training:
python src/training/train_model.py
- Generate Recommendations:
python src/serving/generate_recommendations.py
MIT License