A product recommendation system built on RAG (Retrieval-Augmented Generation) techniques. It combines SentenceTransformers embeddings, FAISS similarity search, and TF-IDF text search behind a Flask API, with a React single-page frontend for browsing, searching, and exploring recommendations.
- Vector Embeddings: Uses SentenceTransformers (all-MiniLM-L6-v2) for semantic product understanding
- FAISS Integration: Fast similarity search with Facebook AI Similarity Search
- Dual Recommendation Types:
- Similar Products: Based on semantic similarity using vector embeddings
- Complementary Products: Based on category relationships and business logic
- Full-Text Search: TF-IDF vectorization for text-based product search
- Multi-Format Support: Amazon JSON Lines, CSV files, compressed (.gz) files
- Automatic Column Mapping: Intelligent standardization of different data schemas
- Amazon Dataset Integration: Native support for Amazon product metadata and reviews
- Data Preprocessing: Handles missing values, creates combined text features
- React 18.2.0: Modern, responsive single-page application
- Real-time Search: Instant product search with similarity scoring
- Interactive UI: Click-to-explore product recommendations
- Category Filtering: Browse products by category with pagination
- Mobile Responsive: Optimized for all device sizes
- Status Dashboard: Real-time system status monitoring
- Data Loading Progress: Visual feedback for dataset processing
- Performance Metrics: Similarity scores and search relevance
app.py                      # Main Flask API server
├── ProductRecommender      # Core recommendation engine
├── REST API endpoints      # /api/* routes
└── CORS enabled            # Cross-origin resource sharing
data_processor.py           # Data ingestion and preprocessing
├── Amazon data support
├── CSV file processing
└── Data standardization
src/
├── App.js                  # Main React component
├── App.css                 # Comprehensive styling
└── index.js                # React DOM entry point
public/
└── index.html              # HTML template
- Flask 2.3.3: Web framework
- Flask-CORS 4.0.0: Cross-origin resource sharing
- pandas 2.1.1: Data manipulation and analysis
- numpy 1.24.3: Numerical computing
- scikit-learn 1.3.0: Machine learning utilities
- sentence-transformers 2.7.0: Semantic embeddings
- faiss-cpu 1.7.4: Fast similarity search
- transformers 4.40.0: Hugging Face transformers
- torch 2.0.1: PyTorch deep learning framework
- React 18.2.0: UI framework
- React DOM 18.2.0: DOM rendering
- React Scripts 5.0.1: Build tools and development server
- Axios 1.5.0: HTTP client for API communication
- Python 3.9+ (the pinned pandas 2.1.1 does not support Python 3.8)
- Node.js 16+
- npm or yarn
- Clone the repository
    git clone <repository-url>
    cd smart-product-recommender
- Set up Python environment
    # Create virtual environment
    python -m venv venv
    # Activate virtual environment
    # Windows:
    venv\Scripts\activate
    # macOS/Linux:
    source venv/bin/activate
    # Install Python dependencies
    pip install -r requirements.txt
- Set up React frontend
    # Install Node.js dependencies
    npm install
- Start the Backend Server
    python app.py
  The Flask server will start on http://localhost:5000
- Start the Frontend Development Server
    npm start
  The React app will start on http://localhost:3000
- Access the Application: Open your browser and navigate to http://localhost:3000
- Amazon Product Metadata: JSON Lines format (.json, .jsonl, .gz)
- Amazon Reviews: JSON Lines format (.json, .jsonl, .gz)
- CSV Files: Standard comma-separated values with flexible column mapping
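As a rough illustration of how the JSON Lines formats above can be read, here is a minimal sketch using only the standard library. The function name `iter_json_lines` is hypothetical and is not taken from data_processor.py, and the sketch assumes every non-empty line is a strict JSON object.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_json_lines(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a .json, .jsonl, or .gz file.

    Hypothetical helper: assumes one strict JSON object per line.
    """
    # gzip.open in text mode decompresses on the fly; plain files use open().
    opener = gzip.open if Path(path).suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the function is a generator, records are parsed one at a time, so a large compressed file does not have to fit in memory before processing starts.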
The project includes sample Amazon datasets in the data/ directory:
- Electronics product metadata (meta_Electronics.json.gz)
- Electronics reviews (reviews_Electronics_5.json.gz)
- Various category-specific review datasets
- Amazon Product Data - Official Amazon dataset repository
- Custom CSV files with product information
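For custom CSV files, column names often differ from the standard schema (product_name, description, category, brand, price, rating). The sketch below shows one way such a mapping could work. The alias table and the function name `standardize_row` are assumptions made for illustration; the actual mapping rules in data_processor.py may differ.

```python
# Hypothetical alias table: each standard column lists source names that map to it.
COLUMN_ALIASES = {
    "product_name": ["product_name", "title", "name"],
    "description": ["description", "desc"],
    "category": ["category", "categories"],
    "brand": ["brand", "manufacturer"],
    "price": ["price"],
    "rating": ["rating", "stars"],
}


def standardize_row(row: dict) -> dict:
    """Map one source row onto the standard columns, using "" for missing values."""
    # Compare column names case-insensitively and ignore stray whitespace.
    lowered = {key.strip().lower(): value for key, value in row.items()}
    standardized = {}
    for standard, aliases in COLUMN_ALIASES.items():
        # Take the first alias that is present and non-empty.
        standardized[standard] = next(
            (lowered[a] for a in aliases if lowered.get(a) not in (None, "")),
            "",
        )
    return standardized
```

Rows produced by `csv.DictReader` can be passed through `standardize_row` directly, since each row is already a dict keyed by column name.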
- POST /api/load_data - Load and process datasets
- GET /api/status - System status and health check
- GET /api/products - Get products with pagination and filtering
- GET /api/product/<id> - Get single product details
- GET /api/categories - Get all available categories
- GET /api/recommendations/similar/<id> - Get similar products
- GET /api/recommendations/complementary/<id> - Get complementary products
- GET /api/search?q=<query> - Search products by text query
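The GET endpoints above can be called from any HTTP client. Below is a minimal sketch of a Python client built on the standard library. The helper names `api_url` and `get_json` are hypothetical, and the shape of each JSON response is not documented here, so the sketch returns the parsed body without interpreting it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE_URL = "http://localhost:5000"  # default backend address from this README


def api_url(path: str, **params) -> str:
    """Build a full endpoint URL, URL-encoding any query parameters."""
    url = f"{API_BASE_URL}{path}"
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(path: str, **params):
    """Send a GET request and return the decoded JSON body (shape not assumed)."""
    with urlopen(api_url(path, **params), timeout=10) as response:
        return json.load(response)
```

With the backend running, `get_json("/api/status")` checks system health and `get_json("/api/search", q="usb cable")` runs a text search.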
- Search Bar: Semantic search with real-time results
- Product Grid: Responsive card-based product display
- Category Filter: Dropdown for category-based filtering
- Pagination: Navigate through large product catalogs
- Similar Products: Vector similarity-based recommendations with scores
- Complementary Products: Business logic-based cross-selling suggestions
- Interactive Navigation: Click products to explore recommendations
- Data Loading: Visual indicators for dataset status
- Embeddings Status: Shows if AI models are ready
- Product Count: Real-time product statistics
- Data Ingestion: Load and standardize product data
- Text Processing: Create combined text features from product attributes
- Embedding Generation: Convert text to vector embeddings using SentenceTransformers
- Index Building: Create FAISS index for fast similarity search
- TF-IDF Matrix: Build traditional text similarity matrix as fallback
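The text-processing step in the pipeline above can be sketched as follows. This is an illustration under assumptions: the function name `combined_text` and the choice of fields are hypothetical, and app.py may combine attributes differently.

```python
def combined_text(product: dict, fields=("title", "description", "category", "brand")) -> str:
    """Join the selected attributes into one string, skipping missing or empty values."""
    parts = []
    for field in fields:
        value = product.get(field)
        if value is None:
            continue
        # Amazon metadata stores some fields (such as category) as lists.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(item) for item in value)
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)
```

The resulting strings are what an embedding model or a TF-IDF vectorizer would then consume, one string per product.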
- Semantic Similarity: Cosine similarity in embedding space
- Category-Based Logic: Predefined complementary product mappings
- Hybrid Scoring: Combines multiple signals for ranking
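To make the category-based logic concrete, here is a minimal sketch of a complementary lookup driven by a mapping like the `complementary_map` shown later in this README. The function name `complementary_products` and the assumption that each product carries a list under `category` are illustrative; the real logic in app.py may rank or filter differently.

```python
complementary_map = {
    "Electronics": ["Accessories", "Cases", "Cables"],
    "Phones": ["Cases", "Screen Protectors", "Chargers"],
}


def complementary_products(product: dict, catalog: list, limit: int = 5) -> list:
    """Return catalog items whose category complements any category of the product."""
    # Collect every category that complements one of the product's categories.
    targets = set()
    for category in product.get("category", []):
        targets.update(complementary_map.get(category, []))
    results = []
    for candidate in catalog:
        if candidate["asin"] == product["asin"]:
            continue  # never recommend the product itself
        if targets.intersection(candidate.get("category", [])):
            results.append(candidate)
    return results[:limit]
```

A product in a category with no mapping simply yields an empty list, which a caller could treat as a signal to fall back to similar products.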
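- FAISS Indexing: Fast similarity search; query time is sub-linear only with an approximate index such as IVF, since a flat index compares the query against every vector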
- Normalized Embeddings: Optimized cosine similarity computation
- Batch Processing: Efficient embedding generation
- Lazy Loading: On-demand data processing
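The point of normalizing embeddings is that cosine similarity between unit-length vectors reduces to a plain dot product. The sketch below demonstrates this with a brute-force scan in pure Python. It is not the project's FAISS code, and the function names are hypothetical; an exact inner-product index in FAISS returns the same ranking, only much faster.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k_similar(query, unit_vectors, k=3):
    """Return (index, cosine similarity) pairs for the k closest stored vectors.

    Assumes unit_vectors were normalized once, at indexing time.
    """
    q = normalize(query)
    # For unit vectors, the dot product equals the cosine similarity.
    scored = ((sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(unit_vectors))
    return [(i, score) for score, i in heapq.nlargest(k, scored)]
```

Normalizing stored vectors once at indexing time means each query costs only one normalization plus the dot products, instead of recomputing norms for every comparison.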
smart-product-recommender/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── package.json              # Node.js dependencies
├── package-lock.json         # Locked dependency versions
│
├── app.py                    # Flask backend server
├── data_processor.py         # Data processing utilities
│
├── src/                      # React frontend source
│   ├── App.js                # Main React component
│   ├── App.css               # Styling and responsive design
│   └── index.js              # React DOM entry point
│
├── public/                   # Static files
│   └── index.html            # HTML template
│
├── data/                     # Sample datasets
│   ├── meta_Electronics.json.gz
│   ├── reviews_Electronics_5.json.gz
│   └── [other category datasets]
│
├── venv/                     # Python virtual environment
└── node_modules/             # Node.js dependencies
- Start the Application: Follow the Quick Start guide
- Access the Web Interface: Navigate to http://localhost:3000
- Load Dataset:
  - Enter the path to your dataset file (e.g., ./data/meta_Electronics.json.gz)
  - Click "Load Dataset"
  - Wait for processing to complete
- Browse Catalog: Use category filters and pagination
- Search Products: Use the search bar for semantic search
- Get Recommendations: Click any product to see similar and complementary items
- View Details: Each product card shows key information and similarity scores
Amazon JSON Lines record:

{"asin": "B001", "title": "Product Name", "category": ["Electronics"], "brand": "Brand"}

CSV file:

product_name,description,category,brand,price,rating
"Product Name","Description","Electronics","Brand",99.99,4.5

Create a .env file for configuration:

FLASK_ENV=development
FLASK_DEBUG=True
API_BASE_URL=http://localhost:5000

The system uses all-MiniLM-L6-v2 by default. To use a different model, modify the ProductRecommender class in app.py:

self.model = SentenceTransformer('your-preferred-model')

Customize complementary product mappings in app.py:

complementary_map = {
'Electronics': ['Accessories', 'Cases', 'Cables'],
'Phones': ['Cases', 'Screen Protectors', 'Chargers'],
# Add your custom mappings
}

# Test data processor
python data_processor.py
# Test API endpoints
curl http://localhost:5000/api/status

# Run React tests
npm test
# Build for production
npm run build

# Build React app
npm run build
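# Set Flask to production (Flask 2.3 no longer reads FLASK_ENV itself, so this matters only if app.py checks it; also turn debug mode off)
export FLASK_ENV=production
export FLASK_DEBUG=0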
python app.py

Create a Dockerfile:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]

- Memory Issues with Large Datasets
  - Reduce batch size in embedding generation
  - Use smaller embedding models
  - Process data in chunks
- Slow Similarity Search
  - Ensure FAISS index is built correctly
  - Check embedding normalization
  - Consider using GPU version of FAISS
- Frontend API Connection Issues
  - Verify Flask server is running on port 5000
  - Check CORS configuration
  - Ensure API_BASE_URL is correct
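For the memory issues listed above, processing data in chunks can be as simple as the generator below. This is a generic sketch; the name `chunked` is hypothetical, and a suitable chunk size depends on the available memory and the embedding model.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # the input is exhausted
        yield chunk
```

Each chunk can be embedded and added to the index before the next one is read, so only one chunk of raw text is held in memory at a time.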
- Embedding Model: Use smaller models for faster processing
- FAISS Index: Use IVF indices for very large datasets
- Batch Size: Adjust embedding batch size based on available memory
- Caching: Implement Redis caching for frequent queries
- Response Times: Monitor API endpoint performance
- Memory Usage: Track embedding and index memory consumption
- Search Quality: Evaluate recommendation relevance
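One lightweight way to monitor response times is a timing decorator that logs how long each call takes. The sketch below is an assumption about how this could be done, not a description of existing code; the logger name `recommender.metrics` and the decorator name `timed` are hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("recommender.metrics")  # hypothetical logger name


def timed(func):
    """Log the wall-clock duration of every call to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on both success and failure, so slow errors are logged too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper
```

When used on a Flask view, `@timed` would go below the `@app.route(...)` line so that the route registers the wrapped function.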
The system includes comprehensive logging:
import logging
logging.basicConfig(level=logging.INFO)

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
- Python: Follow PEP 8
- JavaScript: Use ESLint configuration
- CSS: Follow BEM methodology
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face: For SentenceTransformers library
- Facebook AI: For FAISS similarity search
- Amazon: For providing public datasets
- React Team: For the excellent frontend framework
For questions and support:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the API documentation
- User Profiles: Personalized recommendations based on user history
- A/B Testing: Framework for testing recommendation algorithms
- Real-time Updates: Live data ingestion and index updates
- Advanced Analytics: Detailed recommendation performance metrics
- Multi-language Support: International product catalogs
- Mobile App: Native mobile application
- GraphQL API: Alternative to REST API
- Microservices: Split into smaller, focused services
Built with ❤️ using Python, React, and AI