BookSage AI is a hybrid book recommendation system combining Collaborative Filtering (KNN-based) and Content-Based (TF-IDF + Cosine Similarity) models, with a weighted hybrid approach for personalized results. The project ingests and preprocesses large-scale book datasets, applies active-user and popular-book filtering, and dynamically generates recommendations enriched with metadata (title, author, publisher, year, and cover image). I engineered a modular architecture with clear data pipelines, model persistence, and reusable functions, ensuring scalability and maintainability. The system was deployed as a Dockerized Flask web application with a responsive HTML/CSS front-end, integrated into a CI/CD pipeline for automated builds and deployments. This design demonstrates proficiency in ML model building, orchestration, backend API development, front-end integration, containerization, and production-grade deployment workflows.
demo.mp4
Try the Hybrid Book Recommendation System live: https://booksage-ai.onrender.com/
| Category | Technology / Resource |
|---|---|
| Core Language | Python 3.11 |
| Backend Framework | FastAPI |
| Data Processing | Pandas (Data Cleaning & Merging), NumPy (Matrix Ops) |
| Recommendation Models | Hybrid System: Collaborative Filtering + Content-Based Filtering |
| Collaborative Filtering | SciPy (csr_matrix), scikit-learn (NearestNeighbors) |
| Content-Based Filtering | scikit-learn (TfidfVectorizer, cosine_similarity) |
| Hybrid Fusion Logic | Weighted average score combination |
| Data Sources | Book-Crossing Dataset (BX-Books, BX-Users, BX-Ratings) |
| Feature Engineering | TF-IDF on combined features (title, author, publisher, year) |
| Model Persistence | Pickle (Model & Processed Data Serialization) |
| Memory System | In-memory caching of processed data for faster responses |
| Evaluation Metrics | Popularity-based filtering, Active user filtering |
| Orchestration Layer | Modular service classes (DataLoader, DataPreprocessor, ModelManager, HybridModel) |
| Frontend | HTML5, Jinja2 Templates, Static CSS & JavaScript |
| Deployment | Docker (Python 3.11-slim base), requirements.txt dependency locking |
| Portability | Pathlib-based cross-platform directory resolution |
| Error Handling | Graceful fallbacks & empty results handling |
| Feature | BookSage AI | Typical Recommenders |
|---|---|---|
| Method Flexibility | 3 modes + hybrid tuning | Usually single-method |
| Cold Start Handling | Popular books fallback | Often fails |
| Explainability | Shows scores + metadata | Black-box results |
| UI Customization | Adjustable weights/counts | Fixed parameters |
BookSage-AI/
├── .github/
│ └── workflows/
│ └── main.yml
├── app/
│ ├── core/
│ │ ├── config.py
│ │ ├── logger.py
│ │ ├── models.py
│ │ └── __init__.py
│ ├── services/
│ │ ├── collaborative_model.py
│ │ ├── content_model.py
│ │ ├── data_loader.py
│ │ ├── data_preprocessor.py
│ │ ├── hybrid_model.py
│ │ ├── model_manager.py
│ │ ├── recommendation_engine.py
│ │ └── __init__.py
│ ├── main.py
│ └── __init__.py
├── data/
│ ├── BX-Book-Ratings.csv
│ ├── BX-Books.csv
│ └── BX-Users.csv
├── models/
│ ├── book_pivot.pkl
│ ├── books_content.pkl
│ ├── books_data.pkl
│ ├── cb_model.pkl
│ ├── cf_model.pkl
│ ├── content_sim_matrix.pkl
│ ├── final_rating.pkl
│ ├── tfidf_vectorizer.pkl
│ └── title_to_idx.pkl
├── notebooks/
│ └── experiment.ipynb
├── static/
│ ├── css/
│ │ ├── style.css
│ │ └── style_recommendor.css
│ └── js/
│ └── script.js
├── templates/
│ ├── index.html
│ └── recommendations.html
├── tests/
│ ├── conftest.py
│ ├── test_collaborative_model.py
│ ├── test_config.py
│ ├── test_content_model.py
│ ├── test_data_loader.py
│ ├── test_data_preprocessor.py
│ ├── test_endpoints.py
│ ├── test_hybrid_model.py
│ ├── test_logger.py
│ ├── test_models.py
│ ├── test_model_manager.py
│ ├── test_recommendation_engine.py
│ └── __init__.py
├── .flake8
├── .gitignore
├── app.png
├── demo.mp4
├── docker-compose.yml
├── Dockerfile
├── LICENSE
├── pyproject.toml
├── README.md
├── render.yml
├── requirements.txt
├── run.py
├── setup.py
└── train_models.py
graph TD
A[Raw Data: BX-Books, BX-Users, BX-Ratings] --> B[Data Preprocessing & Feature Engineering]
B --> C[Collaborative Filtering Model]
B --> D[Content-Based Model]
C --> E[User-Item Matrix - csr_matrix]
D --> F[TF-IDF Features - Title+Author+Publisher+Year]
E --> G[Hybrid Recommender - Weighted Score Fusion]
F --> G
G --> H[Flask API - Endpoints for Recommendations]
H --> I[Frontend: HTML + Jinja2 + CSS/JS]
subgraph Deployment
J[Docker Container]
J --> H
J --> I
end
G --> K[Model Persistence - Pickle Serialization]
- Python 3.10+
- pip
# Clone the repository
git clone https://github.com/Md-Emon-Hasan/BookSage-AI.git
cd BookSage-AI
# Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt# Simple start (recommended)
python run.py
# Or with custom options
python run.py --host 0.0.0.0 --port 8000
# Train models first, then start server
python run.py --train
# Production mode (no auto-reload)
python run.py --prod
# Alternative: using uvicorn directly
uvicorn app.main:app --reload
# Open http://localhost:8000 in your browser| Method | Endpoint | Description |
|---|---|---|
| GET | / |
Home page with popular books |
| POST | /recommend |
Get book recommendations |
| GET | /search_books |
Search books by title |
| GET | /api/health |
Health check |
# Build and run
docker-compose up -d
# View logs
docker-compose logs -f
# Stop
docker-compose down# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=app --cov-report=term-missingPrepared by:
Md Emon Hasan
Email: emon.mlengineer@gmail.com
WhatsApp: +8801834363533
GitHub: Md-Emon-Hasan
LinkedIn: Md Emon Hasan
Facebook: Md Emon Hasan
