A full-stack Book Recommendation System built from scratch with Python.
I scrape book data, clean it, run exploratory data analysis (EDA), and train a content-based recommender using TF-IDF and cosine similarity.
Later, I'll expose it through an API, build a frontend, and deploy it with CI/CD pipelines. 🚀
- ✅ Web scraping from Books to Scrape (1,000 books).
- ✅ Cleaned & preprocessed dataset (`books_clean.csv`).
- ✅ Exploratory Data Analysis (EDA): categories, price distribution, ratings, word clouds.
- ✅ Content-Based Recommendation Model (TF-IDF + Cosine Similarity).
- ✅ Robust `BookRecommender` class for reuse in scripts and notebooks.
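The scraping step can be sketched as follows. This is a minimal, self-contained example: the HTML snippet below mimics the product-pod markup on Books to Scrape, and the selectors and the `parse_book` helper are assumptions for illustration, not the exact code in `scrape_books.py`.

```python
from bs4 import BeautifulSoup

# Sample HTML mimicking one Books to Scrape product listing (hypothetical snippet)
html = """
<article class="product_pod">
  <h3><a title="A Light in the Attic" href="a-light-in-the-attic_1000/index.html">A Light in ...</a></h3>
  <p class="star-rating Three"></p>
  <p class="price_color">£51.77</p>
</article>
"""

RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def parse_book(pod):
    """Extract title, price, and rating from one product pod."""
    title = pod.h3.a["title"]
    price = float(pod.select_one("p.price_color").text.lstrip("£"))
    # The rating is encoded as a CSS class, e.g. "star-rating Three"
    rating_word = pod.select_one("p.star-rating")["class"][1]
    return {"title": title, "price": price, "rating": RATINGS[rating_word]}

soup = BeautifulSoup(html, "html.parser")
book = parse_book(soup.select_one("article.product_pod"))
print(book)  # {'title': 'A Light in the Attic', 'price': 51.77, 'rating': 3}
```

The real scraper would fetch each catalogue page with `requests` and loop over every `article.product_pod` on it.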
```
book-recommender/
├── data/
│   ├── raw/                 # Scraped raw data
│   ├── processed/           # Cleaned data
│   └── books.csv            # Original scraped dataset
│
├── notebooks/               # Jupyter experiments
│   ├── 01_scraping_demo.ipynb
│   ├── 02_eda.ipynb
│   └── 03_recommendation_demo.ipynb
│
├── src/
│   ├── scraping/            # Web scraping code
│   │   └── scrape_books.py
│   ├── preprocessing/       # Data cleaning
│   │   └── clean_data.py
│   ├── models/              # ML models
│   │   ├── __init__.py
│   │   ├── content_based.py
│   │   └── recommender.py
│   └── utils/               # Helper utilities
│
├── tests/                   # Unit tests (coming soon)
├── api/                     # API (future step)
├── frontend/                # Frontend (future step)
├── requirements.txt
└── README.mdx
```
- Clone the repo

```shell
git clone https://github.com/kamatealif/shelf-sage.git
cd shelf-sage
```

- Create & activate a virtual environment (I am using uv)

```shell
# install uv if not already installed
pip install uv
# create the .venv with dependencies installed in it
uv sync
# activate it (Windows)
.\.venv\Scripts\activate
```

- Scrape the dataset

```shell
python src/scraping/scrape_books.py
```

- Preprocess the data

```shell
python src/preprocessing/clean_data.py
```

- Run the recommender

```shell
python src/models/recommender.py
```

- Run the FastAPI server

```shell
uvicorn main:app --reload
```
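The preprocessing step can be sketched like this. The inline DataFrame, column names, and `clean_books` helper are hypothetical stand-ins for illustration; the real `clean_data.py` reads the scraped files from `data/raw/` and writes `books_clean.csv`.

```python
import pandas as pd

# Hypothetical raw rows resembling the scraped dataset
raw = pd.DataFrame({
    "title": ["A Light in the Attic", "Soumission", "Soumission", None],
    "price": ["£51.77", "£50.10", "£50.10", "£47.82"],
    "rating": ["Three", "One", "One", "Four"],
})

RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing/duplicate titles, convert price and rating to numbers."""
    df = df.dropna(subset=["title"]).drop_duplicates(subset=["title"]).copy()
    df["price"] = df["price"].str.lstrip("£").astype(float)   # "£51.77" -> 51.77
    df["rating"] = df["rating"].map(RATINGS)                  # word -> 1..5
    return df.reset_index(drop=True)

clean = clean_books(raw)
print(clean)  # 2 rows: the duplicate and the row without a title are dropped
```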
🧠 How It Works
- TF-IDF Vectorizer converts text (title + category + description) into numeric vectors.
- Cosine Similarity measures how close two books are in that vector space.
- The recommender returns the most similar books for a given title.
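The three steps above can be sketched end-to-end in a few lines of scikit-learn. The toy catalogue and the `recommend` helper are illustrative assumptions; the real model builds its TF-IDF matrix from the title, category, and description columns of `books_clean.csv`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue (hypothetical) standing in for books_clean.csv
books = pd.DataFrame({
    "title": ["Galactic Empires", "Space Pirates", "Baking Bread"],
    "text": [
        "science fiction space empire war",
        "science fiction space pirates adventure",
        "cooking bread baking yeast recipes",
    ],
})

# 1) TF-IDF turns each book's text into a sparse numeric vector.
tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(books["text"])

# 2) Cosine similarity scores every pair of books in that vector space.
sim = cosine_similarity(matrix)

# 3) Rank the other books by similarity to the query title.
def recommend(title: str, n: int = 2) -> list[str]:
    """Return the n books most similar to `title` (excluding itself)."""
    idx = books.index[books["title"] == title][0]
    order = sim[idx].argsort()[::-1]          # most similar first
    return [books["title"][i] for i in order if i != idx][:n]

print(recommend("Galactic Empires", n=1))  # ['Space Pirates']
```

Precomputing the full similarity matrix works fine at this scale (1,000 books); a much larger catalogue would call for computing similarities per query instead.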
🛠️ Tech Stack
- Python (data scraping, processing, ML)
- BeautifulSoup (scraping)
- Pandas / NumPy (data wrangling)
- Matplotlib / Seaborn / WordCloud (EDA)
- scikit-learn (TF-IDF, cosine similarity)
- FastAPI (planned, backend API)
- Svelte/Streamlit (planned, frontend UI)
- Docker + GitHub Actions (planned, deployment & CI/CD)
👨‍💻 Author
kamatealif (@kamatealif)