📚 Book Recommendation Engine

A full-stack Book Recommendation System built from scratch with Python.
I scrape book data, clean it, explore it (EDA), and train a content-based recommender using TF-IDF and cosine similarity.
Later, I'll expose it through an API, build a frontend, and deploy it with CI/CD pipelines. 🚀

Python · Data Science · Machine Learning · In Progress

✨ Features (So Far)

  • ✅ Web scraping from Books to Scrape (1000 books).
  • ✅ Cleaned & preprocessed dataset (books_clean.csv).
  • ✅ Exploratory Data Analysis (EDA): categories, price distribution, ratings, word clouds.
  • ✅ Content-Based Recommendation Model (TF-IDF + Cosine Similarity).
  • ✅ Robust BookRecommender class for reuse in scripts and notebooks.

📂 Project Structure

book-recommender/
├── data/
│   ├── raw/                # Scraped raw data
│   ├── processed/          # Cleaned data
│   └── books.csv           # Original scraped dataset
│
├── notebooks/              # Jupyter experiments
│   ├── 01_scraping_demo.ipynb
│   ├── 02_eda.ipynb
│   └── 03_recommendation_demo.ipynb
│
├── src/
│   ├── scraping/           # Web scraping code
│   │   └── scrape_books.py
│   ├── preprocessing/      # Data cleaning
│   │   └── clean_data.py
│   ├── models/             # ML models
│   │   ├── __init__.py
│   │   ├── content_based.py
│   │   └── recommender.py
│   └── utils/              # Helper utilities
│
├── tests/                  # Unit tests (coming soon)
├── api/                    # API (future step)
├── frontend/               # Frontend (future step)
├── requirements.txt
└── README.mdx

⚡ Quick Start

  1. Clone the repo
git clone https://github.com/kamatealif/shelf-sage.git
cd shelf-sage
  2. Create & activate a virtual environment (I am using uv)
# install uv if it is not already installed
pip install uv

# create the .venv with the dependencies installed in it
uv sync

# activate it (Windows)
.\.venv\Scripts\activate
# activate it (macOS/Linux)
source .venv/bin/activate
  3. Scrape the dataset
python src/scraping/scrape_books.py
  4. Preprocess the data
python src/preprocessing/clean_data.py
  5. Run the recommender
python src/models/recommender.py
  6. Run the FastAPI server
uvicorn main:app --reload

🧠 How It Works

  • TF-IDF Vectorizer converts each book's text (title + category + description) into a numeric vector.

  • Cosine Similarity measures how close two books are in that space.

  • The recommender returns the most similar books for a given title.
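To make the three steps above concrete, here is a minimal, dependency-free sketch of the idea. It is not the project's `BookRecommender` class: the toy catalogue, `tokenize`, `tfidf_vectors`, `cosine`, and `recommend` are hypothetical names invented for illustration, and the project itself relies on scikit-learn (`TfidfVectorizer` and `cosine_similarity`) rather than hand-rolled math. The weighting below assumes the smoothed IDF formula and L2 normalisation that scikit-learn applies by default.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalogue (title -> category + description text).
# The real project works on the scraped and cleaned dataset instead.
BOOKS = {
    "Space Odyssey": "science fiction space travel adventure",
    "Galaxy Quest": "science fiction space comedy adventure",
    "Baking Basics": "cooking bread baking recipes",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity; both vectors are already unit length."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def recommend(title, books=BOOKS, top_n=2):
    """Return the top_n (title, score) pairs most similar to `title`."""
    if title not in books:
        raise KeyError(f"Unknown title: {title!r}")
    titles = list(books)
    vectors = tfidf_vectors([f"{t} {books[t]}" for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (other, cosine(query, vectors[i]))
        for i, other in enumerate(titles)
        if other != title
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for other, score in recommend("Space Odyssey"):
        print(f"{other}: {score:.3f}")
```

Books that share rare terms score close to 1, while books with no terms in common score exactly 0, which is why the cooking title ranks last for a science fiction query.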


🛠️ Tech Stack

  • Python (data scraping, processing, ML)

  • BeautifulSoup (scraping)

  • Pandas / NumPy (data wrangling)

  • Matplotlib / Seaborn / WordCloud (EDA)

  • scikit-learn (TF-IDF, cosine similarity)

  • FastAPI (planned, backend API)

  • Svelte / Streamlit (planned, frontend UI)

  • Docker + GitHub Actions (planned, deployment & CI/CD)


👨‍💻 Author

kamatealif (@kamatealif)
