BookMiner is a data pipeline project that scrapes book data from web pages, stores the raw HTML, processes and combines the data, and performs exploratory data analysis (EDA) to derive insights.
- Web Scraping: Scrapes book listings from online pages and stores the HTML files.
- Data Extraction & Storage: Parses and combines data from multiple HTML pages into a single structured CSV file.
- Exploratory Data Analysis (EDA): Performs visual and statistical analysis to uncover patterns in book pricing, ratings, value scores, and more.
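
The extraction and storage step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the markup it looks for (`article.product_pod`, `p.price_color`, a `star-rating` class carrying the rating as a word), the `£` price prefix, the column names, and the helper names `parse_books` and `combine_pages` are all assumptions, and would need adapting to the structure of the pages that were actually saved.

```python
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Assumed mapping from a rating word in the markup to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_books(html: str) -> list[dict]:
    """Extract one record per book from a single page of HTML (hypothetical markup)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("article.product_pod"):
        title = card.select_one("h3 a")["title"]
        price_text = card.select_one("p.price_color").get_text(strip=True)
        # The class attribute is a list, e.g. ["star-rating", "Three"].
        rating_classes = card.select_one("p.star-rating")["class"]
        rating_word = next(c for c in rating_classes if c in RATING_WORDS)
        rows.append(
            {
                "title": title,
                "price": float(price_text.lstrip("£")),
                "rating": RATING_WORDS[rating_word],
            }
        )
    return rows


def combine_pages(html_dir: str, csv_path: str) -> pd.DataFrame:
    """Parse every saved .html file in html_dir and write one combined CSV."""
    rows = []
    for page in sorted(Path(html_dir).glob("*.html")):
        rows.extend(parse_books(page.read_text(encoding="utf-8")))
    df = pd.DataFrame(rows, columns=["title", "price", "rating"])
    df.to_csv(csv_path, index=False)
    return df
```

With the layout of this repository, a call such as `combine_pages("HTMLs", "DATA.csv")` would rebuild the dataset from the stored pages, assuming they are saved with an `.html` extension.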
├── 1_scraping.ipynb # Scrapes book data and stores HTML files
├── 2_EDA.ipynb # Performs EDA on the combined CSV data
├── DATA.csv # Cleaned and structured dataset
├── README.md # Project overview and instructions
└── HTMLs/ # All scraped pages from the website
The EDA notebook explores:
- Price distribution of books
- Correlation between rating and value score
- Most common price ranges for high-rated books
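
The three analyses listed above might be computed along these lines. This is a sketch under assumptions: the column names `price`, `rating`, and `value_score`, the helper name `summarize_books`, the "high-rated" threshold of 4, and the 10-unit price bins are illustrative choices, not taken from the notebook.

```python
import pandas as pd


def summarize_books(df: pd.DataFrame, min_rating: int = 4, bin_width: float = 10.0) -> dict:
    """Compute simple numeric summaries for the three analyses (assumed columns)."""
    # Price distribution: count, mean, std, min, quartiles, max.
    price_stats = df["price"].describe()

    # Pearson correlation between rating and value score.
    rating_value_corr = df["rating"].corr(df["value_score"])

    # Price ranges for high-rated books: label each price with the lower
    # bound of its fixed-width bin, then count books per bin (most common first).
    high_rated = df[df["rating"] >= min_rating]
    bin_start = (high_rated["price"] // bin_width) * bin_width
    price_ranges = bin_start.value_counts()

    return {
        "price_stats": price_stats,
        "rating_value_corr": rating_value_corr,
        "price_ranges": price_ranges,
    }
```

A call like `summarize_books(pd.read_csv("DATA.csv"))` would return the numeric summaries. The visual counterparts could be drawn with, for example, `df["price"].plot.hist()` or seaborn's `histplot`.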
Tools used:
- Python (BeautifulSoup, Requests, Pandas)
- Jupyter Notebook
- Matplotlib, Seaborn for visualization
To run the project:
- Clone the repo:
  git clone https://github.com/your-username/BookMiner.git
  cd BookMiner
- Run the notebooks in order:
  1_scraping.ipynb
  2_EDA.ipynb
This project is for educational and non-commercial use.
Made with ❤️ for data and books.