Commit ac7461c

added flipkart scraper
1 parent f05226e commit ac7461c

3 files changed: +68 -0 lines changed

FlipkartScraper/README.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# Flipkart Scraper

This is a simple scraper that extracts product information from Flipkart, an e-commerce platform. The scraper is written in Python and consists of the following files:

1. dbConnector.py: This file contains the code for connecting to a database and performing the database operations needed to store the scraped data.

2. genericHtmlib.py: This file provides a set of generic functions and utilities for parsing HTML and extracting data from web pages.

3. main.py: This is the main entry point of the scraper. It initializes the necessary components and orchestrates the scraping process.

4. productList.py: This file contains the list of categories that you want to scrape.

5. `__pycache__`: This directory holds the compiled bytecode that Python caches so that modules load faster. You can safely ignore this directory.

6. useragent.py: This file defines the User-Agent string that the scraper sends with its HTTP requests, so that they resemble requests from a real web browser.
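
As an illustration of the role described for useragent.py, here is a minimal sketch; it is not the project's actual code. The `USER_AGENT` value and the `build_request` helper are hypothetical, and the standard library's urllib stands in for whichever HTTP client the scraper really uses.

```python
import urllib.request

# Hypothetical browser-like User-Agent; the real value in useragent.py
# is not shown in this commit.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("https://www.flipkart.com/")
    # Nothing is sent over the network; we only inspect the prepared header.
    # urllib stores header names capitalized, hence "User-agent".
    print(request.get_header("User-agent"))
```

Building the request separately from sending it keeps the header logic easy to inspect without touching the network.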

## To use the Flipkart scraper, follow these steps:

- Make sure you have Python 3 installed on your system.
- Create and activate a virtual environment by running the following commands:
```
python3 -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
```

- Install the required dependencies by running the following command:
```
pip install -r requirements.txt
```

- Open productList.py and add the categories that you want to scrape.

- Execute the scraper by running the following command:

```
python main.py
```
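
The contents of productList.py are not shown in this commit. As a rough sketch, assuming it holds a plain Python list of category search terms, the file and a URL-building helper might look like this. The names `product_list`, `SEARCH_URL`, and `build_search_urls` are hypothetical, and the search URL pattern is an assumption to check against what main.py actually requests.

```python
from urllib.parse import urlencode

# Hypothetical contents of productList.py: one search term per category.
product_list = ["mobiles", "laptops", "running shoes"]

# Assumed search endpoint and query parameters; verify before relying on them.
SEARCH_URL = "https://www.flipkart.com/search"


def build_search_urls(categories, page=1):
    """Turn category names into search URLs, one URL per category."""
    urls = []
    for category in categories:
        query = urlencode({"q": category, "page": page})
        urls.append(f"{SEARCH_URL}?{query}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(product_list):
        print(url)
```

Using `urlencode` rather than string concatenation means categories with spaces or special characters are escaped correctly.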

The scraper processes the product URLs one by one and extracts relevant information such as the product name, price, description, and any other details specified in the code. The scraped data is stored in the configured database or written to another configured output format.
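
The extract-and-store flow described above can be sketched as follows. This is an illustration under stated assumptions, not the code in genericHtmlib.py or dbConnector.py: the CSS class names, the `products` table, and every function name are hypothetical. SQLite is assumed only because the commit ships a flipkart.db file, and the standard library's html.parser stands in for bs4 so the example runs without extra packages.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical mapping from CSS class to field name; Flipkart's real markup differs.
FIELD_CLASSES = {"product-name": "name", "product-price": "price"}


class ProductParser(HTMLParser):
    """Collect the text of elements whose class attribute is in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        self._current = FIELD_CLASSES.get(dict(attrs).get("class"))

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def parse_product(html):
    """Return a dict of extracted fields, with the price converted to a float."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    fields = parser.fields
    if "price" in fields:
        # Strip the currency symbol and thousands separators before converting.
        fields["price"] = float(fields["price"].replace("₹", "").replace(",", ""))
    return fields


def save_product(conn, url, product):
    """Insert or refresh one row per product URL (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(url TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO products (url, name, price) VALUES (?, ?, ?)",
        (url, product.get("name"), product.get("price")),
    )
    conn.commit()


if __name__ == "__main__":
    sample = (
        '<h1 class="product-name">Example Phone</h1>'
        '<span class="product-price">₹9,999</span>'
    )
    conn = sqlite3.connect(":memory:")  # pass a file path to persist the data
    save_product(conn, "https://example.com/p/1", parse_product(sample))
    print(conn.execute("SELECT name, price FROM products").fetchall())
```

Keying the table on the product URL means that re-running the scraper refreshes existing rows instead of duplicating them.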

Please note that web scraping should be done responsibly and in compliance with the terms and conditions of the target website. Make sure to respect the website's policies regarding scraping frequency and data usage.
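
One way to respect scraping-frequency limits is to enforce a minimum delay between requests. The sketch below is an illustration and not part of this commit; the `Throttle` class is a hypothetical helper, and the interval you pass to it is an assumption to adjust to the site's stated policy.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to keep requests min_interval seconds apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

To use it, create one instance, for example `throttle = Throttle(min_interval=2.0)`, and call `throttle.wait()` immediately before each HTTP request. The first call returns at once; later calls pause only for whatever part of the interval has not already elapsed.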

If you encounter any issues or have any questions, feel free to open an issue or reach out to the project maintainer.

Built with ❤️ by [Paritosh Tripathi](https://github.com/paritoshtripathi935)

FlipkartScraper/flipkart.db

16 MB
Binary file not shown.

FlipkartScraper/requirements.txt

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
jupyter
scikit-learn
pandas
numpy
matplotlib
seaborn
tensorflow
flask
openai
bs4
requests
pandas
requests
numpy
bs4
geopy
boto3
ndjson
selenium
httpx
lxml
python-dotenv
paramiko
undetected-chromedriver
fastjsonschema
