GitHub - ZeinabMoayeri/web-crawler-python-selenium: website crawler with Python & Selenium

Introduction

This project crawls an online store site and extracts all of its text information along with the product photos.

First, we wait until all the products have loaded on the page and save that page in HTML format.
Next, we run the Jupyter notebook to extract the information on the page, including the links to all products.
Finally, we visit each product link, extract that product's information and all of its photos, and save each product's files in its own folder.

Installation

First, install the required libraries with the following commands:

pip install requests

pip install selenium

pip install pandas

The versions of these libraries that I use are listed in the requirements.txt file.

1. Load the main page and save it
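
The README does not say how the page is made to load every product, so the following is only a minimal sketch of one common approach. It assumes the store loads products lazily as the page is scrolled. The function names, the default pause, and the file name are hypothetical, not taken from the repository.

```python
import time
from pathlib import Path


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Assumption: products are lazy-loaded on scroll. `driver` is any object
    with Selenium's `execute_script` method.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the site time to append more products
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we treat the page as complete
        last_height = new_height
    return last_height


def save_page(driver, path):
    """Write the rendered HTML of the current page to `path`."""
    target = Path(path)
    target.write_text(driver.page_source, encoding="utf-8")
    return target


def open_store(url):
    """Start Chrome and open `url` (imported lazily; needs selenium and Chrome)."""
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get(url)
    return driver
```

A possible use would be `driver = open_store(store_url)`, then `scroll_until_stable(driver)`, `save_page(driver, "main_page.html")`, and `driver.quit()`, where `store_url` and `main_page.html` are placeholders for the real address and file name.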

2. Extract all product data

Run the Jupyter notebook "DizaGallery_Scrap.ipynb" to extract the names, links, designers, and prices of the products.
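
The notebook itself is not reproduced here, so the sketch below only illustrates the idea of pulling names, links, designers, and prices out of the saved HTML. It uses Python's standard-library html.parser rather than Selenium, and the class names ("product", "product-link", "name", "designer", "price") are invented for illustration. The real page's markup will differ.

```python
from html.parser import HTMLParser

# Hypothetical class names; replace them with the ones used by the real page.
FIELDS = {"name", "designer", "price"}


class ProductListParser(HTMLParser):
    """Collect one dict per <div class="product"> block."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if tag == "div" and "product" in classes:
            # A new product card starts here.
            self.products.append({"name": "", "link": "", "designer": "", "price": ""})
        elif not self.products:
            return  # ignore markup that appears before the first product
        elif tag == "a" and "product-link" in classes:
            self.products[-1]["link"] = attrs.get("href") or ""
        elif tag == "span":
            matched = classes & FIELDS
            self._field = matched.pop() if matched else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] += data.strip()


def parse_products(html):
    """Return a list of product dicts found in the saved page's HTML."""
    parser = ProductListParser()
    parser.feed(html)
    parser.close()
    return parser.products
```

Given the saved page's text, `parse_products(html)` returns a list of dicts that could then be passed to `pandas.DataFrame(...)` for further processing.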

3. Run extract_images.py

Extracts all images for each product and saves them in that product's own folder.
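
As a rough illustration of the "one folder per product" idea, the sketch below saves a list of image URLs under a folder named after the product. It is not the contents of extract_images.py. The helper names, the numbering scheme, and the ".jpg" fallback extension are assumptions made for this example.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def safe_folder_name(product_name):
    """Replace characters that are not allowed in folder names."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", product_name).strip(" ._")
    return cleaned or "unnamed_product"


def image_filename(url, index):
    """Build a numbered file name, keeping the URL's extension if it has one."""
    suffix = Path(urlparse(url).path).suffix or ".jpg"  # assumed fallback
    return f"{index:02d}{suffix}"


def save_images(product_name, image_urls, out_dir, fetch):
    """Download every URL with `fetch(url) -> bytes` into the product's folder."""
    folder = Path(out_dir) / safe_folder_name(product_name)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(image_urls, start=1):
        target = folder / image_filename(url, index)
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved


def fetch_with_requests(url):
    """Download one URL with requests (imported lazily; needs requests installed)."""
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content
```

With requests installed, a call such as `save_images(name, urls, "images", fetch_with_requests)` would write `images/<product name>/01.jpg`, `02.jpg`, and so on. Passing the download function in as an argument keeps the file-naming logic testable without network access.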

4. Run extract_details.py

Extracts all details for each product and saves them to a CSV file. To extract all of the text on a product page instead, run "extract_ipibox.py".
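
The README does not list which details are collected, so the sketch below only shows one way to write per-product detail dictionaries to a single CSV file when products do not all share the same fields. It uses the standard-library csv module instead of pandas, and the function name and example field names are hypothetical.

```python
import csv
from pathlib import Path


def write_details_csv(rows, path):
    """Write a list of dicts to CSV, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        # restval fills in an empty cell when a product lacks a field.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

For example, `write_details_csv([{"name": "Ring", "material": "gold"}, {"name": "Necklace", "length": "45 cm"}], "details.csv")` produces the columns name, material, and length, leaving the cells empty where a product has no value.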

Folders and files

DizaGallery_Scrap.ipynb
README.md
all_products_with_deatails.csv
extract_details.py
extract_images.py
extract_ipibox.py
requirements.txt