Commit 6ae131f

Merge pull request #2567 from Juhibhojani/main
Zomato Scraper
2 parents 6432fdf + e4ea114 commit 6ae131f

File tree: 3 files changed, +48 −0 lines


Zomato Scraper/readme.md

Lines changed: 12 additions & 0 deletions
# Infinite Scroll Web Scraping

This Python script uses Selenium and BeautifulSoup to scrape the Zomato website through its infinite-scroll listing. It navigates to the Zomato page that lists cafes in Ahmedabad, India, and extracts the name, link, rating, cuisine, and rate for each cafe.

## Requirements

- Google Chrome (or another browser supported by Selenium)
- A ChromeDriver build compatible with your Chrome version
- The Python packages pinned in requirements.txt (beautifulsoup4, selenium)

## Working

The script scrolls down the Zomato listing page and prints the details of each cafe as new results load, until you interrupt it (e.g. with Ctrl + C).
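A rough sketch of that scroll-and-reparse loop is shown below. It is not the committed script (the real parsing and selectors live in zomato.py), and the two-second pause is an assumption about how long Zomato needs to load the next batch of results.

```python
# Minimal sketch of the infinite-scroll loop, assuming ChromeDriver is on PATH
# and that a 2-second pause is long enough for new results to render.
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://zomato.com/ahmedabad/restaurants/cafes?category=2")

try:
    while True:
        # Jump to the bottom of the page so Zomato loads more cafes,
        # then grab the updated HTML for BeautifulSoup to parse.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
        html = driver.page_source
except KeyboardInterrupt:
    driver.quit()
```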

Zomato Scraper/requirements.txt

Lines changed: 2 additions & 0 deletions
beautifulsoup4==4.10.0
selenium==3.141.0

Zomato Scraper/zomato.py

Lines changed: 34 additions & 0 deletions
# Zomato infinite-scroll scraper: prints the link, name, rating, cuisine and
# rate for each cafe listed on the Ahmedabad cafes page.
import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://zomato.com/ahmedabad/restaurants/cafes?category=2"
driver.get(url)

while True:
    # Parse whatever has been loaded into the page so far.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    container = soup.find("div", {"id": "root"})
    if container is None:
        # The page has not rendered yet; wait and try again.
        time.sleep(2)
        continue

    # Listing sections use classes starting with "sc-1mo3ldo-0 sc-";
    # the first match does not hold cafe cards and is skipped.
    sections = container.find_all("div", class_=re.compile("sc-1mo3ldo-0 sc-"))
    for items in sections[1:]:
        first_child = items.find("div")
        if first_child is None:
            continue
        # Each direct child div of the section wraps one cafe card.
        for item in first_child.find_all("div", recursive=False):
            link = item.find("a", href=True)
            name = item.find("h4")
            rating = item.find("div", {"class": "sc-1q7bklc-1 cILgox"})
            cuisine = item.find("p")
            if not (link and name and cuisine):
                continue
            rate = cuisine.find_next_sibling()
            print(link["href"])
            print(name.text)
            if rating:
                print(rating.text)
            print(cuisine.text)
            if rate:
                print(rate.text)

    # Scroll to the bottom to trigger Zomato's infinite scroll, then wait
    # for the next batch of cafes to render before re-parsing the page.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
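If ChromeDriver is not on your PATH, or you want the scraper to run without opening a browser window, the driver setup can be adjusted as sketched here. This is only a sketch against the pinned selenium==3.141.0 API; the chromedriver path is a placeholder, not part of the committed script.

# Sketch: explicit driver path and headless mode with Selenium 3.141.0.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window

driver = webdriver.Chrome(
    executable_path="/path/to/chromedriver",  # placeholder path, adjust locally
    options=options,
)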
