Commit c5d49d0

Merge pull request #1631 from Sushilverma002/master
ISSUE [#1551] WEBSCARPING OF FILPKART MOBILE PHONE UNDER 50K
2 parents 285cd41 + 617283b commit c5d49d0

File tree

3 files changed: +1088 -0 lines changed

Flipkart_webscraping/Scrap.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
import pandas as pd
import requests
from bs4 import BeautifulSoup

Product_name=[]
Prices=[]
Description=[]
Reviews=[]

# Walk the search result pages (2..42) for "mobile phone under 50000".
for page in range(2,43):
    # Bug fixed: the original used str(2) here, so every iteration
    # re-fetched page 2; str(page) advances through the result pages.
    url="https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page="+str(page)

    r=requests.get(url)
    soup=BeautifulSoup(r.text,"lxml")

    # Restrict parsing to the results container on the page.
    box=soup.find("div",class_="_1YokD2 _3Mn1Gg")

    # 1. product names
    names=box.find_all("div",class_="_4rR01T")
    for i in names:
        Product_name.append(i.text)

    # 2. prices
    prices=box.find_all("div",class_="_30jeq3 _1_WHN1")
    for i in prices:
        Prices.append(i.text)

    # 3. descriptions (the spec bullet lists)
    desc=box.find_all("ul",class_="_1xgFaf")
    for i in desc:
        Description.append(i.text)

    # 4. review scores (star ratings)
    revi=box.find_all("div",class_="_3LWZlK")
    for i in revi:
        Reviews.append(i.text)

# Collect everything into a data frame and write it to CSV.
df=pd.DataFrame({"Product Name":Product_name,"Prices":Prices,"Description":Description,"Reviews":Reviews})
df.to_csv("filpkart-Scraping-under-50k.csv")
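One caveat worth noting (not part of the original script): pd.DataFrame raises ValueError if the four lists end up with different lengths, which happens whenever a listing is missing a rating or description. A hedged sketch of one way to guard against that:

    # Pad every list to the longest length before building the frame,
    # so a missing rating or description doesn't crash pd.DataFrame.
    cols = {"Product Name": Product_name, "Prices": Prices,
            "Description": Description, "Reviews": Reviews}
    longest = max(len(v) for v in cols.values())
    padded = {k: v + [None] * (longest - len(v)) for k, v in cols.items()}
    df = pd.DataFrame(padded)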

Flipkart_webscraping/Steps.txt

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
WEB SCRAPING

We are web scraping Flipkart with Python, which lets us analyse the data from a specific website and store it in many formats such as CSV, txt, Excel, and so on.
This data can be used for various purposes, such as sentiment analysis, or when you want to gather reviews from multiple users.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< STEPS <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
STEP 1:
We send a request to "flipkart" to scrape the data.
requests.get :- used to request the page from Flipkart and fetch the data in the form of HTML; it also tells us the status code.
response 200 :- we successfully got the data from the web.
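A minimal sketch of this step (the URL below is the Scrap.py search URL trimmed to just the q and page parameters):

    import requests

    url = "https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000&page=2"
    r = requests.get(url)

    print(r.status_code)   # 200 means the page came back successfully
    print(r.text[:200])    # start of the HTML we will parse next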
STEP 2:
(i) Know how to deal with multiple pages.
(ii) Format used - LXML: allows for easy handling of XML and HTML files, and can also be used for web scraping.
(iii) Get the HTML of the page into your editor or a local file so that you can work on it.
(iv) As there are many pages related to a SINGLE search, we now fetch data from multiple pages (a sketch follows this list):
	- try to find the anchor tag <a> in the HTML of the page
	- not one link for pages 2, 3, ..., just the link for the NEXT page
	- find that particular tag, take the link from its href attribute, and print it
	- the href holds the link without the 'https://www.flipkart.com' part, so to complete it we just prepend:
	  cnp="https://www.flipkart.com"+np
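A hedged sketch of (iv). The anchor class "_1LKTO3" is an assumption for illustration, not something stated in the source; inspect the live page for the current class, and note Flipkart may require a browser-like User-Agent header.

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000")
    soup = BeautifulSoup(r.text, "lxml")

    nxt = soup.find("a", class_="_1LKTO3")   # hypothetical class for the NEXT link
    if nxt is not None:
        np = nxt.get("href")                 # href comes back without the host part
        cnp = "https://www.flipkart.com" + np
        print(cnp)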
(v) Fetching the link of every page one by one is a time-consuming process, so we create a loop that builds all the links for us.
	Now we use a for loop to fetch the data:
	    for i in range(1, 10)   # range(start, end)
	To move through multiple pages, append str(i) to the end of the link.
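For example, a small sketch of (v) using the same search URL as Scrap.py:

    base = ("https://www.flipkart.com/search?q=MOBILE+PHONE+UNDER+50000"
            "&marketplace=FLIPKART&page=")

    # One URL per results page; str(i) supplies the page number.
    for i in range(1, 10):
        print(base + str(i))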
(vi) Decide what data you want to scrape, such as:
	- product name, price, reviews, description
	- create a list for every individual piece of information:
	  Product_name=[]
	  Prices=[]
	  Description=[]
	  Reviews=[]
(vii) Now write a block for each piece of information you want to fetch, and store that data in the related list:
	revi=soup.find_all("div",class_="_3LWZlK")
	for i in revi:
	    name=i.text
	    Reviews.append(name)
	print(Reviews)
	Do the same for all the lists.
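To avoid repeating that block for every field, one option is a small helper (the function name collect is mine, not from the source; the class names are the ones Scrap.py uses):

    def collect(soup, tag, cls, out):
        # Append the text of every matching element to the target list.
        for el in soup.find_all(tag, class_=cls):
            out.append(el.text)

    collect(soup, "div", "_4rR01T", Product_name)
    collect(soup, "div", "_30jeq3 _1_WHN1", Prices)
    collect(soup, "ul", "_1xgFaf", Description)
    collect(soup, "div", "_3LWZlK", Reviews)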
(viii) A point to remember: we are scraping data from a particular box or area, so we have to restrict the search to that area by making a variable, box.
(ix) Now create the dataframe with the help of pandas: pd.DataFrame({"key":value}) stores the data in the form of keys and values (see the small illustration below).
	Remember that we are scraping the data from multiple pages, so DON'T FORGET TO RE-APPLY THE FOR LOOP AND THE str(i) for multiple pages.
(x) The last step is to convert that dataframe into a CSV file.
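A minimal illustration of (ix) with made-up values:

    import pandas as pd

    # Toy values only; the real script fills these lists while looping over pages.
    df = pd.DataFrame({"Product Name": ["Phone A", "Phone B"],
                       "Prices": ["49,999", "45,999"],
                       "Reviews": ["4.5", "4.3"]})
    print(df)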
STEP 3:
df.to_csv("filpkart-scraping-under-50k.csv")
