WEB SCRAPING

We are web scraping Flipkart with Python, which lets us collect data from a specific website and store it in many formats such as CSV, txt, Excel, and so on.
This data can be used for various purposes, such as sentiment analysis or studying reviews from multiple users.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< STEPS >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

STEP 1:
We send a request to "flipkart" to scrape the data.
requests.get :- this function fetches the page from Flipkart as HTML and also tells us the status code of the request.
response 200 :- means we successfully received the page from the web.

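A minimal sketch of this step, assuming the `requests` library; the helper name `fetch_html` and the search URL are illustrative, not from the original notes.

```python
import requests

def fetch_html(url):
    """Request a page; return its HTML text on status 200, else None."""
    response = requests.get(url)
    print(response.status_code)  # 200 means the page was fetched successfully
    if response.status_code == 200:
        return response.text
    return None

# Usage (hits the network):
# html = fetch_html("https://www.flipkart.com/search?q=mobiles")
```
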
STEP 2:
(i) Know how to deal with multiple pages.
(ii) Parser used - lxml: it allows easy handling of XML and HTML files and is widely used for web scraping.
(iii) Get the HTML of the page into VS Code (or any local editor) so that you can work on it.
(iv) As there are many pages related to a SINGLE search, we now fetch data from multiple pages:
    - Try to find the anchor tag <a> in the HTML of the page.
    - Look for the NEXT-page link, not the links for pages 2, 3, etc.
    - Find that particular <a> tag, read its href attribute, and print it.
    - The href holds the link without the 'https' part, so we prepend the domain:
      cnp = "https://www.flipkart.com" + np

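A sketch of extracting the NEXT-page link, assuming BeautifulSoup; the class name `_1LKTO3` and the inline HTML snippet are hypothetical stand-ins for whatever Flipkart's page actually uses.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for part of a results page; the real
# class name of Flipkart's "Next" link changes over time.
html = '<a class="_1LKTO3" href="/search?page=2">Next</a>'
soup = BeautifulSoup(html, "html.parser")  # the notes use lxml; html.parser also works

np = soup.find("a", class_="_1LKTO3")["href"]  # href comes without the domain
cnp = "https://www.flipkart.com" + np
print(cnp)
```
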
(v) To scrape the site we would have to fetch the link of every page, which is a time-consuming process, so we create a loop that fetches all the links for us.
    Now we use a for loop to fetch the data:
        for i in range(1, 10)   # 1 is the start, 10 is the end (exclusive)
    To move across multiple pages, append str(i) to the end of the link.

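The loop described above can be sketched like this; the `base` search URL is a hypothetical example, not Flipkart's real query string.

```python
# Hypothetical search URL; the real Flipkart query string differs.
base = "https://www.flipkart.com/search?q=laptops&page="

page_urls = []
for i in range(1, 10):                 # pages 1 through 9; range's end is exclusive
    page_urls.append(base + str(i))    # str(i) moves us to the next page

print(page_urls[0])
print(page_urls[-1])
```
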
(vi) Decide what data you want to scrape, for example:
    - product name, price, reviews, description.
    - Create a list for each individual piece of information:
        Product_name = []
        Prices = []
        Description = []
        Reviews = []

(vii) Now create a function for each piece of information you want to fetch, and store that data in the related list:
        revi = soup.find_all("div", class_="_3LWZlK")
        for i in revi:
            name = i.text
            Reviews.append(name)
        print(Reviews)
      Do the same for all the other lists.
(viii) A point to remember: we are scraping data from a particular box or area of the page, so we have to specify that area by making a variable BOX.
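
A sketch of the box idea from (vii) and (viii), assuming BeautifulSoup; the inline HTML and the class names (`_1YokD2`, `_4rR01T`, `_30jeq3`) are hypothetical stand-ins, since Flipkart's real class names change frequently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a Flipkart results page.
html = """
<div class="_1YokD2">
  <div class="_4rR01T">Laptop A</div><div class="_30jeq3">Rs. 45,000</div>
  <div class="_4rR01T">Laptop B</div><div class="_30jeq3">Rs. 48,000</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Scope every find_all to the results box so matches elsewhere on the
# page (ads, footers) are ignored.
box = soup.find("div", class_="_1YokD2")

Product_name = [tag.text for tag in box.find_all("div", class_="_4rR01T")]
Prices = [tag.text for tag in box.find_all("div", class_="_30jeq3")]
print(Product_name)
print(Prices)
```
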
(ix) Now create the DataFrame with the help of pandas: pd.DataFrame({"key": value}) stores the data in the form of key-value pairs.
    Remember that we are scraping the data from multiple pages, so DON'T FORGET TO RE-APPLY THE FOR LOOP AND str(i) for the multiple pages.

(x) The last step is to convert that DataFrame into a CSV file.
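
A minimal sketch of building the DataFrame, assuming pandas; the two-item lists here are dummy values standing in for the lists filled by the scraping loop.

```python
import pandas as pd

# Dummy values standing in for lists filled by the scraping loop.
Product_name = ["Laptop A", "Laptop B"]
Prices = ["Rs. 45,000", "Rs. 48,000"]

df = pd.DataFrame({"Product Name": Product_name, "Price": Prices})
print(df)
```
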
STEP 3
df.to_csv("flipkart-scraping-under-50k.csv")