Commit fb7a820

Merge pull request #151 from Shraman-jain/updated-Streamlit-app
Streamlit WebApp Updated and Added a new part of Content-Based Recommendation System
2 parents d666e80 + 1a7baf5 commit fb7a820

8 files changed: +15481 -13 lines changed

Web_app/Home_Page.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
import streamlit as st

# Configure the browser tab title and icon; this must be the first Streamlit call.
st.set_page_config(
    page_title="Home Page",
    page_icon="👋",
)

st.write("# Welcome to Movie Review Analysis and Recommendation System 👋")

st.sidebar.success("Select a part above.")

st.markdown(
    """
### Introduction
The IMDb Movie Review Analysis and Recommendation System is a comprehensive
tool designed to analyze movie reviews and provide personalized movie recommendations.
It leverages natural language processing (NLP) techniques and machine learning
algorithms to deliver insightful analysis and effective recommendations based on user preferences.

### Features
1. **Sentiment Analysis**: Classifies the sentiment of movie reviews as positive or negative.
2. **Personalized Recommendations**: Recommends movies using content-based filtering.

**👈 Select a part from the sidebar**

"""
)

Web_app/README.md

Lines changed: 51 additions & 11 deletions
@@ -1,13 +1,18 @@
-<h1 align="center">IMDb Movie Review Analysis and Recommendation System</h1>
-<blockquote align="center">Analyzing movie reviews and providing recommendations using Python and Streamlit. 🎬💻</blockquote>
-<p align="center">For new data generation and <b>sentiment analysis</b>, we have written a Python script to fetch📊 data from IMDb, analyze sentiments, and provide movie recommendations, all converted into an interactive web app using Streamlit. 🌐📈</p>
+# IMDb Movie Review Analysis and Recommendation System :film_projector:
+Analyzing movie reviews and providing recommendations using Python and Streamlit. 🎬💻 This web app has two parts :sunglasses::
+1. A movie review analysis part.
+2. A movie recommendation part.
+
+<p align="center">
+To generate new data for <b>sentiment analysis</b> and the <b>recommendation system</b>, we have written separate Python scripts that fetch 📊 data from IMDb, analyze sentiments, and provide movie recommendations, all converted into an interactive web app using Streamlit. 🌐📈</p>
+
 
 ## Features
 
 - **Scraping Movie Reviews**: Collects user reviews from IMDb using BeautifulSoup.
-- **Customizable Scraper**: Target specific movies and the number of pages to scrape.
+- **Customizable Scraper**: Collects movie descriptions from IMDb using Selenium.
 - **Sentiment Analysis**: Uses Support Vector Machine (SVM) to classify reviews as positive or negative.
-- **Recommendations**: Recommends top movies based on positive reviews.
+- **Recommendations**: Recommends movies based on the content of a movie the user has previously watched.
 - **CSV Output**: Saves the scraped data into a CSV file for further analysis.
 
 ## Installation
@@ -19,27 +24,62 @@
 pip install requests
 pip install pandas
 pip install scikit-learn
-
+pip install selenium
+```
 ## Usage
 
+### For Sentiment Analysis Part
+
 1. **Run the scraping script** to collect movie reviews and save them into a CSV file. Open and execute the Jupyter notebook:
 
 ```bash
 jupyter notebook notebooks/movie_review_imdb_scrapping.ipynb
-
+```
 2. **Navigate to the Web_app directory:**
 ```bash
 cd Web_app
+```
+
+### For Content-Based Movie Recommendation Part
+
+1. **Run the scraping script** to collect movie descriptions and save them into a CSV file. Open and execute the Python script.
+
+***Note: you have to download ChromeDriver and set its path in Scarper.py, where the driver path (`DRIVER_PATH`) is defined.***
 
-3. **Run the Streamlit app:**
 ```bash
-streamlit run app.py
+python -u "Scarper.py"
+```
+2. **Run the similarity model script** to build the similarity model used by the web app. Open and execute the Jupyter notebook.
+
+***Note: you must run model.ipynb, because it produces similarity.pkl, the model used by the Streamlit web app.***
+
+```bash
+jupyter notebook notebooks/model.ipynb
+```
+
+### For Home Page
+1. **Navigate to the Web_app directory:**
+```bash
+cd Web_app
+```
+
+2. **Run the Streamlit app:**
+```bash
+streamlit run Home_Page.py
+```
+### Home Page
+
+![Home_Page](https://github.com/Shraman-jain/Scrape-ML/assets/60072287/dbbafd78-e6c2-4469-b55f-d7e555f382ae "Home Page")
+
+### Sentiment Analysis Part
 
-4. **Upload a CSV file** containing the reviews when prompted by the app.
+![Movie_review](https://github.com/Shraman-jain/Scrape-ML/assets/60072287/dd449b6f-680c-4b00-bc45-6662bc82e48c "Sentiment Analysis")
 
+### Content-Based Movie Recommendation Part
 
 
+![Recommendation](https://github.com/Shraman-jain/Scrape-ML/assets/60072287/90599178-3d63-4a4a-8879-68408b2cc235 "Content-Based Movie Recommendation Part")
 
 
 
-
+
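
The README says the recommendation part depends on a `similarity.pkl` file generated by `model.ipynb`, but the notebook itself is not shown in this view. As a rough illustration of what a content-based similarity model can look like, here is a minimal standard-library sketch. It is an assumption, not the notebook's actual method: it uses plain bag-of-words cosine similarity, and the sample titles, descriptions, stop-word list, and helper names (`tokenize`, `build_similarity`, `recommend`) are all hypothetical.

```python
import math
import pickle
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real model would use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "to", "in", "of", "through", "across"}


def tokenize(text):
    """Lowercase a description and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_similarity(descriptions):
    """Return an n x n list-of-lists matrix of pairwise cosine similarities."""
    vectors = [Counter(tokenize(d)) for d in descriptions]
    return [[cosine(u, v) for v in vectors] for u in vectors]


def recommend(title, titles, similarity, top_n=2):
    """Return the top_n titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda j: similarity[idx][j], reverse=True)
    return [titles[j] for j in ranked if j != idx][:top_n]


# Hypothetical sample data standing in for rows of the scraped CSV.
titles = ["Space Quest", "Galaxy Wars", "Baking Days"]
descriptions = [
    "A crew travels through space to save a distant planet.",
    "Rebels fight an empire across space and a distant galaxy.",
    "A pastry chef opens a small bakery in Paris.",
]

similarity = build_similarity(descriptions)

# Round-trip through pickle, mirroring how a similarity file could be saved and reloaded.
restored = pickle.loads(pickle.dumps(similarity))

print(recommend("Space Quest", titles, restored, top_n=1))
```

In a web app, the matrix would typically be computed once offline and only the lookup step (`recommend`) would run per request, which is presumably why the README asks for the model to be generated before starting Streamlit.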

Web_app/Scarper.py

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
import csv
import os
import re
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Path to the local chromedriver binary; change this to match your machine.
DRIVER_PATH = 'E:/chromedriver-win64/chromedriver'

# Initialize the Chrome driver
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Selenium 4 takes the driver path through a Service object instead of executable_path.
driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)

# Navigate to the URL
driver.get('https://www.imdb.com/search/title/?title_type=tv_series,feature,tv_movie,tv_episode,tv_miniseries,tv_special&release_date=2000-01-01,2024-12-31')

driver.set_script_timeout(10000)


def load_more_results():
    """Click the "see more" button once; return False if it cannot be found or clicked."""
    try:
        load_more_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//button[contains(@class, "ipc-see-more__button")]'))
        )
        driver.execute_script("arguments[0].scrollIntoView(true);", load_more_button)
        driver.execute_script("arguments[0].click();", load_more_button)
        time.sleep(2)
        return True
    except Exception as e:
        print(f"Error: {e}")
        return False


def save_to_csv(movies, filename='movies.csv'):
    """Append rows to the CSV file, writing the header only when the file is new or empty."""
    keys = movies[0].keys()
    write_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, 'a', newline='', encoding='utf-8') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        if write_header:
            dict_writer.writeheader()
        dict_writer.writerows(movies)


all_movies = []
cnt = 0
# Expand the result list up to 300 times, stopping early once the button is gone.
while cnt < 300:
    cnt += 1
    print(cnt)
    if not load_more_results():
        break

# The result list is located by an absolute XPath, which is brittle if IMDb changes its page layout.
movie_elements = driver.find_element(By.XPATH, "/html/body/div[2]/main/div[2]/div[3]/section/section/div/section/section/div[2]/div/section/div[2]/div[2]/ul")
print("movie_list")

html_content = movie_elements.get_attribute('outerHTML')
print("html movie_list")
soup = BeautifulSoup(html_content, 'html.parser')

lst = soup.find_all("li", class_="ipc-metadata-list-summary-item")
print("list")
for i in lst:
    # find() returns None when an element is missing, so .text raises AttributeError.
    try:
        org_title = i.find("h3", class_="ipc-title__text").text
        # Strip the leading rank prefix (for example "12. ") from the title.
        title = re.sub(r'^\d+\.\s*', '', org_title)
    except AttributeError:
        title = "NA"
    try:
        year = i.find("span", class_="sc-b189961a-8 kLaxqf dli-title-metadata-item").text
    except AttributeError:
        year = "NA"
    try:
        rating = i.find("span", class_='ipc-rating-star ipc-rating-star--base ipc-rating-star--imdb ratingGroup--imdb-rating').text.split()[0]
    except (AttributeError, IndexError):
        rating = "NA"
    try:
        description = i.find("div", class_='ipc-html-content-inner-div').text
    except AttributeError:
        description = "NA"
    all_movies.append({
        'title': title,
        # Fixed label: the search URL also returns feature films and specials, not only TV series.
        'type': "Tv-Series",
        'year': year,
        'rating': rating,
        'description': description
    })

print("saving started")
if all_movies:
    save_to_csv(all_movies)
print("completed")
driver.quit()
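
The scraper opens `movies.csv` in append mode and fills missing fields with `"NA"`, so a consumer of the file may want to drop incomplete or duplicate rows before building a recommendation model. The following is a minimal sketch of such a cleanup step, assuming the five columns written above (`title`, `type`, `year`, `rating`, `description`). The function name `load_clean_rows` and the sample rows are hypothetical, and the sketch also skips any stray header row in case the file was produced by several appended runs.

```python
import csv
import io


def load_clean_rows(csv_file):
    """Read scraper output, skipping stray header rows, rows without a description, and duplicate titles."""
    reader = csv.DictReader(csv_file)
    seen = set()
    rows = []
    for row in reader:
        if row["title"] == "title":
            # A header line repeated by a later append; not real data.
            continue
        if row["description"] in ("", "NA"):
            # No usable text for content-based filtering.
            continue
        if row["title"] in seen:
            continue
        seen.add(row["title"])
        rows.append(row)
    return rows


# Hypothetical sample mimicking two appended runs of the scraper.
sample = io.StringIO(
    "title,type,year,rating,description\n"
    "Movie A,Tv-Series,2001,7.5,A detective story.\n"
    "Movie B,Tv-Series,2005,NA,NA\n"
    "title,type,year,rating,description\n"
    "Movie A,Tv-Series,2001,7.5,A detective story.\n"
    "Movie C,Tv-Series,2010,8.1,A space adventure.\n"
)

clean = load_clean_rows(sample)
print([row["title"] for row in clean])
```

For a real file, the same function can be called with `open("movies.csv", newline="", encoding="utf-8")` in place of the in-memory sample.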
