Commit c6509c8
uploaded file with necessary changes to optimise
Fixed a bug that caused OSError and PermissionError, and added handling for the case where the output directory already exists so that no exception is raised, by adding this snippet:
import os
os.makedirs('data_scrapped', exist_ok=True)
df.to_csv('data_scrapped/data_rotten_tomatoes.csv', index=False)
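A minimal sketch of why `exist_ok=True` matters, using only the standard library so it runs without pandas. The helper name `save_rows` and the sample rows are hypothetical and are not part of the commit; the temporary base directory is only there so the example cleans up after itself. Note that `exist_ok=True` only suppresses the error for an existing directory; it does not by itself resolve permission problems.

```python
import csv
import os
import tempfile


def save_rows(rows, out_dir, filename="data_rotten_tomatoes.csv"):
    """Write rows to out_dir/filename, creating out_dir if needed (hypothetical helper)."""
    # With exist_ok=True this is a no-op when the directory already exists;
    # without it, a second call would raise FileExistsError (an OSError subclass).
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return path


# Use a temporary base directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "data_scrapped")
    first = save_rows([["title", "review"], ["Example", "Good"]], target)
    # Second call: the directory already exists, and no exception is raised.
    second = save_rows([["title", "review"]], target)
    same_path = first == second
```

Calling the helper twice against the same directory mirrors re-running the scraper after a previous run has already created `data_scrapped`.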
Also added handling for cases where the movie title or review text does not exist:
def getReviewText(review_url):
    '''Returns the user review text given the review soup.'''
    # Look up the review paragraph by tag name and class
    tag = review_url.find('p', attrs={'class': 'review-text'})
    if tag:
        return tag.get_text(strip=True)  # strip=True removes surrounding whitespace
    return None  # Handle case where review text is not found

def getMovieTitle(review_url):
    '''Returns the movie title from the review soup.'''
    tag = review_url.find('title')
    if tag:
        title_tag = list(tag.children)[0].get_text()
        # Drop the fixed page-title suffix to leave only the movie name
        movie_title = title_tag.split(' - Movie Reviews | Rotten Tomatoes')[0]
        return movie_title
    return None  # Handle case where title is not found
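Because both helpers can now return `None`, the calling code has to cope with missing values. The following is a sketch of one way to do that, not the commit's actual caller: `collect_rows` and the sample tuples are hypothetical stand-ins for values that `getMovieTitle` and `getReviewText` might return.

```python
def collect_rows(pairs):
    """Keep only (title, text) pairs where both values were found (hypothetical helper)."""
    rows = []
    for title, text in pairs:
        if title is None or text is None:
            continue  # the page was missing the title or the review text
        rows.append({"title": title, "review": text})
    return rows


# Stand-in values for what the two helpers might return for three pages.
scraped = [("Movie A", "Great."), (None, "Orphan review"), ("Movie B", None)]
rows = collect_rows(scraped)
```

Skipping incomplete pairs is only one option; a caller could equally keep them and store an empty value, depending on how the CSV is used later.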
To use less memory, use a set instead of dict.fromkeys() to remove duplicates:
# remove duplicate links
unique_movie_links = list(set(tag['href'] for tag in movie_tags))
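One trade-off worth noting: a `set` does not guarantee iteration order, whereas `dict.fromkeys()` keeps the first-seen order of the links. The sketch below compares the two on a made-up list of hrefs (the paths are illustrative only); `sorted(set(...))` is shown as one possible way to get a deterministic result when the original order does not matter.

```python
hrefs = ["/m/b", "/m/a", "/m/b", "/m/c", "/m/a"]

# set(): removes duplicates, but the resulting order is not guaranteed.
unordered = list(set(hrefs))

# dict.fromkeys(): removes duplicates and keeps first-seen order.
ordered = list(dict.fromkeys(hrefs))

# sorted(set(...)): deterministic output when first-seen order is not needed.
stable = sorted(set(hrefs))
```

If the scraper relies on visiting movies in the order they appear on the page, the `dict.fromkeys()` form is the safer choice.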
To fix the ModuleNotFoundError: No module named 'textblob' exception, added pip install textblob
1 parent a76dd2a commit c6509c8
1 file changed, +718 -41 lines changed