Commit c6509c8

uploaded file with necessary changes to optimise
Removed a bug that caused OSError and PermissionError, and added handling for the case where the output directory already exists, so that no exception is raised:

    import os
    os.makedirs('data_scrapped', exist_ok=True)
    df.to_csv('data_scrapped/data_rotten_tomatoes.csv', index=False)

Also added handling for the cases where a movie title or a review text doesn't exist:

    def getReviewText(review_url):
        '''Returns the user review text given the review soup.'''
        tag = review_url.find('p', attrs={'class': 'review-text'})  # None if no matching <p> tag
        if tag:
            return tag.get_text(strip=True)  # Use strip=True to remove extra whitespace
        return None  # Handle case where review text is not found

    def getMovieTitle(review_url):
        '''Returns the movie title from the review soup.'''
        tag = review_url.find('title')
        if tag:
            title_tag = list(tag.children)[0].get_text()
            movie_title = title_tag.split(' - Movie Reviews | Rotten Tomatoes')[0]
            return movie_title
        return None  # Handle case where title is not found

To use less memory, use set instead of dict.fromkeys() to remove duplicates:

    # remove duplicate links
    unique_movie_links = list(set(tag['href'] for tag in movie_tags))

To remove the "ModuleNotFoundError: No module named 'textblob'" exception, added:

    pip install textblob
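As a rough illustration of two of the changes described above, the sketch below compares set-based de-duplication with dict.fromkeys() and shows that os.makedirs(..., exist_ok=True) can be called repeatedly. The helper names (dedupe_unordered, dedupe_ordered, ensure_output_dir) and the sample links are hypothetical and not part of the commit. Note that exist_ok=True only suppresses the error raised when the directory already exists; it would not by itself resolve a PermissionError.

```python
import os
import tempfile


def dedupe_unordered(links):
    """Remove duplicates with a set; the resulting order is arbitrary."""
    return list(set(links))


def dedupe_ordered(links):
    """Remove duplicates with dict.fromkeys(); first-seen order is kept."""
    return list(dict.fromkeys(links))


def ensure_output_dir(base_dir, name="data_scrapped"):
    """Create the output directory if needed; safe to call more than once."""
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)  # no FileExistsError on repeat calls
    return path


# Hypothetical hrefs standing in for tag['href'] values from movie_tags.
sample_links = ["/m/alpha", "/m/beta", "/m/alpha", "/m/gamma", "/m/beta"]
unordered = dedupe_unordered(sample_links)
ordered = dedupe_ordered(sample_links)

# A temporary directory is used so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_output_dir(tmp)
    second = ensure_output_dir(tmp)  # second call is a no-op
    dir_exists = os.path.isdir(first)
```

If the order in which links appear on the page matters for later steps, the dict.fromkeys() variant may be the safer choice, since a set does not preserve insertion order.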
1 parent a76dd2a commit c6509c8

File tree

1 file changed: +718 -41 lines changed


0 commit comments
