Commit c6509c8

Uploaded file with necessary changes to optimise

Fixed a bug that raised OSError and PermissionError, and added handling for the case where the output directory already exists, so that no exception is thrown:

    import os

    os.makedirs('data_scrapped', exist_ok=True)
    df.to_csv('data_scrapped/data_rotten_tomatoes.csv', index=False)

Also added handling for the case where a movie title or review text doesn't exist:

    def getReviewText(review_url):
        '''Returns the user review text given the review soup.'''
        # find() returns None when no <p class="review-text"> tag is present
        tag = review_url.find('p', attrs={'class': 'review-text'})
        if tag:
            return tag.get_text(strip=True)  # strip=True removes extra whitespace
        return None  # Handle case where review text is not found

    def getMovieTitle(review_url):
        '''Returns the movie title from the review soup.'''
        tag = review_url.find('title')
        if tag:
            title_tag = list(tag.children)[0].get_text()
            movie_title = title_tag.split(' - Movie Reviews | Rotten Tomatoes')[0]
            return movie_title
        return None  # Handle case where title is not found

To use less memory, use set instead of dict.fromkeys() to remove duplicates. Note that, unlike dict.fromkeys(), a set does not preserve the original order of the links:

    # remove duplicate links
    unique_movie_links = list(set(tag['href'] for tag in movie_tags))

To fix "ModuleNotFoundError: No module named 'textblob'", added:

    pip install textblob
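As a rough illustration of two of the changes described in this commit message, the sketch below shows how `os.makedirs(..., exist_ok=True)` tolerates a directory that already exists, and how `set` and `dict.fromkeys()` differ when removing duplicate links. The sample links, the helper names `unique_links_ordered` and `unique_links_unordered`, and the use of a temporary directory are assumptions made for the example; they are not taken from the notebook.

```python
import os
import tempfile


def unique_links_ordered(links):
    """Remove duplicates while keeping first-seen order (dict.fromkeys)."""
    return list(dict.fromkeys(links))


def unique_links_unordered(links):
    """Remove duplicates with a set; the resulting order is not guaranteed."""
    return list(set(links))


# Hypothetical hrefs standing in for the values of tag['href'].
links = ["/m/a", "/m/b", "/m/a", "/m/c", "/m/b"]

ordered = unique_links_ordered(links)
unordered = unique_links_unordered(links)

# Both approaches keep the same unique links; only the ordering may differ.
same_members = sorted(ordered) == sorted(unordered)

# exist_ok=True makes a repeated makedirs call a no-op instead of an error.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "data_scrapped")
    os.makedirs(out_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)  # directory exists, no exception
    created = os.path.isdir(out_dir)
```

If the scraped links need to be processed in page order, the `dict.fromkeys()` variant is the safer choice; the `set` variant is fine when order does not matter.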

1 parent a76dd2a, commit c6509c8

1 file changed: +718 -41 lines changed

Movie_review_rotten_tomatoes.ipynb
