This project aims to simplify gathering movie data from IMDb.
- Movie Title;
- Release Year;
- Age Rating;
- Genres;
- Duration;
- Rating;
- MetaScore;
- Customizable Output.
- beautifulsoup4 – for parsing HTML.
- lxml – the HTML parser.
- requests – for making HTTP requests to retrieve webpage content.
- pandas – for storing and saving the data to a CSV or Excel file.
- openpyxl – required for saving to Excel format.
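
As a rough illustration of how the parsing libraries fit together, the sketch below reads a small hand-written HTML snippet with `beautifulsoup4` and the `lxml` parser. The markup, the `SELECTORS` mapping, and the `parse_movie` helper are hypothetical; they do not reflect IMDb's real page structure or this project's actual code.

```python
from bs4 import BeautifulSoup

# Hypothetical CSS selectors; IMDb's real markup differs and changes over time.
SELECTORS = {
    "title": "h1.title",
    "year": "span.year",
    "rating": "span.rating",
}

# Hand-written sample markup, not real IMDb HTML.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Example Movie</h1>
  <span class="year">2001</span>
</body></html>
"""


def parse_movie(html: str) -> dict:
    """Extract each configured field, using None when an element is missing."""
    soup = BeautifulSoup(html, "lxml")
    movie = {}
    for field, selector in SELECTORS.items():
        element = soup.select_one(selector)
        movie[field] = element.get_text(strip=True) if element else None
    return movie


if __name__ == "__main__":
    print(parse_movie(SAMPLE_HTML))
```

Keeping the selectors in one mapping means only that mapping needs updating when the page layout changes, and returning `None` for a missing element keeps one absent field from aborting the whole run.
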
- Provide your URLs: supply a list of movie URLs to scrape.
- Run the script.
- Save the data: choose between CSV and Excel output.
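
The save step can be sketched as follows, assuming the scraped rows are held as a list of dictionaries. The `save_movies` helper, the column names, and the choice of format by file extension are assumptions for illustration, not the project's confirmed interface.

```python
from pathlib import Path

import pandas as pd


def save_movies(movies: list[dict], output_path: str) -> Path:
    """Write the rows to CSV or Excel, chosen by the file extension."""
    path = Path(output_path)
    frame = pd.DataFrame(movies)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        frame.to_csv(path, index=False)
    elif suffix == ".xlsx":
        # pandas relies on openpyxl to write .xlsx files.
        frame.to_excel(path, index=False)
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    return path


if __name__ == "__main__":
    # Placeholder row for illustration only, not real scraped data.
    rows = [{"title": "Example Movie", "release_year": 2001, "genres": "Drama"}]
    save_movies(rows, "movies.csv")
```

Passing `index=False` keeps the pandas row index out of the output file, so the saved columns match the scraped fields.
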
- Rate Limiting: The script has a built-in cooldown to avoid sending too many requests in a short period of time and getting your IP blocked by IMDb.
- Always Changing Website: IMDb updates its website frequently, so the elements and classes the script relies on may change. Verify them before running the program.
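
A cooldown like the one described in the rate-limiting note could look roughly like this. It is a minimal sketch: the `fetch_with_cooldown` helper, the delay values, and the randomized interval are assumptions, since the project's actual cooldown settings are not documented here.

```python
import random
import time
from typing import Callable, Iterable, Iterator


def fetch_with_cooldown(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, str]]:
    """Yield (url, page) pairs, pausing a random interval between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)


if __name__ == "__main__":
    # Stand-in fetcher so the example runs offline; a real one might wrap requests.get.
    def fake_fetch(url: str) -> str:
        return f"<html>{url}</html>"

    # Placeholder URLs, used only for illustration.
    demo_urls = ["https://example.com/a", "https://example.com/b"]
    for url, page in fetch_with_cooldown(demo_urls, fake_fetch, 0.1, 0.2):
        print(url, len(page))
```

Taking `fetch` and `sleep` as parameters keeps the pacing logic separate from the network code, so the delay behavior can be checked without sending real requests.
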