This code base can be used to scrape any site for occurrences of a keyword or phrase, for example to gauge the neutrality or popularity of an issue or event.
Steps needed:
- Download the repository, which contains 3 files. The .json file is not needed to run the code.
- Run testscrapy.py first. This creates a .json file containing the scraped output of the site.
- Run rmv_none_values.py second. This filters the .json file and reports the count of the keyword or phrase you specified.
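As a rough illustration of the second step, the sketch below counts a phrase in scraped JSON while skipping empty (None) fields. It is not the actual contents of rmv_none_values.py: the function name `count_phrase`, the record layout (a list of objects with `title` and `body` fields), and the sample data are all assumptions made for the example.

```python
import json
import re


def count_phrase(records, phrase):
    """Count case-insensitive occurrences of `phrase` across scraped records."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    total = 0
    for record in records:
        for value in record.values():
            if value is None:  # skip fields the scraper left empty
                continue
            total += len(pattern.findall(str(value)))
    return total


# Hypothetical sample standing in for the .json file written by the scraper.
sample_json = """
[
  {"title": "Climate talks resume", "body": null},
  {"title": "Talks stall", "body": "Climate talks stall again"}
]
"""

records = json.loads(sample_json)
print(count_phrase(records, "talks"))  # prints 3
```

To run this against real output, replace `json.loads(sample_json)` with `json.load()` on the opened .json file, and adjust the field handling to match whatever structure the scraper actually writes.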
With small modifications, testscrapy.py can be turned into an automated crawler without fixed boundaries, and rmv_none_values.py can be adjusted to produce more refined results.