Funder scrapers

A collection of scrapers for gathering data from grant funders, intended to be used in the Beehive funding platform.

Written using Python 3 and Scrapy.

Install

Clone into a new directory: git clone https://github.com/TechforgoodCAST/beehive-scrapers.git
Set up a virtual environment: python3 -m venv env
Enter the virtual environment: source env/bin/activate (Linux) or env\Scripts\activate (Windows)
Install requirements: pip install -r requirements.txt
(Windows only) install pypiwin32: pip install pypiwin32

To create a scraper for a new funder, run the command:

scrapy genspider -t fund_spider fundname "fundurl.com/path-to-fund-list"

Where:

fundname is the name of the funder (all lowercase, no spaces or special characters)
"fundurl.com/path-to-fund-list" should be the URL of the fund list page.

This will generate a skeleton scraper with the capability to:

You'll need to adjust the CSS selectors to match the exact structure of the list page.

To write the funds found to a JSON Lines file called funds.jl, run (here for the comicrelief spider): scrapy crawl comicrelief -o funds.jl
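A JSON Lines file holds one JSON object per line, so the output can be read back with the standard library alone. The sketch below assumes nothing about which fields each fund has; parse_json_lines is a hypothetical helper name, not part of this repository.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of JSON Lines text into a list of Python objects."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, which can appear at the end of a file.
        if line:
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumes funds.jl was produced by the scrapy crawl command above.
    with open("funds.jl", encoding="utf-8") as handle:
        funds = parse_json_lines(handle)
    print(len(funds), "funds loaded")
```

Because a file object yields its lines one at a time, it can be passed straight to the helper without loading the whole file into a string first.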

To run all spiders use the following command:

python funderscrapers/crawl_all.py

You can also use crawl_all.bat on Windows or ./crawl_all.sh in Bash.